
XState-Driven Development: How `gatsby develop` Orchestrates Reactivity

Advanced

Prerequisites

  • Article 1: Architecture and Monorepo Overview
  • Article 2: Build Pipeline and Bootstrap
  • XState fundamentals (states, transitions, services, child machines)
  • Understanding of Node.js child processes and IPC


If the build pipeline is a factory assembly line, the develop server is an air traffic control tower. Files change, webhooks arrive, GraphQL mutations fire, and the development server must handle all of these events—sometimes simultaneously, sometimes while a previous rebuild is still running. This is the exact problem space where state machines shine.

Gatsby's gatsby develop command is orchestrated by an XState hierarchical state machine. It's one of the most sophisticated uses of XState in a production open-source project, and understanding it reveals why "just rerun the pipeline on file change" isn't sufficient for a good developer experience.

Why a State Machine?

Consider the events a development server must handle:

  • A source file changes while queries are running
  • A webhook arrives during schema customization
  • A createNode mutation fires while pages are being recreated
  • A plugin throws during sourceNodes but the server should keep running
  • File changes arrive in rapid succession (editor auto-save)

A naive approach, restarting the pipeline on every event, would be catastrophically slow and could cause infinite loops (a source plugin creates a file, which triggers a rebuild, which runs the source plugin again, and so on).

The state machine gives Gatsby three superpowers:

  1. Context-dependent event handling: The same event (e.g., ADD_NODE_MUTATION) triggers different behavior depending on the current state
  2. Event batching: Multiple file changes during a query run are accumulated and processed once
  3. Infinite loop detection: A hard limit prevents runaway rebuild cycles

The Parent-Child Process Split

Before the state machine even starts, Gatsby establishes a process isolation boundary. The develop.ts command handler runs in the parent process, while the actual development server runs in a child process.

sequenceDiagram
    participant Parent as develop.ts (Parent)
    participant Child as develop-process.ts (Child)

    Parent->>Parent: Detect port, resolve SSL
    Parent->>Child: new ControllableScript(...)
    Parent->>Child: start()
    loop Every 1 second
        Child->>Parent: { type: "HEARTBEAT" }
    end
    Child->>Parent: IPC messages (forwarded)
    Parent->>Child: IPC messages (forwarded)
    Note over Child: XState machine runs here

    Parent--xChild: Parent killed (SIGKILL)
    Child->>Child: Next heartbeat send fails, child exits

The ControllableScript class (defined at lines 49–154 of develop.ts) wraps execa.node with lifecycle management. It writes the child's bootstrap script to a temp file in .cache/, spawns it with IPC enabled (stdio: ['inherit', 'inherit', 'inherit', 'ipc']), and provides start, stop, send, onMessage, and onExit methods.

The heartbeat mechanism at lines 38–45 of develop-process.ts is delightfully pragmatic:

if (process.send) {
  setInterval(() => {
    process.send!({ type: `HEARTBEAT` })
  }, 1000)
}

The comment above it tells the story: "When the parent process is killed by SIGKILL, Node doesn't kill spawned child processes." Once the parent is gone, the IPC channel closes and the next heartbeat send fails with ERR_IPC_CHANNEL_CLOSED. That unhandled error crashes the orphaned child, which is exactly the desired side effect.

Tip: Process isolation also provides memory isolation. If a source plugin leaks memory, only the child process is affected. The parent can restart it cleanly.

The developMachine: States and Transitions

The top-level state machine is defined in packages/gatsby/src/state-machines/develop/index.ts. Let's trace through its states:

stateDiagram-v2
    [*] --> initializing
    initializing --> initializingData: DONE
    initializingData --> runningPostBootstrap: DONE
    runningPostBootstrap --> runningQueries
    runningQueries --> startingDevServers: first run, no compiler
    runningQueries --> recompiling: source files dirty
    runningQueries --> recreatingPages: nodes mutated
    runningQueries --> waiting: clean
    startingDevServers --> waiting
    startingDevServers --> initialGraphQLTypegen: typegen enabled
    initialGraphQLTypegen --> waiting
    recompiling --> waiting

    waiting --> runningQueries: EXTRACT_QUERIES_NOW
    waiting --> recreatingPages: mutations flushed

    recreatingPages --> runningQueries: DONE
    reloadingData --> runningQueries: DONE

    state "Global Events" as ge
    note right of ge
        WEBHOOK_RECEIVED → reloadingData
        ADD_NODE_MUTATION → batched
        SOURCE_FILE_CHANGED → marked dirty
    end note

Global Event Handlers

At the top level of the machine config (lines 29–57), three global event handlers are defined:

  • ADD_NODE_MUTATION: Queues the mutation via addNodeMutation action
  • SOURCE_FILE_CHANGED: Marks source files as dirty via markSourceFilesDirty
  • WEBHOOK_RECEIVED: Immediately transitions to reloadingData state

These global handlers can be overridden by individual states. For example, the initializing state explicitly sets all three to undefined (lines 62–66):

initializing: {
  on: {
    ADD_NODE_MUTATION: undefined,
    SOURCE_FILE_CHANGED: undefined,
    WEBHOOK_RECEIVED: undefined,
  },
  // ...
}

This makes perfect sense: during initial bootstrap, there's no point handling mutations because the full pipeline will run anyway.
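The override rule can be pictured as a two-level lookup. The sketch below is a hand-rolled model and not XState's implementation: `resolveHandler`, `HandlerMap`, and the string handler names are hypothetical stand-ins. The point it illustrates is that a key explicitly set to undefined on a state still counts as an override, so the global handler is not consulted.

```typescript
// Hypothetical model of handler lookup; this is not how XState is implemented.
type EventType = "ADD_NODE_MUTATION" | "SOURCE_FILE_CHANGED" | "WEBHOOK_RECEIVED"
type Handler = string // action or target name, for illustration only
type HandlerMap = Partial<Record<EventType, Handler | undefined>>

const globalHandlers: HandlerMap = {
  ADD_NODE_MUTATION: "addNodeMutation",
  SOURCE_FILE_CHANGED: "markSourceFilesDirty",
  WEBHOOK_RECEIVED: "reloadingData",
}

const stateHandlers: Record<string, HandlerMap> = {
  // Mirrors the initializing state: all three events are explicitly disabled.
  initializing: {
    ADD_NODE_MUTATION: undefined,
    SOURCE_FILE_CHANGED: undefined,
    WEBHOOK_RECEIVED: undefined,
  },
  // No local overrides, so the global handlers apply.
  waiting: {},
}

function resolveHandler(state: string, event: EventType): Handler | undefined {
  const local: HandlerMap = stateHandlers[state] ?? {}
  // An own key wins even when its value is undefined: that is the "ignore" case.
  if (Object.prototype.hasOwnProperty.call(local, event)) {
    return local[event]
  }
  return globalHandlers[event]
}

console.log(resolveHandler("initializing", "WEBHOOK_RECEIVED")) // undefined
console.log(resolveHandler("waiting", "WEBHOOK_RECEIVED")) // reloadingData
```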

The Waiting State

The waiting state (lines 266–313) is the idle state where the dev server is ready. It invokes a child machine (waitForMutations) that batches incoming node mutations. When enough mutations accumulate (or source files change), the child machine completes and the parent transitions to recreatingPages.
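As a rough mental model of the batching, the sketch below collects mutations and hands them over in a single flush. It is a simplification under stated assumptions: `MutationBatcher`, `NodeMutation`, and the flush-on-demand policy are hypothetical, and the real waitForMutations machine decides for itself when a batch is complete.

```typescript
// Hypothetical batching model; names and flush policy are assumptions,
// not the actual code of the waitForMutations machine.
interface NodeMutation {
  type: string
  nodeId: string
}

class MutationBatcher {
  private pending: NodeMutation[] = []

  // Record a mutation without acting on it yet.
  add(mutation: NodeMutation): void {
    this.pending.push(mutation)
  }

  get size(): number {
    return this.pending.length
  }

  // Hand over everything collected so far and start a fresh batch.
  flush(): NodeMutation[] {
    const batch = this.pending
    this.pending = []
    return batch
  }
}

const batcher = new MutationBatcher()
batcher.add({ type: "createNode", nodeId: "a" })
batcher.add({ type: "createNode", nodeId: "b" })
batcher.add({ type: "deleteNode", nodeId: "a" })

// Three events arrived, but downstream work runs once for the whole batch.
const batch = batcher.flush()
console.log(batch.length) // 3
console.log(batcher.size) // 0
```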

The always guard at lines 267–273 provides a fast path: if queries were requested while transitioning to waiting, skip the wait and go directly to runningQueries:

waiting: {
  always: [
    {
      target: `runningQueries`,
      cond: ({ pendingQueryRuns }) =>
        !!pendingQueryRuns && pendingQueryRuns.size > 0,
    },
  ],
  // ...
}
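To make the fast path concrete, here is a small model of how an eventless transition list might be evaluated on state entry. `DevelopContext`, `AlwaysTransition`, and `resolveAlways` are hypothetical names for this sketch; only the guard itself is taken from the snippet above.

```typescript
// Hypothetical types for illustration; the real context has many more fields.
interface DevelopContext {
  pendingQueryRuns?: Set<string>
}

interface AlwaysTransition {
  target: string
  cond: (ctx: DevelopContext) => boolean
}

const waitingAlways: AlwaysTransition[] = [
  {
    target: "runningQueries",
    // Same guard as in the waiting state above.
    cond: ({ pendingQueryRuns }) =>
      !!pendingQueryRuns && pendingQueryRuns.size > 0,
  },
]

// Returns the first target whose guard passes, or undefined to stay in the state.
function resolveAlways(
  transitions: AlwaysTransition[],
  ctx: DevelopContext
): string | undefined {
  return transitions.find(t => t.cond(ctx))?.target
}

console.log(resolveAlways(waitingAlways, {})) // undefined
console.log(resolveAlways(waitingAlways, { pendingQueryRuns: new Set(["/blog/"]) })) // runningQueries
```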

Child Machines: Data Layer and Query Running

The develop machine delegates complex workflows to child machines, invoked as XState services. There are two primary child machine families.

Data Layer Machines

The data layer module (packages/gatsby/src/state-machines/data-layer/index.ts) defines three machines from composable state fragments:

| Machine | When Used | States |
| --- | --- | --- |
| initializeDataMachine | First boot | customizingSchema → sourcingNodes → buildingSchema → creatingPages → writingOutRedirects → done |
| reloadDataMachine | Webhook received | customizingSchema → sourcingNodes → buildingSchema → creatingPages → done |
| recreatePagesMachine | Node mutation outside sourceNodes | buildingSchema → creatingPages → done |

The composability is elegant—states are defined as fragments (loadDataStates, initialCreatePagesStates, recreatePagesStates, doneState) and mixed together:

export const initializeDataMachine = createMachine({
  initial: `customizingSchema`,
  states: {
    ...loadDataStates,
    ...initialCreatePagesStates,
    ...doneState,
  },
}, options)

This means recreatePagesMachine skips the expensive customizingSchema and sourcingNodes steps entirely—it only rebuilds the schema and re-creates pages, which is correct when a node mutation happens outside of sourceNodes.

flowchart TD
    subgraph "initializeDataMachine"
        I1[customizingSchema] --> I2[sourcingNodes]
        I2 --> I3[buildingSchema]
        I3 --> I4[creatingPages]
        I4 --> I5[writingOutRedirects]
        I5 --> I6[done]
    end

    subgraph "recreatePagesMachine"
        R1[buildingSchema] --> R2[creatingPages]
        R2 --> R3[done]
    end
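The fragment composition can be tried out without XState. In the sketch below each state only records its successor, which is a heavy simplification: the `Fragment` type, the `walk` helper, and the exact split of states across fragments are assumptions made for illustration, chosen to reproduce the sequences in the table above.

```typescript
// Hypothetical, simplified fragments: each state only records its successor.
type Fragment = Record<string, { next?: string }>

const loadDataStates: Fragment = {
  customizingSchema: { next: "sourcingNodes" },
  sourcingNodes: { next: "buildingSchema" },
}

const initialCreatePagesStates: Fragment = {
  buildingSchema: { next: "creatingPages" },
  creatingPages: { next: "writingOutRedirects" },
  writingOutRedirects: { next: "done" },
}

const recreatePagesStates: Fragment = {
  buildingSchema: { next: "creatingPages" },
  creatingPages: { next: "done" },
}

const doneState: Fragment = { done: {} }

// Follow `next` pointers from the initial state until a state has no successor.
function walk(initial: string, states: Fragment): string[] {
  const path: string[] = []
  let current: string | undefined = initial
  while (current !== undefined && !path.includes(current)) {
    path.push(current)
    current = states[current]?.next
  }
  return path
}

const initializePath = walk("customizingSchema", {
  ...loadDataStates,
  ...initialCreatePagesStates,
  ...doneState,
})
const recreatePath = walk("buildingSchema", {
  ...recreatePagesStates,
  ...doneState,
})

console.log(initializePath.join(" -> "))
console.log(recreatePath.join(" -> "))
```

Because the fragments are plain objects, a shorter machine is just a different spread: leaving out loadDataStates is what removes the two expensive steps.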

Query Running Machine

The query running machine (packages/gatsby/src/state-machines/query-running/index.ts) handles the full query lifecycle:

  1. extractingQueries → Extract queries from component files
  2. waitingPendingQueries → 50ms delay (see below)
  3. writingRequires → Write async-requires files
  4. calculatingDirtyQueries → Diff against previous run
  5. runningStaticQueries → Execute useStaticQuery queries
  6. runningPageQueries → Execute page queries
  7. runningSliceQueries → Execute slice queries
  8. waitingForJobs → Wait for async jobs (e.g., image processing)
  9. done

The waitingPendingQueries state at lines 46–54 deserves attention. It introduces a 50ms delay with a PAGE_QUERY_ENQUEUING_TIMEOUT because extracted queries are enqueued via setTimeout(x, 0) in a Redux middleware—meaning they haven't landed in the store yet when extraction "finishes." The comment at line 35 calls this out as a known issue: "FIXME: this has to be fixed properly."
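The timing problem is easy to reproduce in isolation. In this sketch `store` stands in for the Redux store and `enqueueLater` for the middleware; both are hypothetical. It shows that work scheduled with setTimeout(fn, 0) is not yet visible when the scheduling function returns, and that a short wait is enough for it to land.

```typescript
// Hypothetical stand-ins: `store` for the Redux store, `enqueueLater` for the middleware.
const store: string[] = []

function enqueueLater(queryId: string): void {
  // Deferred to a later macrotask, like the setTimeout(x, 0) described above.
  setTimeout(() => {
    store.push(queryId)
  }, 0)
}

function extractQueries(queryIds: string[]): void {
  queryIds.forEach(id => enqueueLater(id))
}

extractQueries(["/", "/blog/"])

// Extraction has "finished", but nothing has reached the store yet.
const visibleImmediately = store.length

// After a short delay (50ms here, matching the article), both entries are present.
setTimeout(() => {
  console.log(visibleImmediately, store.length) // 0 2
}, 50)
```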

Event Handling and Infinite Loop Protection

The most sophisticated part of the develop machine is the set of exit conditions on its runningQueries state (lines 166–213). When query running completes, the machine evaluates a cascade of guards:

flowchart TD
    A[Queries Done] --> B{Nodes mutated during queries?}
    B -->|No| C{First run? No compiler?}
    B -->|Yes| D{Recompile count >= 6?}

    D -->|Yes| E["PANIC: Infinite loop detected"]
    D -->|No| F["recreatingPages<br/>(increment count)"]

    C -->|Yes| G[startingDevServers]
    C -->|No| H{Source files dirty?}

    H -->|Yes| I[recompiling]
    H -->|No| J[waiting]

The RECOMPILE_PANIC_LIMIT constant at line 15 is set to 6:

const RECOMPILE_PANIC_LIMIT = 6

If nodes are mutated during query running and the consecutive recompile count has already reached the limit (the >= 6 check in the flowchart above), the machine transitions to waiting and runs the panicBecauseOfInfiniteLoop action instead of rebuilding again. This protects against pathological cases where a query resolver creates a node, which triggers a rebuild, which runs the query again, and so on.

The counter is incremented by incrementRecompileCount when nodes were mutated during query running (line 185), and reset by resetRecompileCount when entering the waiting state (line 274). This means a successful cycle through queries → waiting → queries resets the counter—only consecutive mutation-during-query cycles count toward the limit.
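The guard cascade and the counter can be modeled as a pure function. This is a sketch that follows the flowchart above, not the machine's source: `QueryRunContext`, `afterQueries`, and the field names are hypothetical, while the limit and the action names come from the text.

```typescript
const RECOMPILE_PANIC_LIMIT = 6

// Hypothetical slice of the machine context, reduced to what the guards need.
interface QueryRunContext {
  nodesMutatedDuringQueryRun: boolean
  recompileCount: number
  hasCompiler: boolean
  sourceFilesDirty: boolean
}

interface Outcome {
  target: string
  actions: string[]
}

// Evaluate the guards in the same order as the flowchart.
function afterQueries(ctx: QueryRunContext): Outcome {
  if (ctx.nodesMutatedDuringQueryRun) {
    if (ctx.recompileCount >= RECOMPILE_PANIC_LIMIT) {
      return { target: "waiting", actions: ["panicBecauseOfInfiniteLoop"] }
    }
    return { target: "recreatingPages", actions: ["incrementRecompileCount"] }
  }
  if (!ctx.hasCompiler) return { target: "startingDevServers", actions: [] }
  if (ctx.sourceFilesDirty) return { target: "recompiling", actions: [] }
  return { target: "waiting", actions: [] }
}

// Simulate a plugin that mutates nodes on every query run.
let ctx: QueryRunContext = {
  nodesMutatedDuringQueryRun: true,
  recompileCount: 0,
  hasCompiler: true,
  sourceFilesDirty: false,
}
let rebuilds = 0
let outcome = afterQueries(ctx)
while (outcome.target === "recreatingPages") {
  ctx = { ...ctx, recompileCount: ctx.recompileCount + 1 } // incrementRecompileCount
  rebuilds++
  outcome = afterQueries(ctx)
}

console.log(rebuilds, outcome.actions) // 6 rebuilds, then the panic action
```

In this model the machine tolerates six consecutive rebuilds and panics on the seventh mutated run; a clean pass through waiting would reset recompileCount to zero.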

Tip: If you hit the infinite-loop panic (triggered by RECOMPILE_PANIC_LIMIT) during plugin development, it usually means your onCreateNode handler is creating or modifying nodes in a way that triggers itself. The fix is to add a guard that checks the node type before creating new nodes.

Dev Server Startup

When the state machine reaches startingDevServers for the first time, it invokes the startWebpackServer service. This function (in packages/gatsby/src/utils/start-server.ts) wires together:

  • Express as the HTTP server
  • webpack-dev-middleware for serving JS bundles with HMR
  • webpack-hot-middleware for pushing updates to the browser
  • WebSocket for GraphQL query result updates
  • GraphiQL Explorer for the GraphQL IDE at /__graphql
  • CORS middleware for cross-origin requests

The develop webpack stage (which we discussed in Article 2) produces the bundle served by webpack-dev-middleware. When a source file changes, the webpack compiler recompiles and hot-reloads the changed modules.

On exit from startingDevServers, three actions fire: assignServers (saves compiler and listener references to context), spawnWebpackListener (sets up file watching), and markSourceFilesClean (resets the dirty flag).

The Bigger Picture

The XState architecture in Gatsby's develop server is a masterclass in reactive system design. By modeling the development lifecycle as explicit states and transitions, Gatsby achieves:

  • Correctness: Events are not silently lost. Each one is handled immediately, queued for the appropriate state, or explicitly ignored where handling it would be redundant (as in initializing)
  • Observability: Every transition is traceable (verbose mode logs them via logTransitions)
  • Resilience: An error in any state moves the machine to waiting and logs the error, rather than crashing the process

In the next article, we'll dive into the data layer that flows through these state machines—Redux as the central state store, LMDB as the persistent node database, and the GraphQL schema construction pipeline that turns raw data into a queryable API.