From Nodes to Queries: Redux, LMDB, and GraphQL Schema Construction

Advanced

Prerequisites

  • Articles 1-3 of this series
  • Redux fundamentals (store, actions, reducers)
  • GraphQL schema concepts (types, resolvers, directives)
  • Basic understanding of memory-mapped databases

Every piece of content that flows through a Gatsby site—every Markdown file, every CMS entry, every image—passes through a data layer that transforms raw data into a typed, queryable GraphQL schema. This data layer is the intellectual center of Gatsby's architecture, and it's built on three pillars: Redux for global state management, LMDB for persistent node storage, and graphql-compose for schema construction.

As we saw in Parts 2 and 3, both the build pipeline and the develop state machine call the same service functions—sourceNodes, buildSchema, createPages. This article explains what happens inside those services: how data enters the system, how it's stored, how the schema is built from it, and how queries are extracted and executed against it.

Redux as the Central Nervous System

The Redux store in packages/gatsby/src/redux/index.ts is the single source of truth for the entire build process. It tracks everything: pages, nodes, components, queries, webpack compilation hashes, HTML file states, and more.

The IGatsbyState Shape

The IGatsbyState interface defines the complete state shape. Here's a curated view of its key members:

graph TD
    subgraph "IGatsbyState"
        nodes["nodes: Map<string, IGatsbyNode>"]
        pages["pages: Map<string, IGatsbyPage>"]
        components["components: Map<string, IGatsbyPageComponent>"]
        schema["schema: GraphQLSchema"]
        queries["queries: { trackedQueries, trackedComponents, ... }"]
        html["html: { trackedHtmlFiles, compilationHashes, ... }"]
        flattenedPlugins["flattenedPlugins: Array<FlattenedPlugin>"]
        config["config: IGatsbyConfig"]
        status["status: { PLUGINS_HASH, LAST_NODE_COUNTER }"]
        jobsV2["jobsV2: { incomplete, complete, jobsByRequest }"]
    end

| State Slice | Type | Purpose |
| --- | --- | --- |
| nodes | Map<string, IGatsbyNode> | All content nodes in the system |
| pages | Map<string, IGatsbyPage> | Registered pages with paths and components |
| components | Map<string, IGatsbyPageComponent> | Page templates with query and rendering metadata |
| schema | GraphQLSchema | The compiled GraphQL schema |
| queries | Complex object | Query tracking: dirty flags, dependency graphs |
| html | Complex object | HTML file states, compilation hashes |
| flattenedPlugins | Array | The canonical plugin registry |

Three Tiers of Actions

Gatsby's Redux actions are organized into three tiers of access under packages/gatsby/src/redux/actions/:

  1. Public actions (actions/public.js): Available to all plugins — createNode, createPage, createRedirect, deleteNode
  2. Restricted actions (actions/restricted.ts): Available only to specific APIs — createTypes, addThirdPartySchema, setWebpackConfig
  3. Internal actions (actions/internal.ts): Framework-only — SET_PROGRAM, SET_SITE_CONFIG, SET_SCHEMA

This tiering is enforced by the API runner, which only binds the appropriate action creators for each API hook. A plugin implementing sourceNodes gets createNode but not setWebpackConfig.
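To make the tiering concrete, here is a hypothetical sketch of how an API runner might bind only the permitted action creators for a given hook. The names (bindActionsForAPI, restrictedActionsByAPI) are illustrative, not Gatsby's actual internals:

```javascript
// Public actions: available to every plugin hook.
const publicActions = {
  createNode: payload => ({ type: "CREATE_NODE", payload }),
  createPage: payload => ({ type: "CREATE_PAGE", payload }),
}

// Restricted actions: only bound for specific API hooks.
const restrictedActions = {
  createTypes: payload => ({ type: "CREATE_TYPES", payload }),
  setWebpackConfig: payload => ({ type: "SET_WEBPACK_CONFIG", payload }),
}

// Which restricted actions each API hook may use (illustrative mapping).
const restrictedActionsByAPI = {
  createSchemaCustomization: ["createTypes"],
  onCreateWebpackConfig: ["setWebpackConfig"],
}

// Bind only the action creators appropriate for a given API hook.
function bindActionsForAPI(apiName, dispatch) {
  const bound = {}
  for (const [name, creator] of Object.entries(publicActions)) {
    bound[name] = (...args) => dispatch(creator(...args))
  }
  for (const name of restrictedActionsByAPI[apiName] || []) {
    bound[name] = (...args) => dispatch(restrictedActions[name](...args))
  }
  return bound
}

// A plugin implementing sourceNodes gets createNode but not setWebpackConfig.
const dispatched = []
const actions = bindActionsForAPI("sourceNodes", a => dispatched.push(a))
console.log(typeof actions.createNode)       // "function"
console.log(typeof actions.setWebpackConfig) // "undefined"
```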

Store Configuration and Persistence

The store is configured with two middleware layers (lines 101–115): redux-thunk for async actions and a custom multi middleware that handles arrays of actions (dispatching each element individually).
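The array-flattening behavior is small enough to sketch in full. The middleware below follows the standard Redux middleware signature; the tiny hand-rolled store exists only so the example runs without the redux package:

```javascript
// A "multi"-style middleware: when an array is dispatched, each element
// is dispatched individually through the full middleware chain.
const multi = ({ dispatch }) => next => action =>
  Array.isArray(action) ? action.map(dispatch) : next(action)

// Minimal store so the sketch is self-contained (stand-in for createStore
// + applyMiddleware from the redux package).
function createStore(reducer, middleware) {
  let state = reducer(undefined, { type: "@@INIT" })
  let dispatch = action => {
    state = reducer(state, action)
    return action
  }
  // store.dispatch forwards to the mutable `dispatch` binding, so the
  // middleware's recursive dispatch calls go through the wrapped chain.
  const store = { getState: () => state, dispatch: a => dispatch(a) }
  dispatch = middleware(store)(dispatch)
  return store
}

const reducer = (state = { count: 0 }, action) =>
  action.type === "INC" ? { count: state.count + 1 } : state

const store = createStore(reducer, multi)
// Dispatching an array applies each action in order.
store.dispatch([{ type: "INC" }, { type: "INC" }, { type: "INC" }])
console.log(store.getState().count) // 3
```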

At lines 117–119, the initial state is loaded conditionally:

export const store: GatsbyReduxStore = configureStore(
  process.env.GATSBY_WORKER_POOL_WORKER ? ({} as IGatsbyState) : readState()
)

Workers get an empty state (they receive partial state from the main process), while the main process reads from the LMDB cache—enabling incremental builds.

The mett Event Bridge

A subtle but critical piece of glue connects Redux to the plugin system. The mett module is a lightweight event emitter (inspired by mitt) that uses Map<string, Set<Handler>> instead of plain objects and arrays.

At lines 172–175 of redux/index.ts, every Redux action is broadcast via mett:

store.subscribe(() => {
  const lastAction = store.getState().lastAction
  emitter.emit(lastAction.type, lastAction)
})

This creates a pub/sub bridge: any part of the system can listen for specific Redux actions. The plugin runner (packages/gatsby/src/redux/plugin-runner.ts) uses this bridge to auto-trigger plugin hooks:

sequenceDiagram
    participant Plugin as Source Plugin
    participant Redux as Redux Store
    participant Mett as mett emitter
    participant Runner as plugin-runner.ts
    participant OnCreate as onCreateNode plugins

    Plugin->>Redux: createNode(fileNode)
    Redux->>Redux: Reduce CREATE_NODE
    Redux->>Mett: emit("CREATE_NODE", action)
    Mett->>Runner: CREATE_NODE handler
    Runner->>Runner: Check: is node.internal.type === "SitePage"?
    Runner->>OnCreate: apiRunnerNode("onCreateNode", { node })

The startPluginRunner function at lines 44–77 pre-filters plugins at startup—it only registers emitter listeners if at least one plugin implements onCreatePage or onCreateNode. This avoids the overhead of firing events that nobody is listening for.

Tip: The mett emitter also supports wildcard listeners via the * event name. This is how the develop state machine's mutation listener captures all node mutations regardless of action type.
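The core of a mitt-style emitter with Map/Set storage and a wildcard channel fits in a few lines. This is an illustrative sketch, not the actual mett source:

```javascript
// Minimal mitt-style emitter: Map of event name -> Set of handlers,
// plus a "*" channel that receives every event along with its type.
function createEmitter() {
  const handlers = new Map()
  return {
    on(type, handler) {
      if (!handlers.has(type)) handlers.set(type, new Set())
      handlers.get(type).add(handler)
    },
    off(type, handler) {
      handlers.get(type)?.delete(handler)
    },
    emit(type, event) {
      handlers.get(type)?.forEach(h => h(event))
      handlers.get("*")?.forEach(h => h(type, event))
    },
  }
}

const emitter = createEmitter()
const seen = []
emitter.on("CREATE_NODE", action => seen.push(`node:${action.payload.id}`))
// Wildcard listener sees every emitted action type.
emitter.on("*", type => seen.push(`any:${type}`))

emitter.emit("CREATE_NODE", { type: "CREATE_NODE", payload: { id: "a" } })
emitter.emit("CREATE_PAGE", { type: "CREATE_PAGE", payload: { path: "/" } })
console.log(seen) // ["node:a", "any:CREATE_NODE", "any:CREATE_PAGE"]
```

Using Set for handlers makes off() cheap and prevents double registration of the same listener, which is one reason to prefer it over plain arrays.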

Node Storage: From Redux to LMDB

Gatsby originally stored all nodes in Redux's in-memory state. For large sites (100K+ nodes), this consumed gigabytes of RAM. The solution was LMDB—a memory-mapped B-tree database that provides near-memory-speed reads with disk-backed persistence.

The entry point is the lazy-loading pattern in packages/gatsby/src/datastore/datastore.ts:

let dataStore: IDataStore

export function getDataStore(): IDataStore {
  if (!dataStore) {
    const { setupLmdbStore } = require(`./lmdb/lmdb-datastore`)
    dataStore = setupLmdbStore()
  }
  return dataStore
}

The LMDB implementation in packages/gatsby/src/datastore/lmdb/lmdb-datastore.ts uses globalThis.__GATSBY_OPEN_ROOT_LMDBS to share database handles across require contexts:

function getRootDb(): RootDatabase {
  if (!rootDb) {
    if (!globalThis.__GATSBY_OPEN_ROOT_LMDBS) {
      globalThis.__GATSBY_OPEN_ROOT_LMDBS = new Map()
    }
    rootDb = globalThis.__GATSBY_OPEN_ROOT_LMDBS.get(fullDbPath)
    if (rootDb) return rootDb

    rootDb = open({
      name: `root`,
      path: fullDbPath,
      compression: true,
    })
    globalThis.__GATSBY_OPEN_ROOT_LMDBS.set(fullDbPath, rootDb)
  }
  return rootDb
}

This globalThis caching prevents the "multiple LMDB instances" problem that causes random errors when the same database is opened twice in the same process (which can happen in gatsby serve where both the engine and the trailing-slash middleware need access to nodes).

flowchart TD
    A["createNode() action"] --> B["Redux Reducer"]
    B --> C["LMDB updateNodes"]
    C --> D[".cache/data/datastore"]

    E["getNode(id)"] --> F["LMDB getNode"]
    F --> D

    G["getNodesByType(type)"] --> H["LMDB iterateNodesByType"]
    H --> D

    style D fill:#fff3e0

The database path defaults to .cache/data/datastore for production and .cache/data/test-datastore-{workerId} for tests—ensuring test isolation across Jest workers (lines 32–44).

GraphQL Schema: Inference Meets Customization

Gatsby's GraphQL schema is built in two phases: customization (explicit type definitions from plugins) and inference (automatic type generation from node data). The orchestrator lives in packages/gatsby/src/schema/index.js.

Phase 1: Customization

During the customizeSchema service, plugins call createTypes() to define explicit GraphQL types. These type definitions are stored in store.getState().schemaCustomization.types. Built-in types are added first, then plugin types, then user types—ensuring user definitions take priority (lines 26–34 of schema/index.js):

return [
  ...builtInTypes,
  ...types.filter(type => type.plugin && type.plugin.name !== `default-site-plugin`),
  ...types.filter(type => !type.plugin || type.plugin.name === `default-site-plugin`),
]
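The effect of this ordering can be illustrated with a "last definition wins" merge. graphql-compose's real merging is richer than this, but the precedence idea is the same:

```javascript
// Illustrative sketch: merge an ordered list of type definitions by name,
// with later entries replacing earlier ones. Because user types come last,
// they override built-in and plugin definitions of the same name.
function mergeTypeDefs(orderedDefs) {
  const byName = new Map()
  for (const def of orderedDefs) {
    byName.set(def.name, def) // later entries win
  }
  return byName
}

const builtIn = { name: "SiteMetadata", source: "built-in" }
const fromPlugin = { name: "BlogPost", source: "plugin", fields: ["body"] }
const fromUser = { name: "BlogPost", source: "user", fields: ["body", "tags"] }

const merged = mergeTypeDefs([builtIn, fromPlugin, fromUser])
console.log(merged.get("BlogPost").source) // "user"
```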

Phase 2: Inference

After explicit types are registered, buildInferenceMetadata (lines 53–80) loops through all node types, examines their data, and dispatches BUILD_TYPE_METADATA actions that feed the inference engine. The schema builder in packages/gatsby/src/schema/schema.js then uses graphql-compose to merge explicit definitions with inferred types.

Schema Extensions

The extension system in packages/gatsby/src/schema/extensions/index.js provides type-level and field-level directives:

| Extension | Level | Purpose |
| --- | --- | --- |
| @infer | Type | Enable automatic field inference (default) |
| @dontInfer | Type | Disable inference; only explicit fields |
| @link | Field | Create foreign-key relationship to another node |
| @dateformat | Field | Add date formatting arguments to date fields |
| @fileByRelativePath | Field | Resolve relative file paths to File nodes |
| @mimeTypes | Type | Define which MIME types a type handles |
| @childOf | Type | Declare parent-child relationships |

flowchart TD
    A["Plugins call createTypes()"] --> B["Explicit type definitions"]
    C["Node data in LMDB"] --> D["Inference engine"]
    B --> E["graphql-compose SchemaComposer"]
    D --> E
    F["Schema extensions<br/>@infer @link @dateformat"] --> E
    E --> G["Final GraphQLSchema"]
    G --> H["Store: SET_SCHEMA"]

Tip: If you're building a source plugin and want complete control over your type's schema, use @dontInfer on your type definition. This prevents Gatsby from analyzing node data and potentially adding unwanted fields that could break if your data shape changes.

The Query Pipeline

Once the schema is built, queries need to be extracted from component files, compiled, validated, and executed. This pipeline touches several files across the query/ directory.

Extraction

The query compiler in packages/gatsby/src/query/query-compiler.js uses a Babel-based FileParser to find GraphQL tagged template literals in component files. It searches all files in the project and theme directories:

const parsedQueries = await parseQueries({
  base: program.directory,
  additional: resolveThemes(
    flattenedPlugins.map(plugin => ({
      themeDir: plugin.pluginFilepath,
    }))
  ),
  addError,
  parentSpan: activity.span,
})

The compiler validates extracted queries against the built schema using standard GraphQL validation rules (imported from the graphql package at lines 14–29), then collocates fragments—since fragments have global scope in Gatsby, a fragment defined in one file can be used in any query.
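A simplified sketch of fragment collocation: given a query and a global fragment registry, append every fragment the query transitively spreads. Real Gatsby operates on GraphQL ASTs; this version uses plain strings purely to keep the idea visible:

```javascript
// Collect every fragment a query (transitively) spreads from a global
// registry and append the definitions to the query text.
function collocateFragments(queryText, fragmentsByName) {
  const result = [queryText]
  const added = new Set()
  const queue = [queryText]
  while (queue.length) {
    const text = queue.pop()
    // Find fragment spreads like "...FragmentName".
    for (const [, name] of text.matchAll(/\.\.\.(\w+)/g)) {
      if (!added.has(name) && fragmentsByName.has(name)) {
        added.add(name)
        const fragment = fragmentsByName.get(name)
        result.push(fragment)
        queue.push(fragment) // fragments may spread other fragments
      }
    }
  }
  return result.join("\n")
}

// Fragments have global scope: defined anywhere, usable in any query.
const fragments = new Map([
  ["PostMeta", "fragment PostMeta on BlogPost { date ...AuthorBits }"],
  ["AuthorBits", "fragment AuthorBits on BlogPost { author }"],
])

const query = "query { blogPost { title ...PostMeta } }"
const complete = collocateFragments(query, fragments)
console.log(complete.includes("fragment AuthorBits")) // true
```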

Execution

The GraphQLRunner class wraps the standard graphql execute function with caching and tracing. At construction time, it creates a LocalNodeModel—the resolver context that all Gatsby field resolvers receive:

this.nodeModel = new LocalNodeModel({
  schema,
  schemaComposer: schemaCustomization.composer,
  createPageDependency,
  _rootNodeMap,
  _trackedRootNodes,
})

The NodeModel is what makes Gatsby's resolvers "smart"—it tracks which nodes each query depends on (via createPageDependency), enabling automatic query invalidation when source data changes.
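The dependency-tracking idea can be sketched as a reverse index from node IDs to the queries that read them. The names here are illustrative, not Gatsby's internals:

```javascript
// Record which queries read which nodes, so that when a node changes,
// only the affected queries are invalidated.
function createDependencyTracker() {
  const nodeToQueries = new Map() // nodeId -> Set of query/page paths
  return {
    recordDependency(queryId, nodeId) {
      if (!nodeToQueries.has(nodeId)) nodeToQueries.set(nodeId, new Set())
      nodeToQueries.get(nodeId).add(queryId)
    },
    // When a node changes, return the queries that must re-run.
    invalidate(nodeId) {
      return [...(nodeToQueries.get(nodeId) ?? [])]
    },
  }
}

const tracker = createDependencyTracker()
// During execution, resolvers record every node a query reads.
tracker.recordDependency("/blog/hello/", "node-1")
tracker.recordDependency("/blog/hello/", "node-2")
tracker.recordDependency("/about/", "node-3")

console.log(tracker.invalidate("node-2")) // ["/blog/hello/"]
console.log(tracker.invalidate("node-3")) // ["/about/"]
```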

Three Query Types

Gatsby processes three distinct types of queries:

  1. Page queries: Defined in page components, receive pageContext variables. Run once per page.
  2. Static queries (useStaticQuery): Defined anywhere, no variables. Results are embedded in the JS bundle.
  3. Slice queries: Defined in slice components (Gatsby 5 feature). Run once per slice, shared across pages.

flowchart LR
    A["Component files"] -->|"Babel parse"| B["FileParser"]
    B --> C["Raw queries + fragments"]
    C -->|"Fragment collocation"| D["Complete queries"]
    D -->|"Validate against schema"| E["Valid queries"]
    E -->|"calculateDirtyQueries"| F["Dirty query IDs"]
    F --> G["GraphQLRunner.execute()"]
    G -->|"Page queries"| H["page-data JSON files"]
    G -->|"Static queries"| I["static-query JSON files"]
    G -->|"Slice queries"| J["slice-data JSON files"]

The calculateDirtyQueries step is what enables incremental builds—it compares query hashes and node dependencies against the previous build to determine which queries actually need re-execution. For a site with 10,000 pages where only 3 nodes changed, this can reduce query execution from minutes to seconds.

What's Next

We've now traced data from raw content through node creation, LMDB storage, schema inference, and query execution. In the final article, we'll explore Gatsby's extensibility surface—the plugin system that connects all of this to the outside world, the theme system with its component shadowing, the SSG/DSG/SSR page mode system, and the deployment adapter abstraction.