From Nodes to Queries: Redux, LMDB, and GraphQL Schema Construction
Prerequisites
- Articles 1-3 of this series
- Redux fundamentals (store, actions, reducers)
- GraphQL schema concepts (types, resolvers, directives)
- Basic understanding of memory-mapped databases
Every piece of content that flows through a Gatsby site—every Markdown file, every CMS entry, every image—passes through a data layer that transforms raw data into a typed, queryable GraphQL schema. This data layer is the intellectual center of Gatsby's architecture, and it's built on three pillars: Redux for global state management, LMDB for persistent node storage, and graphql-compose for schema construction.
As we saw in Parts 2 and 3, both the build pipeline and the develop state machine call the same service functions—sourceNodes, buildSchema, createPages. This article explains what happens inside those services: how data enters the system, how it's stored, how the schema is built from it, and how queries are extracted and executed against it.
Redux as the Central Nervous System
The Redux store in packages/gatsby/src/redux/index.ts is the single source of truth for the entire build process. It tracks everything: pages, nodes, components, queries, webpack compilation hashes, HTML file states, and more.
The IGatsbyState Shape
The IGatsbyState interface defines the complete state shape. Here's a curated view of its key members:
graph TD
subgraph "IGatsbyState"
nodes["nodes: Map<string, IGatsbyNode>"]
pages["pages: Map<string, IGatsbyPage>"]
components["components: Map<string, IGatsbyPageComponent>"]
schema["schema: GraphQLSchema"]
queries["queries: { trackedQueries, trackedComponents, ... }"]
html["html: { trackedHtmlFiles, compilationHashes, ... }"]
flattenedPlugins["flattenedPlugins: Array<FlattenedPlugin>"]
config["config: IGatsbyConfig"]
status["status: { PLUGINS_HASH, LAST_NODE_COUNTER }"]
jobsV2["jobsV2: { incomplete, complete, jobsByRequest }"]
end
| State Slice | Type | Purpose |
|---|---|---|
| nodes | Map<string, IGatsbyNode> | All content nodes in the system |
| pages | Map<string, IGatsbyPage> | Registered pages with paths and components |
| components | Map<string, IGatsbyPageComponent> | Page templates with query and rendering metadata |
| schema | GraphQLSchema | The compiled GraphQL schema |
| queries | Complex object | Query tracking: dirty flags, dependency graphs |
| html | Complex object | HTML file states, compilation hashes |
| flattenedPlugins | Array | The canonical plugin registry |
Three Tiers of Actions
Gatsby's Redux actions are organized into three tiers of access, each defined in its own file under packages/gatsby/src/redux/actions/:
- Public actions (actions/public.js), available to all plugins: createNode, createPage, createRedirect, deleteNode
- Restricted actions (actions/restricted.ts), available only to specific APIs: createTypes, addThirdPartySchema, setWebpackConfig
- Internal actions (actions/internal.ts), framework-only: SET_PROGRAM, SET_SITE_CONFIG, SET_SCHEMA
This tiering is enforced by the API runner, which only binds the appropriate action creators for each API hook. A plugin implementing sourceNodes gets createNode but not setWebpackConfig.
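To make the tiering concrete, here is a minimal sketch of how a runner could hand each API hook only the action creators it is allowed to use. The helper `actionsForApi`, the `restrictedActionsByApi` table, and the hook-to-action mapping are assumptions made for illustration; they are not Gatsby's actual tables or function names.

```typescript
// Hypothetical sketch: the tables and helper below are illustrative, not Gatsby's source.
type SketchAction = { type: string; payload?: unknown }
type SketchActionCreator = (payload?: unknown) => SketchAction

// Public tier: handed to every plugin API hook.
const publicActionCreators: Record<string, SketchActionCreator> = {
  createNode: payload => ({ type: "CREATE_NODE", payload }),
  createPage: payload => ({ type: "CREATE_PAGE", payload }),
}

// Restricted tier: each creator is listed under the hooks allowed to use it.
const restrictedActionsByApi: Record<string, Record<string, SketchActionCreator>> = {
  createSchemaCustomization: {
    createTypes: payload => ({ type: "CREATE_TYPES", payload }),
  },
}

// Build the `actions` object a plugin receives for one API hook.
function actionsForApi(api: string): Record<string, SketchActionCreator> {
  return { ...publicActionCreators, ...(restrictedActionsByApi[api] ?? {}) }
}

const sourceNodesActions = actionsForApi("sourceNodes")
const schemaActions = actionsForApi("createSchemaCustomization")
console.log(Object.keys(sourceNodesActions)) // [ 'createNode', 'createPage' ]
console.log(Object.keys(schemaActions)) // [ 'createNode', 'createPage', 'createTypes' ]
```

Because the restricted creators are never placed on the object passed to other hooks, a plugin cannot call them by accident; there is nothing to call.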
Store Configuration and Persistence
The store is configured with two middleware layers (lines 101–115): redux-thunk for async actions and a custom multi middleware that handles arrays of actions (dispatching each element individually).
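The array-handling behaviour can be sketched in a few lines. This is a simplified stand-in written against hand-rolled types so it runs without the redux package; the names `multiSketch` and `dispatchSketch` are hypothetical, and the real middleware in redux/index.ts may differ in detail.

```typescript
// Minimal stand-ins for Redux types so the sketch runs without the redux package.
type MultiAction = { type: string }
type MultiDispatch = (action: MultiAction | MultiAction[]) => unknown

// "multi" middleware sketch: arrays are unpacked and each element is dispatched
// through the full chain again; anything else is passed on untouched.
const multiSketch =
  (api: { dispatch: MultiDispatch }) =>
  (next: (action: MultiAction) => unknown) =>
  (action: MultiAction | MultiAction[]): unknown =>
    Array.isArray(action)
      ? action.map(item => api.dispatch(item))
      : next(action)

// Tiny harness: record every action that reaches the "reducer" end of the chain.
const seenTypes: string[] = []
const dispatchSketch: MultiDispatch = action =>
  multiSketch({ dispatch: dispatchSketch })(a => {
    seenTypes.push(a.type)
    return a
  })(action)

dispatchSketch([{ type: "CREATE_NODE" }, { type: "CREATE_PAGE" }])
dispatchSketch({ type: "SET_SCHEMA" })
console.log(seenTypes) // [ 'CREATE_NODE', 'CREATE_PAGE', 'SET_SCHEMA' ]
```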
At lines 117–119, the initial state is loaded conditionally:
export const store: GatsbyReduxStore = configureStore(
  process.env.GATSBY_WORKER_POOL_WORKER ? ({} as IGatsbyState) : readState()
)
Workers get an empty state (they receive partial state from the main process), while the main process reads from the LMDB cache—enabling incremental builds.
The mett Event Bridge
A subtle but critical piece of glue connects Redux to the plugin system. The mett module is a lightweight event emitter (inspired by mitt) that uses Map<string, Set<Handler>> instead of plain objects and arrays.
At lines 172–175 of redux/index.ts, every Redux action is broadcast via mett:
store.subscribe(() => {
  const lastAction = store.getState().lastAction
  emitter.emit(lastAction.type, lastAction)
})
This creates a pub/sub bridge: any part of the system can listen for specific Redux actions. The plugin runner (packages/gatsby/src/redux/plugin-runner.ts) uses this bridge to auto-trigger plugin hooks:
sequenceDiagram
participant Plugin as Source Plugin
participant Redux as Redux Store
participant Mett as mett emitter
participant Runner as plugin-runner.ts
participant OnCreate as onCreateNode plugins
Plugin->>Redux: createNode(fileNode)
Redux->>Redux: Reduce CREATE_NODE
Redux->>Mett: emit("CREATE_NODE", action)
Mett->>Runner: CREATE_NODE handler
Runner->>Runner: Check: is node.internal.type === "SitePage"?
Runner->>OnCreate: apiRunnerNode("onCreateNode", { node })
The startPluginRunner function at lines 44–77 pre-filters plugins at startup—it only registers emitter listeners if at least one plugin implements onCreatePage or onCreateNode. This avoids the overhead of firing events that nobody is listening for.
Tip: The mett emitter also supports wildcard listeners via the * event name. This is how the develop state machine's mutation listener captures all node mutations regardless of action type.
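The emitter shape described in this section is small enough to sketch. The following is an illustrative mitt-style emitter built on Map<string, Set<Handler>>, not the actual mett source; the function name `createEmitterSketch` is hypothetical, and the choice to pass the event name as a second argument to handlers is an assumption.

```typescript
type SketchHandler = (payload: unknown, eventName: string) => void

// Sketch of a mitt-style emitter backed by Map<string, Set<Handler>>.
// Assumption: handlers receive the payload plus the event name.
function createEmitterSketch() {
  const handlers = new Map<string, Set<SketchHandler>>()
  return {
    on(eventName: string, handler: SketchHandler): void {
      const set = handlers.get(eventName) ?? new Set<SketchHandler>()
      set.add(handler)
      handlers.set(eventName, set)
    },
    off(eventName: string, handler: SketchHandler): void {
      handlers.get(eventName)?.delete(handler)
    },
    emit(eventName: string, payload: unknown): void {
      // Exact-match listeners first, then wildcard listeners.
      handlers.get(eventName)?.forEach(h => h(payload, eventName))
      handlers.get("*")?.forEach(h => h(payload, eventName))
    },
  }
}

const emitterSketch = createEmitterSketch()
const emitterLog: string[] = []
emitterSketch.on("CREATE_NODE", () => emitterLog.push("node listener"))
emitterSketch.on("*", (_payload, eventName) => emitterLog.push(`wildcard saw ${eventName}`))
emitterSketch.emit("CREATE_NODE", { id: "1" })
emitterSketch.emit("DELETE_NODE", { id: "1" })
console.log(emitterLog)
// [ 'node listener', 'wildcard saw CREATE_NODE', 'wildcard saw DELETE_NODE' ]
```

Using a Set per event name means registering the same handler twice has no effect, and removal is a constant-time delete rather than an array search.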
Node Storage: From Redux to LMDB
Gatsby originally stored all nodes in Redux's in-memory state. For large sites (100K+ nodes), this consumed gigabytes of RAM. The solution was LMDB—a memory-mapped B-tree database that provides near-memory-speed reads with disk-backed persistence.
The entry point is the lazy-loading pattern in packages/gatsby/src/datastore/datastore.ts:
let dataStore: IDataStore
export function getDataStore(): IDataStore {
  if (!dataStore) {
    const { setupLmdbStore } = require(`./lmdb/lmdb-datastore`)
    dataStore = setupLmdbStore()
  }
  return dataStore
}
The LMDB implementation in packages/gatsby/src/datastore/lmdb/lmdb-datastore.ts uses globalThis.__GATSBY_OPEN_ROOT_LMDBS to share database handles across require contexts:
function getRootDb(): RootDatabase {
  if (!rootDb) {
    if (!globalThis.__GATSBY_OPEN_ROOT_LMDBS) {
      globalThis.__GATSBY_OPEN_ROOT_LMDBS = new Map()
    }
    rootDb = globalThis.__GATSBY_OPEN_ROOT_LMDBS.get(fullDbPath)
    if (rootDb) return rootDb
    rootDb = open({
      name: `root`,
      path: fullDbPath,
      compression: true,
    })
    globalThis.__GATSBY_OPEN_ROOT_LMDBS.set(fullDbPath, rootDb)
  }
  return rootDb
}
This globalThis caching prevents the "multiple LMDB instances" problem that causes random errors when the same database is opened twice in the same process (which can happen in gatsby serve where both the engine and the trailing-slash middleware need access to nodes).
flowchart TD
A["createNode() action"] --> B["Redux Reducer"]
B --> C["LMDB updateNodes"]
C --> D[".cache/data/datastore"]
E["getNode(id)"] --> F["LMDB getNode"]
F --> D
G["getNodesByType(type)"] --> H["LMDB iterateNodesByType"]
H --> D
style D fill:#fff3e0
The database path defaults to .cache/data/datastore for production and .cache/data/test-datastore-{workerId} for tests—ensuring test isolation across Jest workers (lines 32–44).
GraphQL Schema: Inference Meets Customization
Gatsby's GraphQL schema is built in two phases: customization (explicit type definitions from plugins) and inference (automatic type generation from node data). The orchestrator lives in packages/gatsby/src/schema/index.js.
Phase 1: Customization
During the customizeSchema service, plugins call createTypes() to define explicit GraphQL types. These type definitions are stored in store.getState().schemaCustomization.types. Built-in types are added first, then plugin types, then user types—ensuring user definitions take priority (lines 26–34 of schema/index.js):
return [
  ...builtInTypes,
  ...types.filter(type => type.plugin && type.plugin.name !== `default-site-plugin`),
  ...types.filter(type => !type.plugin || type.plugin.name === `default-site-plugin`),
]
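To see the ordering in action, the same filter logic can be run on plain data. The `orderTypeDefs` wrapper, the `TypeDefSketch` shape, and the sample type names are hypothetical; only the two filter predicates are taken from the snippet quoted in this section.

```typescript
type TypeDefSketch = { name: string; plugin?: { name: string } }

// Applies the same ordering on plain data:
// built-ins first, then plugin types, then the site's own types.
function orderTypeDefs(builtInTypes: TypeDefSketch[], types: TypeDefSketch[]): TypeDefSketch[] {
  return [
    ...builtInTypes,
    ...types.filter(t => t.plugin && t.plugin.name !== "default-site-plugin"),
    ...types.filter(t => !t.plugin || t.plugin.name === "default-site-plugin"),
  ]
}

// Sample data (hypothetical): the site's definition is registered first
// but still ends up last in the ordered list.
const orderedDefs = orderTypeDefs(
  [{ name: "File" }],
  [
    { name: "SiteOverride", plugin: { name: "default-site-plugin" } },
    { name: "MarkdownRemark", plugin: { name: "gatsby-transformer-remark" } },
  ]
)
console.log(orderedDefs.map(t => t.name)) // [ 'File', 'MarkdownRemark', 'SiteOverride' ]
```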
Phase 2: Inference
After explicit types are registered, buildInferenceMetadata (lines 53–80) loops through all node types, examines their data, and dispatches BUILD_TYPE_METADATA actions that feed the inference engine. The schema builder in packages/gatsby/src/schema/schema.js then uses graphql-compose to merge explicit definitions with inferred types.
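The core idea of inference, deriving field types by examining example values across nodes, can be illustrated with a deliberately small sketch. This is not Gatsby's inference engine: the function names, the scalar-only handling, and the widening rules (Int plus Float becomes Float, any other conflict becomes Mixed) are all assumptions chosen to keep the example short.

```typescript
type FieldKind = "String" | "Int" | "Float" | "Boolean" | "Mixed"

// Classify a single scalar value; anything unrecognised is treated as Mixed.
function kindOf(value: unknown): FieldKind {
  if (typeof value === "string") return "String"
  if (typeof value === "boolean") return "Boolean"
  if (typeof value === "number") return value % 1 === 0 ? "Int" : "Float"
  return "Mixed"
}

// Merge the kinds seen across all example nodes of one type.
function inferFieldsSketch(nodes: Array<Record<string, unknown>>): Record<string, FieldKind> {
  const fields: Record<string, FieldKind> = {}
  for (const node of nodes) {
    for (const key of Object.keys(node)) {
      const seen = fields[key]
      const next = kindOf(node[key])
      if (seen === undefined || seen === next) {
        fields[key] = next
      } else if ((seen === "Int" && next === "Float") || (seen === "Float" && next === "Int")) {
        fields[key] = "Float" // numeric conflict widens to Float
      } else {
        fields[key] = "Mixed" // any other conflict
      }
    }
  }
  return fields
}

const inferredFields = inferFieldsSketch([
  { title: "A", views: 3, rating: 4 },
  { title: "B", views: 10, rating: 4.5 },
])
console.log(inferredFields) // { title: 'String', views: 'Int', rating: 'Float' }
```

The `rating` field shows why inference depends on the data sampled: a single node with `4` would yield Int, and the type only widens once a fractional value appears.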
Schema Extensions
The extension system in packages/gatsby/src/schema/extensions/index.js provides type-level and field-level directives:
| Extension | Level | Purpose |
|---|---|---|
| @infer | Type | Enable automatic field inference (default) |
| @dontInfer | Type | Disable inference; only explicit fields |
| @link | Field | Create foreign-key relationship to another node |
| @dateformat | Field | Add date formatting arguments to date fields |
| @fileByRelativePath | Field | Resolve relative file paths to File nodes |
| @mimeTypes | Type | Define which MIME types a type handles |
| @childOf | Type | Declare parent-child relationships |
flowchart TD
A["Plugins call createTypes()"] --> B["Explicit type definitions"]
C["Node data in LMDB"] --> D["Inference engine"]
B --> E["graphql-compose SchemaComposer"]
D --> E
F["Schema extensions<br/>@infer @link @dateformat"] --> E
E --> G["Final GraphQLSchema"]
G --> H["Store: SET_SCHEMA"]
Tip: If you're building a source plugin and want complete control over your type's schema, use @dontInfer on your type definition. This prevents Gatsby from analyzing node data and potentially adding unwanted fields that could break if your data shape changes.
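As a rough illustration of the tip, a plugin's createSchemaCustomization hook might register a type like this. The type name `ProductSketch` and its fields are hypothetical, and the argument type is a structural stand-in so the snippet type-checks without importing from the gatsby package.

```typescript
// Structural stand-in for the argument Gatsby passes to createSchemaCustomization,
// so the sketch type-checks without importing from the gatsby package.
type SchemaCustomizationArgsSketch = {
  actions: { createTypes: (typeDefs: string) => void }
}

// In a real gatsby-node.ts this function would be exported.
// `ProductSketch` and its fields are hypothetical.
const createSchemaCustomization = ({ actions }: SchemaCustomizationArgsSketch): void => {
  actions.createTypes(`
    type ProductSketch implements Node @dontInfer {
      title: String!
      price: Float
    }
  `)
}
```

With @dontInfer in place, only `title` and `price` (plus the fields every node carries) would be queryable on this type, regardless of what other properties the source data happens to include.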
The Query Pipeline
Once the schema is built, queries need to be extracted from component files, compiled, validated, and executed. This pipeline touches several files across the query/ directory.
Extraction
The query compiler in packages/gatsby/src/query/query-compiler.js uses a Babel-based FileParser to find GraphQL tagged template literals in component files. It searches all files in the project and theme directories:
const parsedQueries = await parseQueries({
  base: program.directory,
  additional: resolveThemes(
    flattenedPlugins.map(plugin => ({
      themeDir: plugin.pluginFilepath,
    }))
  ),
  addError,
  parentSpan: activity.span,
})
The compiler validates extracted queries against the built schema using standard GraphQL validation rules (imported from the graphql package at lines 14–29), then collocates fragments—since fragments have global scope in Gatsby, a fragment defined in one file can be used in any query.
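Fragment collocation amounts to following fragment spreads transitively and attaching each referenced definition to the query. The sketch below shows that idea under heavy simplification: `collocateFragmentsSketch` is a hypothetical name, and spreads are found with a regular expression rather than a GraphQL parser, which the real compiler would not do.

```typescript
// Hypothetical sketch: fragment spreads are found with a regex, not a GraphQL parser.
function collocateFragmentsSketch(query: string, fragments: Map<string, string>): string {
  const included: string[] = []
  const pending: string[] = [query]
  while (pending.length > 0) {
    const text = pending.pop() as string
    const spreadPattern = /\.\.\.\s*([A-Za-z_]\w*)/g
    let match = spreadPattern.exec(text)
    while (match !== null) {
      const fragmentName = match[1]
      const definition = fragments.get(fragmentName)
      // Follow each known fragment once, so nested spreads are resolved too.
      if (definition !== undefined && included.indexOf(fragmentName) === -1) {
        included.push(fragmentName)
        pending.push(definition)
      }
      match = spreadPattern.exec(text)
    }
  }
  return [query].concat(included.map(n => fragments.get(n) as string)).join("\n")
}

// Fragments as they might be collected from different files (sample data).
const globalFragments = new Map<string, string>([
  ["PostFields", "fragment PostFields on MarkdownRemark { id ...AuthorFields }"],
  ["AuthorFields", "fragment AuthorFields on MarkdownRemark { frontmatter { author } }"],
  ["Unused", "fragment Unused on File { id }"],
])

const completeQuery = collocateFragmentsSketch(
  "query Blog { markdownRemark { ...PostFields } }",
  globalFragments
)
console.log(completeQuery)
```

The result contains the query followed by PostFields and AuthorFields, while Unused is left out because nothing reachable from the query spreads it.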
Execution
The GraphQLRunner class wraps the standard graphql execute function with caching and tracing. At construction time, it creates a LocalNodeModel—the resolver context that all Gatsby field resolvers receive:
this.nodeModel = new LocalNodeModel({
  schema,
  schemaComposer: schemaCustomization.composer,
  createPageDependency,
  _rootNodeMap,
  _trackedRootNodes,
})
The NodeModel is what makes Gatsby's resolvers "smart"—it tracks which nodes each query depends on (via createPageDependency), enabling automatic query invalidation when source data changes.
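The dependency tracking idea can be sketched as a reverse index from node IDs to the query paths that read them. The names `trackPageDependency` and `queriesToInvalidate` and the storage layout are hypothetical; Gatsby's real createPageDependency and its reducer carry more information than this.

```typescript
// Hypothetical sketch of dependency tracking; names and storage are illustrative.
const queriesByNodeId = new Map<string, Set<string>>()

// Record that the query for `path` read the node `nodeId`.
function trackPageDependency(dep: { path: string; nodeId: string }): void {
  const paths = queriesByNodeId.get(dep.nodeId) ?? new Set<string>()
  paths.add(dep.path)
  queriesByNodeId.set(dep.nodeId, paths)
}

// Given the ids of nodes that changed, return the query paths that must re-run.
function queriesToInvalidate(changedNodeIds: string[]): string[] {
  const dirty = new Set<string>()
  changedNodeIds.forEach(id => queriesByNodeId.get(id)?.forEach(p => dirty.add(p)))
  return Array.from(dirty).sort()
}

// Resolvers would call this as they read nodes during query execution.
trackPageDependency({ path: "/blog/a", nodeId: "n1" })
trackPageDependency({ path: "/blog/b", nodeId: "n1" })
trackPageDependency({ path: "/blog/b", nodeId: "n2" })
trackPageDependency({ path: "/about", nodeId: "n3" })

console.log(queriesToInvalidate(["n2"])) // [ '/blog/b' ]
console.log(queriesToInvalidate(["n1"])) // [ '/blog/a', '/blog/b' ]
```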
Three Query Types
Gatsby processes three distinct types of queries:
- Page queries: Defined in page components, receive pageContext variables. Run once per page.
- Static queries (useStaticQuery): Defined anywhere, no variables. Results are embedded in the JS bundle.
- Slice queries: Defined in slice components (Gatsby 5 feature). Run once per slice, shared across pages.
flowchart LR
A["Component files"] -->|"Babel parse"| B["FileParser"]
B --> C["Raw queries + fragments"]
C -->|"Fragment collocation"| D["Complete queries"]
D -->|"Validate against schema"| E["Valid queries"]
E -->|"calculateDirtyQueries"| F["Dirty query IDs"]
F --> G["GraphQLRunner.execute()"]
G -->|"Page queries"| H["page-data JSON files"]
G -->|"Static queries"| I["static-query JSON files"]
G -->|"Slice queries"| J["slice-data JSON files"]
The calculateDirtyQueries step is what enables incremental builds—it compares query hashes and node dependencies against the previous build to determine which queries actually need re-execution. For a site with 10,000 pages where only 3 nodes changed, this can reduce query execution from minutes to seconds.
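A simplified version of that comparison might look like the following. This is a sketch under stated assumptions, not the actual calculateDirtyQueries implementation: the `TrackedQuerySketch` shape, the function name, and the rule that a query is dirty when it is new, its hash changed, or it depends on a changed node are all illustrative.

```typescript
type TrackedQuerySketch = { queryHash: string; nodeIds: string[] }

// A query is considered dirty if it is new, its text changed, or it reads a changed node.
function dirtyQueryIdsSketch(
  previous: Record<string, TrackedQuerySketch>,
  current: Record<string, TrackedQuerySketch>,
  changedNodeIds: string[]
): string[] {
  return Object.keys(current).filter(id => {
    const before = previous[id]
    const now = current[id]
    if (before === undefined) return true // new query
    if (before.queryHash !== now.queryHash) return true // query text changed
    return now.nodeIds.some(n => changedNodeIds.indexOf(n) !== -1) // data changed
  })
}

// Sample data (hypothetical): "/b" changed its query text, "/c" is new.
const previousBuild: Record<string, TrackedQuerySketch> = {
  "/a": { queryHash: "h1", nodeIds: ["n1"] },
  "/b": { queryHash: "h2", nodeIds: ["n2"] },
}
const currentBuild: Record<string, TrackedQuerySketch> = {
  "/a": { queryHash: "h1", nodeIds: ["n1"] },
  "/b": { queryHash: "h2-edited", nodeIds: ["n2"] },
  "/c": { queryHash: "h3", nodeIds: ["n3"] },
}

console.log(dirtyQueryIdsSketch(previousBuild, currentBuild, [])) // [ '/b', '/c' ]
console.log(dirtyQueryIdsSketch(previousBuild, currentBuild, ["n1"])) // [ '/a', '/b', '/c' ]
```

Queries that fall through all three checks are skipped entirely, which is where the savings on large sites come from.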
What's Next
We've now traced data from raw content through node creation, LMDB storage, schema inference, and query execution. In the final article, we'll explore Gatsby's extensibility surface—the plugin system that connects all of this to the outside world, the theme system with its component shadowing, the SSG/DSG/SSR page mode system, and the deployment adapter abstraction.