Docusaurus Architecture: A Map of the Monorepo

Intermediate

Prerequisites

  • Basic React knowledge (components, hooks, context)
  • Familiarity with Node.js and npm/yarn workspaces
  • General understanding of static site generators

Docusaurus powers tens of thousands of documentation sites, from React Native to Jest to Supabase, yet few developers have looked inside it. Under the hood, it's a Yarn workspaces monorepo of roughly 40 packages, managed by Lerna, with a clean architectural split between server-side Node.js orchestration and client-side React rendering. Understanding this split is the key to reading the code that follows.

This article gives you the mental model you need. We'll walk through the package taxonomy, trace the two execution worlds, dissect the CLI, and follow the core loadSite() pipeline from config to generated code. By the end, you'll know exactly where to look when you want to understand any Docusaurus behavior.

Monorepo Layout and Package Categories

The root package.json declares Yarn v1 workspaces spanning packages/*, website, and several other directories. Lerna (lerna.json) orchestrates versioning and publishing at a unified version (currently 3.9.2).

The ~40 packages break down into clear categories:

| Category | Examples | Role |
| --- | --- | --- |
| Core | docusaurus | CLI, server pipeline, client app, SSG |
| Bundler | docusaurus-bundler | Webpack/Rspack abstraction |
| Content plugins | plugin-content-docs, plugin-content-blog, plugin-content-pages | Read files, produce routes |
| Themes | theme-classic, theme-common | React UI components |
| Preset | preset-classic | Bundles plugins + themes |
| MDX | mdx-loader | Webpack loader for MDX compilation |
| Utilities | utils, utils-common, utils-validation | Shared helpers |
| Types | docusaurus-types | TypeScript type definitions |
| Scaffolding | create-docusaurus | Project initializer |
| Logger | docusaurus-logger | Structured logging |

graph TD
    subgraph "Preset Classic"
        PC[preset-classic]
    end
    subgraph "Content Plugins"
        DOCS[plugin-content-docs]
        BLOG[plugin-content-blog]
        PAGES[plugin-content-pages]
    end
    subgraph "Theme Layer"
        TC[theme-classic]
        TCM[theme-common]
    end
    subgraph "Core"
        CORE[docusaurus]
        BUNDLER[docusaurus-bundler]
        MDX[mdx-loader]
    end
    PC --> DOCS
    PC --> BLOG
    PC --> PAGES
    PC --> TC
    TC --> TCM
    DOCS --> MDX
    BLOG --> MDX
    CORE --> BUNDLER

The docusaurus core package is by far the largest. It contains the CLI, the entire server-side pipeline, the client React app, the SSG engine, and the webpack configuration layer. Think of it as the kernel — everything else plugs into it.

Tip: When navigating the codebase, start at packages/docusaurus/src/. The server-side code lives in server/ and commands/, while client-side code lives in client/. This is the most important directory split to internalize.

The Two Worlds: Server-Side and Client-Side

Docusaurus has a fundamental architectural split that you must understand before reading any code: server-side (Node.js) and client-side (React in the browser) are separate codebases that communicate through generated files.

flowchart LR
    subgraph "Server World (Node.js)"
        CONFIG[Config Loading]
        PLUGINS[Plugin Lifecycle]
        CODEGEN[Code Generation]
    end
    subgraph ".docusaurus/"
        GEN[Generated Files]
    end
    subgraph "Client World (React)"
        APP[App Component]
        ROUTES[Routes]
        HYDRATION[Hydration]
    end
    CONFIG --> PLUGINS --> CODEGEN --> GEN
    GEN --> APP
    GEN --> ROUTES
    ROUTES --> HYDRATION

Server-side code runs during docusaurus build and docusaurus start. It reads the config file, executes the plugin lifecycle, and generates route manifests; during docusaurus build it also produces static HTML via SSG. This code lives in packages/docusaurus/src/server/ and packages/docusaurus/src/commands/.

Client-side code is a React application that hydrates in the browser. It uses React Router for navigation, lazy-loads route components, and manages theme context. This code lives in packages/docusaurus/src/client/.

The bridge between them is the .docusaurus/ directory — a generated folder containing JavaScript modules, JSON data, and route configurations that the client-side webpack build consumes via @generated/* aliases. The server writes these files; the client imports them.
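
To make the bridge concrete, here is a minimal sketch of how an alias could map an @generated/* import onto a file inside .docusaurus/. The resolveAlias helper and the /my-site path are hypothetical and exist only for illustration; in a real site the bundler's alias configuration performs this resolution.

```typescript
// Hypothetical alias table: specifier prefix -> directory on disk.
// Only the "@generated" prefix and the ".docusaurus" folder come from the article.
const aliases: Record<string, string> = {
  "@generated": "/my-site/.docusaurus",
};

// Rewrites "@generated/routes" to "/my-site/.docusaurus/routes".
// Returns the specifier unchanged when no alias prefix matches.
function resolveAlias(specifier: string, table: Record<string, string>): string {
  for (const [prefix, target] of Object.entries(table)) {
    if (specifier === prefix) return target;
    if (specifier.startsWith(prefix + "/")) {
      return target + specifier.slice(prefix.length);
    }
  }
  return specifier;
}

const resolvedRoutes = resolveAlias("@generated/routes", aliases);
const untouchedSpecifier = resolveAlias("react", aliases);
```

The client code never needs to know where the generated folder lives; it only imports from the alias, which is what keeps the two worlds decoupled.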

The CLI as Orchestrator

The CLI entry point at packages/docusaurus/src/commands/cli.ts uses Commander.js to define all commands. The runCLI() function creates the program and parses arguments:

flowchart TD
    CLI[runCLI] --> BUILD[build]
    CLI --> START[start]
    CLI --> SWIZZLE[swizzle]
    CLI --> DEPLOY[deploy]
    CLI --> SERVE[serve]
    CLI --> CLEAR[clear]
    CLI --> WT[write-translations]
    CLI --> WHI[write-heading-ids]
    CLI --> EXT{External?}
    EXT -->|Yes| PLUGIN_CMD[Plugin CLI Extensions]

Each command maps to a dedicated module: build triggers the full static build pipeline, start launches the dev server with hot reload, and swizzle handles theme component customization.

One subtle detail: the CLI checks whether a command is "internal" at cli.ts#L26-L40. If the command isn't recognized, it calls externalCommand() before parsing — this is how plugins can register their own CLI commands. The docs plugin, for instance, adds docs:version for creating documentation snapshots.
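
The internal/external check can be sketched roughly as follows. The command names are taken from the diagram above; the function names, the treatment of a missing command, and the return values are assumptions made for illustration and may not match cli.ts.

```typescript
// Command names from the diagram above; the real list in cli.ts may differ.
const internalCommands = new Set<string>([
  "build",
  "start",
  "swizzle",
  "deploy",
  "serve",
  "clear",
  "write-translations",
  "write-heading-ids",
]);

function isInternalCommand(command: string | undefined): boolean {
  // Assumption: a bare invocation with no command is handled by the core CLI.
  return command === undefined || internalCommands.has(command);
}

type Dispatch = "core" | "plugin-extension";

// argv mimics the arguments after the binary name, e.g. ["docs:version", "1.0.0"].
function dispatchCommand(argv: readonly string[]): Dispatch {
  return isInternalCommand(argv[0]) ? "core" : "plugin-extension";
}
```

Under these assumptions, dispatchCommand(["docs:version", "1.0.0"]) takes the plugin-extension path, which is where plugin-registered commands would be loaded before the arguments are parsed.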

Note the environment variable escape hatches at lines 53-56 of cli.ts: DOCUSAURUS_CLI_SITE_DIR and DOCUSAURUS_CLI_CONFIG let you override the site directory and config path without passing CLI arguments. These exist because Commander.js can't determine the site directory before parsing, which creates a chicken-and-egg problem for plugin CLI extensions that need config context.
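
A minimal sketch of that fallback logic, assuming the environment variable wins when set and the current directory is used otherwise. The resolveExternalContext name and the ExternalCommandContext shape are hypothetical; only the two variable names come from the article.

```typescript
type Env = Record<string, string | undefined>;

interface ExternalCommandContext {
  siteDir: string; // directory the plugin CLI extensions run against
  config: string | undefined; // optional path to a config file
}

// Assumption: the env var is used when present, "." is the fallback site directory.
function resolveExternalContext(env: Env): ExternalCommandContext {
  return {
    siteDir: env["DOCUSAURUS_CLI_SITE_DIR"] ?? ".",
    config: env["DOCUSAURUS_CLI_CONFIG"],
  };
}

const defaultContext = resolveExternalContext({});
const overriddenContext = resolveExternalContext({
  DOCUSAURUS_CLI_SITE_DIR: "./website",
  DOCUSAURUS_CLI_CONFIG: "./website/docusaurus.config.js",
});
```

Taking the environment as a parameter rather than reading it globally keeps the sketch testable; real code would read the process environment directly.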

The loadSite() Pipeline

The loadSite() function in packages/docusaurus/src/server/site.ts#L276-L298 is the single most important function in the codebase. Every command that needs site data — build, start, deploy — calls it. Here's what it does:

sequenceDiagram
    participant CLI as CLI Command
    participant LS as loadSite()
    participant LC as loadContext()
    participant LP as loadPlugins()
    participant CSP as createSiteProps()
    participant CSF as createSiteFiles()

    CLI->>LS: loadSite(params)
    LS->>LC: Load config, i18n, bundler
    LC-->>LS: LoadContext
    LS->>LP: Run plugin lifecycle (4 phases)
    LP-->>LS: plugins, routes, globalData
    LS->>CSP: Merge routes, metadata, translations
    CSP-->>LS: Props
    LS->>CSF: Generate .docusaurus/ files
    CSF-->>LS: Site ready

The pipeline has four stages:

  1. loadContext() (lines 81-173): Loads the site config, resolves i18n locale settings, determines the output directory, initializes the bundler (Webpack or Rspack), and loads code translations.

  2. loadPlugins(): Runs the full 4-phase plugin lifecycle — initialization, loadContent(), contentLoaded(), and allContentLoaded(). Returns loaded plugins, routes, and global data. We'll cover this in detail in Article 2.

  3. createSiteProps() (lines 175-230): Merges plugin results into a unified Props object containing routes, metadata, HTML tags, and code translations. Also handles duplicate route detection.

  4. createSiteFiles() (lines 233-268): Writes the .docusaurus/ directory by calling generateSiteFiles().
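
The four stages above can be sketched as a simple sequential pipeline. The stage names match the article, but every type and function body here is a simplified stand-in rather than the real Docusaurus code; stagesRun exists only to make the call order visible.

```typescript
// Simplified stand-ins for the real types.
interface LoadContext { siteDir: string; locale: string }
interface PluginResults { routes: string[]; globalData: Record<string, unknown> }
interface Props extends LoadContext, PluginResults { duplicateRoutes: string[] }

const stagesRun: string[] = [];

async function loadContext(siteDir: string): Promise<LoadContext> {
  stagesRun.push("loadContext");
  return { siteDir, locale: "en" }; // real code also loads config, bundler, translations
}

async function loadPlugins(_context: LoadContext): Promise<PluginResults> {
  stagesRun.push("loadPlugins");
  return { routes: ["/", "/docs/intro"], globalData: {} };
}

function createSiteProps(context: LoadContext, results: PluginResults): Props {
  stagesRun.push("createSiteProps");
  // Collect routes that appear more than once; how the real code reports them is not shown.
  const duplicateRoutes = results.routes.filter((r, i, all) => all.indexOf(r) !== i);
  return { ...context, ...results, duplicateRoutes };
}

async function createSiteFiles(props: Props): Promise<void> {
  stagesRun.push("createSiteFiles");
  void props; // the real stage writes the .docusaurus/ directory
}

async function loadSite(siteDir: string): Promise<Props> {
  const context = await loadContext(siteDir);
  const results = await loadPlugins(context);
  const props = createSiteProps(context, results);
  await createSiteFiles(props);
  return props;
}
```

Each stage consumes only the output of the previous ones, which is why the order is fixed: files cannot be generated before Props exists, and Props cannot be built before plugins have produced routes.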

Tip: The Props type is the "contract" between the server pipeline and everything downstream (code generation, bundler config, dev server). If you're debugging any build issue, check what's in Props first.

The .docusaurus/ Bridge: Code Generation

The generated .docusaurus/ directory is the contract between server and client. The generateSiteFiles() function at packages/docusaurus/src/server/codegen/codegen.ts#L162-L174 writes all files in parallel:

| Generated File | Purpose |
| --- | --- |
| docusaurus.config.mjs | Serialized site config for client access |
| routes.js | React Router route tree with ComponentCreator lazy loading |
| registry.js | Chunk name → module path mapping for code splitting |
| routesChunkNames.json | Route path → chunk names for each route's modules |
| client-modules.js | Plugin client modules (CSS, JS side effects) |
| globalData.json | Cross-plugin data accessible via useGlobalData() |
| i18n.json | Current locale configuration |
| codeTranslations.json | UI string translations |
| site-metadata.json | Plugin versions and site metadata |

flowchart TD
    GEN[generateSiteFiles] --> WARN[DONT-EDIT-THIS-FOLDER]
    GEN --> CM[client-modules.js]
    GEN --> SC[docusaurus.config.mjs]
    GEN --> RF[routes.js + registry.js + routesChunkNames.json]
    GEN --> GD[globalData.json]
    GEN --> SM[site-metadata.json]
    GEN --> I18N[i18n.json]
    GEN --> CT[codeTranslations.json]
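
Writing the files in parallel is straightforward because each one is independent of the others. The sketch below shows the pattern with an in-memory map standing in for the file system; writeGenerated, SketchProps, and generateSiteFilesSketch are hypothetical names, and only a few of the generated files are included.

```typescript
// In-memory stand-in for the file system so the sketch has no dependencies.
const written = new Map<string, string>();

async function writeGenerated(name: string, content: string): Promise<void> {
  written.set(`.docusaurus/${name}`, content);
}

interface SketchProps {
  i18n: { currentLocale: string };
  globalData: Record<string, unknown>;
  codeTranslations: Record<string, string>;
}

// No file depends on another, so all writes start together and are awaited once.
async function generateSiteFilesSketch(props: SketchProps): Promise<void> {
  await Promise.all([
    writeGenerated("DONT-EDIT-THIS-FOLDER", "This folder is generated."),
    writeGenerated("globalData.json", JSON.stringify(props.globalData, null, 2)),
    writeGenerated("i18n.json", JSON.stringify(props.i18n, null, 2)),
    writeGenerated(
      "codeTranslations.json",
      JSON.stringify(props.codeTranslations, null, 2),
    ),
  ]);
}
```

If any single write fails, Promise.all rejects and the whole generation step fails, which is usually the desired behavior for a build artifact directory.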

One design decision worth noting: client modules use require() instead of import(). Look at codegen.ts#L68-L77 — the comment explains that import() is async but client modules can include CSS, and the load order matters for CSS specificity. Using synchronous require() ensures CSS files are included in the correct order in the bundle.
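
As an illustration of that choice, a generator for such a file might look like the sketch below. The function name and the exact shape of the emitted source are assumptions; the point is only that the output is an ordered array of synchronous require() calls.

```typescript
// Produces the source text of a client-modules file from an ordered list of paths.
// The exact output format of the real generator is an assumption.
function genClientModulesSource(modulePaths: readonly string[]): string {
  // JSON.stringify safely quotes and escapes each path as a string literal.
  const lines = modulePaths.map((p) => `  require(${JSON.stringify(p)}),`);
  return ["export default [", ...lines, "];", ""].join("\n");
}

const clientModulesSource = genClientModulesSource([
  "./css/base.css",
  "./css/overrides.css",
]);
```

Because the array preserves the input order and require() is evaluated synchronously, base.css is included before overrides.css, so the later stylesheet wins when two rules have equal specificity.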

The route generation in codegenRoutes.ts deserves special attention. It produces three files: routes.js contains a minimal React Router config using ComponentCreator for lazy loading, registry.js maps chunk names to dynamic import() calls with webpack magic comments for chunk naming, and routesChunkNames.json connects route paths to their chunk names. This three-file system enables aggressive code splitting — each page only loads the JavaScript it needs.
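
The relationship between the three files can be sketched with plain data structures. The shapes and chunk names below are hypothetical simplifications, and the loaders return strings instead of real modules so the example runs on its own; the real registry uses dynamic import() calls.

```typescript
// Hypothetical, simplified shapes of the generated artifacts.
type Loader = () => Promise<{ default: string }>;

// routesChunkNames.json: route path -> chunk names needed by that route.
const routesChunkNames: Record<string, string[]> = {
  "/": ["component---home"],
  "/docs/intro": ["component---doc-page", "content---docs-intro"],
};

// registry.js: chunk name -> lazy loader.
const registry: Record<string, Loader> = {
  "component---home": async () => ({ default: "Home" }),
  "component---doc-page": async () => ({ default: "DocPage" }),
  "content---docs-intro": async () => ({ default: "Intro content" }),
};

// Roughly what a ComponentCreator-style helper does: load only one route's chunks.
async function loadRouteModules(path: string): Promise<string[]> {
  const chunkNames = routesChunkNames[path] ?? [];
  const modules = await Promise.all(
    chunkNames.map((name) => {
      const loader = registry[name];
      if (!loader) throw new Error(`Unknown chunk: ${name}`);
      return loader();
    }),
  );
  return modules.map((m) => m.default);
}
```

Visiting /docs/intro triggers two loaders and never touches the home page chunk, which is the code-splitting benefit the three-file design is after.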

Client Component Tree and Routing

The client React application is assembled in packages/docusaurus/src/client/App.tsx. The component tree is a Russian doll of providers:

graph TD
    EB[ErrorBoundary] --> DCP[DocusaurusContextProvider]
    DCP --> BCP[BrowserContextProvider]
    BCP --> ROOT["Root (@theme/Root)"]
    ROOT --> TP["ThemeProvider (@theme/ThemeProvider)"]
    TP --> SMD[SiteMetadataDefaults]
    TP --> SM["SiteMetadata (@theme/SiteMetadata)"]
    TP --> BIB[BaseUrlIssueBanner]
    TP --> AN[AppNavigation]
    AN --> PN[PendingNavigation]
    PN --> ROUTES["renderRoutes(@generated/routes)"]

Notice that @theme/Root and @theme/ThemeProvider are resolved through the theme alias system — they point to either theme-classic's implementations or user-swizzled versions. The @generated/routes import connects to the server-generated route file.
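
A rough model of that alias lookup is a priority-ordered search: check the site's own theme folder first, then fall back to the installed theme. The directory paths, the existingFiles set, and resolveThemeComponent are all illustrative assumptions, not the real resolution code.

```typescript
// Candidate directories, highest priority first. Paths are illustrative only.
const themeDirs: readonly string[] = [
  "/my-site/src/theme", // user-swizzled components
  "/node_modules/@docusaurus/theme-classic/lib/theme",
];

// Stand-in for the file system: which component files exist.
const existingFiles = new Set<string>([
  "/my-site/src/theme/Footer",
  "/node_modules/@docusaurus/theme-classic/lib/theme/Footer",
  "/node_modules/@docusaurus/theme-classic/lib/theme/ThemeProvider",
]);

// Resolves "@theme/<name>" to the first directory that provides the component.
function resolveThemeComponent(name: string): string | undefined {
  for (const dir of themeDirs) {
    const candidate = `${dir}/${name}`;
    if (existingFiles.has(candidate)) return candidate;
  }
  return undefined;
}
```

In this model a swizzled Footer shadows the theme's Footer, while ThemeProvider still resolves to the theme package because the site does not override it.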

The browser entry point at clientEntry.tsx handles both hydration and client-side rendering. It preloads route data for the current path before rendering, then either calls ReactDOM.hydrateRoot() (for SSG'd pages) or ReactDOM.createRoot() (for dev mode). The router choice between BrowserRouter and HashRouter is driven by the future.experimental_router config option.
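
The hydrate-or-render decision can be sketched as below. To keep the example runnable without React installed, the two react-dom/client functions are passed in through a small interface; ReactDomClientLike and mount are hypothetical names, and how the real entry point decides the hydrate flag is not covered here.

```typescript
// Minimal subset of the react-dom/client API, injected so the sketch runs without React.
interface RootLike {
  render(app: unknown): void;
}
interface ReactDomClientLike {
  hydrateRoot(container: unknown, app: unknown): RootLike;
  createRoot(container: unknown): RootLike;
}

type MountMode = "hydrated" | "rendered";

function mount(
  dom: ReactDomClientLike,
  container: unknown,
  app: unknown,
  hydrate: boolean,
): MountMode {
  if (hydrate) {
    // Attach event handlers to markup that the static build already rendered.
    dom.hydrateRoot(container, app);
    return "hydrated";
  }
  // No pre-rendered markup to reuse: render the app from scratch.
  dom.createRoot(container).render(app);
  return "rendered";
}
```

With the real library, the dom argument would be the module imported from react-dom/client and container would be the root DOM element.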

For SSR/SSG, serverEntry.tsx wraps the same <App /> with StaticRouter, HelmetProvider, and a BrokenLinksProvider that collects all links and anchors on the page for post-build validation. It renders to HTML and returns the collected metadata alongside the markup.
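
The collect-while-rendering idea can be sketched without React. The collector below is a hypothetical stand-in for what a BrokenLinksProvider makes available to components; the method names and the renderPageSketch function are assumptions for illustration.

```typescript
interface CollectedLinks {
  links: string[];
  anchors: string[];
}

function createLinkCollector() {
  const links = new Set<string>();
  const anchors = new Set<string>();
  return {
    collectLink(href: string): void {
      links.add(href);
    },
    collectAnchor(id: string): void {
      anchors.add(id);
    },
    getCollected(): CollectedLinks {
      return { links: Array.from(links), anchors: Array.from(anchors) };
    },
  };
}

// Stand-in for a server render: returns markup plus everything seen while rendering.
function renderPageSketch(): { html: string; collected: CollectedLinks } {
  const collector = createLinkCollector();
  collector.collectAnchor("installation");
  collector.collectLink("/docs/intro");
  collector.collectLink("/docs/intro"); // duplicates collapse in the Set
  return {
    html: '<h2 id="installation">Installation</h2><a href="/docs/intro">Intro</a>',
    collected: collector.getCollected(),
  };
}
```

Returning the collected data next to the HTML lets a later build step compare every link against the known routes and anchors once all pages have been rendered.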

What's Next

You now have the map. You understand the monorepo structure, the server/client split, the CLI command dispatch, the loadSite() pipeline, the .docusaurus/ bridge, and the client component tree. In the next article, we'll zoom into the heart of the server pipeline: the 4-phase plugin lifecycle that transforms content on disk into React routes in the browser.