Inside Babel's Architecture: The Four-Phase Compilation Pipeline

Intermediate

Prerequisites

  • Familiarity with Abstract Syntax Trees (ASTs)
  • Basic JavaScript tooling concepts (transpilers, bundlers)
  • Understanding of monorepo structures (Yarn workspaces)

Every JavaScript developer has used Babel, but few have looked inside. At its core, Babel is a compiler — one that reads JavaScript (or TypeScript, or Flow), transforms it, and writes JavaScript back out. What makes it interesting as a codebase is that this compiler is split across 149 packages in a Yarn workspaces monorepo, with a plugin system that has allowed an entire ecosystem of language transformations to emerge. In this first article, we'll map the monorepo, trace a call from transformSync() to output code, and understand the design decisions that make Babel's architecture distinctive.

Monorepo Layout: 149 Packages at a Glance

Babel's packages fall into five categories. Understanding these categories is the key to navigating the codebase without drowning in its scale.

| Category | Examples | Count (approx.) |
|----------|----------|-----------------|
| Core | babel-core, babel-parser, babel-traverse, babel-generator, babel-types | 5 |
| Plugins | babel-plugin-transform-arrow-functions, babel-plugin-transform-classes | ~70 |
| Helpers | babel-helpers, babel-helper-plugin-utils, babel-helper-compilation-targets | ~20 |
| Presets | babel-preset-env, babel-preset-typescript, babel-preset-flow | ~5 |
| Tooling | babel-cli, babel-node, babel-register, babel-eslint-* | ~15+ |

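The five core packages divide the work cleanly. @babel/core orchestrates; @babel/parser, @babel/traverse, and @babel/generator each implement one stage of the pipeline; and @babel/types supplies the AST node definitions that the other packages share: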

graph LR
    A["@babel/core<br/>Orchestration"] --> B["@babel/parser<br/>Source → AST"]
    A --> C["@babel/traverse<br/>Walk + Transform AST"]
    A --> D["@babel/generator<br/>AST → Source"]
    A --> E["@babel/types<br/>AST Node Definitions"]
    B --> E
    C --> E
    D --> E

The public API surface lives in packages/babel-core/src/index.ts. It re-exports everything a consumer needs: transform, transformSync, transformAsync, parse, loadOptions, and even the sub-packages themselves (types, traverse, template). This is intentional — @babel/core is the single entry point for programmatic usage.

Tip: If you're exploring Babel's source for the first time, start at packages/babel-core/src/index.ts. It's only 109 lines and gives you a complete map of the public API and which internal modules implement each export.

The Four-Phase Pipeline: Config → Parse → Transform → Generate

When you call transformSync(code, opts), Babel runs four distinct phases. Each phase is cleanly separated, and the boundaries between them are enforced by the module structure.

flowchart LR
    A["transformSync(code, opts)"] --> B["Phase 1: Config<br/>Load & merge configuration"]
    B --> C["Phase 2: Parse<br/>Source code → AST"]
    C --> D["Phase 3: Transform<br/>Run plugin visitors on AST"]
    D --> E["Phase 4: Generate<br/>AST → output code + source map"]
    E --> F["FileResult"]

The entry point is packages/babel-core/src/transform.ts. Its heart is a gensync generator that is only eight lines long:

```typescript
const transformRunner = gensync(function* transform(
  code: string,
  opts?: InputOptions | null,
): Handler<FileResult | null> {
  const config: ResolvedConfig | null = yield* loadConfig(opts);
  if (config === null) return null;
  return yield* run(config, code);
});
```

Two function calls. loadConfig resolves all configuration — babel.config.js, .babelrc files, presets, plugins, overrides, environment-specific settings — into a single ResolvedConfig object. Then run takes that config and the source code and produces a FileResult containing the transformed code, source map, AST (optionally), and metadata.

The three public variants, transform (callback), transformSync, and transformAsync, are all derived from this single generator via gensync's .errback, .sync, and .async adapters (lines 31–65 of transform.ts). This pattern is one of Babel's most distinctive architectural choices, and we return to it in the gensync section below.

The run() Function: Pipeline Orchestrator

The run() function in packages/babel-core/src/transformation/index.ts#L36-L81 is where the last three phases execute in sequence. It's worth reading in full because it's the single function that coordinates the entire compilation:

sequenceDiagram
    participant T as transformSync
    participant R as run()
    participant N as normalizeFile()
    participant TF as transformFile()
    participant G as generateCode()

    T->>R: config + code
    R->>N: pluginPasses + options + code
    N-->>R: File object (with AST)
    R->>TF: File + pluginPasses
    TF-->>R: (AST mutated in-place)
    R->>G: pluginPasses + File
    G-->>R: outputCode + outputMap
    R-->>T: FileResult

The function body runs three steps in sequence:

  1. Parse: normalizeFile(config.passes, normalizeOptions(config), code, ast) — parses the code into an AST and wraps everything in a File object. If an AST is already provided (via transformFromAst), it's used directly.

  2. Transform: transformFile(file, config.passes) — runs all plugin visitors across the AST. This mutates the AST in-place.

  3. Generate: generateCode(config.passes, file) — turns the (now transformed) AST back into a string, with optional source maps.

Error handling here is notable. Both the transform and generate phases wrap their errors with the filename and a specific error code (BABEL_TRANSFORM_ERROR or BABEL_GENERATE_ERROR), making it easy to distinguish where in the pipeline a failure occurred.
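
To make the three steps and the error tagging concrete, here is a minimal sketch of the same flow. It is a toy model, not Babel's code: runSketch, tagError, and the semicolon-splitting "parser" are hypothetical stand-ins, and the choice to keep an error's existing code rather than overwrite it is an assumption of this sketch.

```typescript
// Toy model of the parse -> transform -> generate flow. All names are hypothetical.
type SketchAst = { body: string[] };
type PhaseError = Error & { code?: string };

// Prefix the message with the filename and tag the error with a phase code.
// Assumption: an existing `code` is preserved rather than overwritten.
function tagError(e: unknown, filename: string | undefined, code: string): PhaseError {
  const err: PhaseError = e instanceof Error ? e : new Error(String(e));
  err.message = `${filename ?? "unknown file"}: ${err.message}`;
  if (!err.code) err.code = code;
  return err;
}

function runSketch(
  source: string,
  filename: string | undefined,
  transform: (ast: SketchAst) => void,
  generate: (ast: SketchAst) => string,
): string {
  // Parse: a stand-in "parser" that splits statements on semicolons.
  const ast: SketchAst = {
    body: source.split(";").map(s => s.trim()).filter(s => s.length > 0),
  };

  // Transform: mutates the AST in place.
  try {
    transform(ast);
  } catch (e) {
    throw tagError(e, filename, "BABEL_TRANSFORM_ERROR");
  }

  // Generate: turns the mutated AST back into a string.
  try {
    return generate(ast);
  } catch (e) {
    throw tagError(e, filename, "BABEL_GENERATE_ERROR");
  }
}
```

A caller can then branch on err.code to tell a plugin failure from a printer failure without parsing the message text.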

normalizeFile: Bridging Config to AST

The normalizeFile function handles parsing and source map extraction. Its job is to produce a File object from raw source code.

The parsing itself is a single line: ast = yield* parser(pluginPasses, options, code). But normalizeFile does more than just parse — it also extracts inline and external source maps from the input code. It uses regex patterns to find sourceMappingURL comments (both inline base64-encoded and external file references), strips them from the AST's comments, and passes them along as inputMap for later source map composition.

This design means that Babel can chain with other tools' source maps transparently. If your TypeScript compiler emits an inline source map, Babel will pick it up and compose its own transformations on top.
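
The comment-scanning step can be sketched as follows. The two regular expressions are simplified, hypothetical patterns written for this illustration (Babel's own patterns may differ), and the rule that the last matching comment wins is an assumption of the sketch.

```typescript
// Hypothetical, simplified patterns. They match the text of a comment, without the leading "//".
const INLINE_MAP_RE =
  /^\s*[#@]\s*sourceMappingURL=data:application\/json;(?:charset=[^;]+;)?base64,(\S+)\s*$/;
const EXTERNAL_MAP_RE = /^\s*[#@]\s*sourceMappingURL=(?!data:)(\S+)\s*$/;

type CommentNode = { value: string };
type InputMapRef =
  | { kind: "inline"; base64: string }
  | { kind: "external"; url: string }
  | null;

// Removes source map comments from `comments` (in place) and returns
// a reference to the last one in the list, or null if there is none.
function extractInputMap(comments: CommentNode[]): InputMapRef {
  let found: InputMapRef = null;
  // Walk backwards so that splicing does not disturb the indices still to visit.
  for (let i = comments.length - 1; i >= 0; i--) {
    const text = comments[i].value;
    const inline = INLINE_MAP_RE.exec(text);
    const external = EXTERNAL_MAP_RE.exec(text);

    let ref: InputMapRef = null;
    if (inline) ref = { kind: "inline", base64: inline[1] };
    else if (external) ref = { kind: "external", url: external[1] };
    if (ref === null) continue;

    comments.splice(i, 1); // strip the comment so it is not printed again
    if (found === null) found = ref; // first hit while walking backwards = last in the file
  }
  return found;
}
```

Decoding the base64 payload or reading the external file is left out here; the point is only that the reference is lifted out of the comments and carried forward separately.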

The File Object: Compilation Unit

The File class is the central data structure that flows through the entire pipeline. It carries everything a plugin might need:

classDiagram
    class File {
        +opts: ResolvedOptions
        +ast: t.File
        +code: string
        +path: NodePath~Program~
        +scope: Scope
        +metadata: Record
        +inputMap: SourceMapConverter
        +hub: HubInterface
        +declarations: Record~string, Identifier~
        +addHelper(name): Identifier
        +buildCodeFrameError(node, msg): Error
    }
    class HubInterface {
        +file: File
        +getCode(): string
        +getScope(): Scope
        +addHelper(name): Identifier
        +buildError(node, msg): Error
    }
    File --> HubInterface : hub

The hub object is a bridge between the File and the traversal system. When a plugin reads path.hub.file.opts.filename or calls this.addHelper("classCallCheck"), it ends up at the File. The hub provides getCode, getScope, addHelper, and buildError, the essential services plugins need during transformation.

The addHelper method (lines 130–184) deserves special attention. It injects runtime helper functions into the output, handling dependency resolution between helpers and collision-free naming via scope.generateUidIdentifier. We'll explore this in detail in Article 5.
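
As a rough mental model of what addHelper has to do, consider this sketch. HelperInjector, the HELPER_DEPS table, and the naming scheme are all hypothetical; the real method works on AST nodes and the helper definitions in @babel/helpers, which this toy replaces with plain strings.

```typescript
// Toy registry: helper name -> helpers it depends on (hypothetical data).
const HELPER_DEPS: Record<string, string[]> = {
  classCallCheck: [],
  createClass: ["toPropertyKey"],
  toPropertyKey: [],
};

class HelperInjector {
  private declarations = new Map<string, string>(); // helper name -> local identifier
  private taken: Set<string>;
  readonly emitted: string[] = []; // injection order, dependencies first

  constructor(existingBindings: string[]) {
    this.taken = new Set(existingBindings);
  }

  // Stand-in for scope.generateUidIdentifier: `_name`, then `_name2`, `_name3`, ...
  private generateUid(name: string): string {
    let candidate = `_${name}`;
    for (let i = 2; this.taken.has(candidate); i++) candidate = `_${name}${i}`;
    this.taken.add(candidate);
    return candidate;
  }

  addHelper(name: string): string {
    const cached = this.declarations.get(name);
    if (cached !== undefined) return cached; // each helper is injected once per file

    const deps = HELPER_DEPS[name];
    if (deps === undefined) throw new Error(`Unknown helper ${name}`);

    const id = this.generateUid(name);
    this.declarations.set(name, id); // register before recursing so cycles terminate
    for (const dep of deps) this.addHelper(dep);
    this.emitted.push(id);
    return id;
  }
}
```

The two properties worth noticing are memoization (a second request for the same helper returns the same identifier) and collision avoidance (a user binding named _classCallCheck forces the helper onto a different name).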

Tip: The File class also has a general-purpose _map (a Map<unknown, unknown>) exposed via get/set/has methods. Plugins use this to share state across visitor methods — for example, to track whether a particular transformation has already been applied to the current file.
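
The state-sharing pattern from the tip looks roughly like this. FileState is a hypothetical stand-in that copies only the get/set/has surface described above, and the polyfill import is an invented example of per-file work that should happen once.

```typescript
// Stand-in for the part of File that plugins use as a per-file key/value store.
class FileState {
  private _map = new Map<unknown, unknown>();
  set(key: unknown, value: unknown): void { this._map.set(key, value); }
  get(key: unknown): unknown { return this._map.get(key); }
  has(key: unknown): boolean { return this._map.has(key); }
}

// Hypothetical plugin-private key. A Symbol cannot clash with another plugin's key.
const POLYFILL_ADDED = Symbol("polyfillAdded");

// Intended to be called from several visitor methods.
// Only the first call for a given file does the work.
function ensurePolyfillImport(file: FileState, imports: string[]): void {
  if (file.has(POLYFILL_ADDED)) return;
  imports.push('import "hypothetical-polyfill";');
  file.set(POLYFILL_ADDED, true);
}
```

Because the store lives on the per-file object, the flag resets naturally for the next file without any cleanup code in the plugin.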

The transformFile Function: Plugin Execution Engine

The transformFile function is where plugins actually run. It iterates over each "pass" (a group of plugins), and for each pass:

  1. Creates a PluginPass instance for each plugin — this becomes the this context inside visitor methods
  2. Calls each plugin's pre() hook
  3. Merges all plugin visitors into a single visitor using traverse.visitors.merge()
  4. Runs the merged visitor over the AST with a single traverse() call
  5. Calls each plugin's post() hook

The critical insight is step 3: within a single pass, all plugins share one traversal. Walking the tree once per plugin would repeat the traversal overhead (creating paths, tracking scope) for every plugin. By merging all visitors, Babel visits each node once and calls each plugin's handler for that node in plugin order. Handler work still scales with the number of plugins, but the cost of the walk itself is paid only once, which matters for large codebases with many plugins.
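
The five steps can be sketched on a toy tree. Everything below is hypothetical scaffolding (MiniNode, MiniPass, MiniPlugin, runPass); in particular, the merge here is a plain map from node type to handlers, which is far simpler than what traverse.visitors.merge() does.

```typescript
type MiniNode = { type: string; children?: MiniNode[] };
type MiniPass = { key: string; log: string[] }; // plays the role of PluginPass
type MiniHandler = (this: MiniPass, node: MiniNode) => void;
type MiniPlugin = {
  key: string;
  pre?: (this: MiniPass) => void;
  post?: (this: MiniPass) => void;
  visitor: Record<string, MiniHandler>;
};
type BoundHandler = (node: MiniNode) => void;

// For each node type, collect every plugin's handler, bound to that plugin's own pass.
function mergeVisitors(plugins: MiniPlugin[], passes: MiniPass[]): Map<string, BoundHandler[]> {
  const merged = new Map<string, BoundHandler[]>();
  plugins.forEach((plugin, i) => {
    for (const [type, handler] of Object.entries(plugin.visitor)) {
      const list = merged.get(type) ?? [];
      list.push(node => handler.call(passes[i], node));
      merged.set(type, list);
    }
  });
  return merged;
}

function runPass(root: MiniNode, plugins: MiniPlugin[]): MiniPass[] {
  const passes: MiniPass[] = plugins.map(p => ({ key: p.key, log: [] })); // step 1
  plugins.forEach((p, i) => p.pre?.call(passes[i]));                      // step 2
  const merged = mergeVisitors(plugins, passes);                          // step 3
  const walk = (node: MiniNode): void => {                                // step 4: one walk
    for (const handler of merged.get(node.type) ?? []) handler(node);
    for (const child of node.children ?? []) walk(child);
  };
  walk(root);
  plugins.forEach((p, i) => p.post?.call(passes[i]));                     // step 5
  return passes;
}
```

With two plugins that both handle the same node type, the handlers interleave per node (plugin A, plugin B, then the next node) instead of plugin A finishing the whole tree before plugin B starts.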

The block-hoist plugin is silently appended to every pass (pluginPairs.concat([loadBlockHoistPlugin()])). It's an internal plugin that reorders statements by their _blockHoist priority after all other plugins have run.
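
The reordering idea can be illustrated with a stable sort. The priority rule used here (unset counts as 1, true counts as 2, higher numbers move earlier) is an assumption made for this sketch, and HoistStmt is a hypothetical stand-in for a statement node.

```typescript
type HoistStmt = { code: string; _blockHoist?: number | true };

// Assumed priority rule for this sketch: unset = 1, `true` = 2, otherwise the number itself.
function hoistPriority(stmt: HoistStmt): number {
  const p = stmt._blockHoist;
  if (p === undefined) return 1;
  if (p === true) return 2;
  return p;
}

// Higher priority first. Statements with equal priority keep their original order.
function hoistStatements(body: HoistStmt[]): HoistStmt[] {
  return body
    .map((stmt, index) => ({ stmt, index }))
    .sort((a, b) => hoistPriority(b.stmt) - hoistPriority(a.stmt) || a.index - b.index)
    .map(entry => entry.stmt);
}
```

Stability is the important property: a plugin that marks one declaration for hoisting should not shuffle the unmarked statements around it.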

Sync/Async Unification with gensync

Babel's most unconventional design choice is its use of gensync — a library that uses JavaScript generators to write code that can run both synchronously and asynchronously. This is what allows transformSync, transformAsync, and transform (callback-style) to share the exact same implementation.

flowchart TD
    A["gensync generator function*"] --> B["transformRunner.sync()"]
    A --> C["transformRunner.async()"]
    A --> D["transformRunner.errback()"]
    B --> E["transformSync()"]
    C --> F["transformAsync()"]
    D --> G["transform(code, opts, callback)"]

The key utility functions live in packages/babel-core/src/gensync-utils/async.ts. Two are particularly important:

  • isAsync (lines 15–18): A gensync generator that returns false in sync mode and true in async mode. The transformFile function uses this to know whether it can handle async plugin hooks.

  • maybeAsync (lines 25–38): Wraps a function that might return a promise. In sync mode, if the function returns a thenable, it throws an error ("You appear to be using an async plugin, but Babel has been called synchronously"). In async mode, it wraps the result in Promise.resolve.

Without gensync, Babel would need to maintain two parallel implementations of every function in the config loading and transformation pipeline — one sync, one async. The generator approach eliminates that duplication at the cost of some cognitive overhead for contributors who haven't seen the pattern before.
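
To demystify the pattern, here is a toy re-creation of the idea. It is not gensync's implementation or API: Effect, Task, runSync, runAsync, and transformSketch are hypothetical names, and error forwarding into the generator is omitted for brevity. The generator yields "effects" that know how to produce a value in either mode, and the runner decides which mode to use.

```typescript
// An effect can produce its value synchronously or asynchronously.
type Effect = { sync: () => unknown; async: () => Promise<unknown> };
type Task<R> = Generator<Effect, R, unknown>;

// Drive the generator, resolving every yielded effect synchronously.
function runSync<R>(task: Task<R>): R {
  let step = task.next();
  while (!step.done) step = task.next(step.value.sync());
  return step.value;
}

// Drive the same generator, awaiting every yielded effect.
async function runAsync<R>(task: Task<R>): Promise<R> {
  let step = task.next();
  while (!step.done) step = task.next(await step.value.async());
  return step.value;
}

// Plays the role of isAsync: the answer depends on which runner is driving.
const isAsyncEffect: Effect = { sync: () => false, async: () => Promise.resolve(true) };

const isThenable = (v: unknown): boolean =>
  typeof v === "object" && v !== null && typeof (v as { then?: unknown }).then === "function";

// Plays the role of maybeAsync: a promise-returning hook is an error in sync mode.
function maybeAsyncEffect(fn: () => unknown, message: string): Effect {
  return {
    sync: () => {
      const result = fn();
      if (isThenable(result)) throw new Error(message);
      return result;
    },
    async: () => Promise.resolve(fn()),
  };
}

// One implementation, written once, runnable both ways.
function* transformSketch(code: string, hook: () => unknown): Task<string> {
  const asyncMode = (yield isAsyncEffect) as boolean;
  const suffix = (yield maybeAsyncEffect(hook, "async hook used in sync mode")) as string;
  return `${code}${suffix} (${asyncMode ? "async" : "sync"})`;
}
```

Calling runSync(transformSketch("x", () => "!")) and runAsync(transformSketch("x", async () => "!")) exercises the same generator body; only the runner differs. Passing the async hook to runSync throws, which mirrors the sync-mode check described above.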

The Code Generation Phase

The final phase is generateCode, which is surprisingly simple. It checks whether any plugin provides a generatorOverride, and if not, delegates to @babel/generator's generate() function. It handles three source map modes: "inline" (appended as a comment), "both" (inline and returned), and the default (returned as a separate object).

The generatorOverride mechanism lets a plugin replace Babel's default printer with its own generator function. In practice only one plugin can override generation: if more than one tries, Babel throws an error.
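
The selection rule can be sketched in a few lines. PrinterPlugin, Printer, pickGenerator, and the string-joining default printer are hypothetical stand-ins; the real default is @babel/generator's generate(), and the error text is written for this example.

```typescript
type PrintAst = { body: string[] };
type PrintResult = { code: string; map: object | null };
type Printer = (ast: PrintAst) => PrintResult;
type PrinterPlugin = { key: string; generatorOverride?: Printer };

// Stand-in for the default printer.
const defaultPrinter: Printer = ast => ({ code: ast.body.join("\n"), map: null });

// Scan every plugin in every pass: zero overrides means the default printer,
// exactly one override wins, and more than one is an error.
function pickGenerator(passes: PrinterPlugin[][]): Printer {
  const overrides: Printer[] = [];
  for (const plugins of passes) {
    for (const plugin of plugins) {
      if (plugin.generatorOverride) overrides.push(plugin.generatorOverride);
    }
  }
  if (overrides.length === 0) return defaultPrinter;
  if (overrides.length === 1) return overrides[0];
  throw new Error("More than one plugin attempted to override codegen.");
}
```

Failing loudly on a second override is a reasonable design: two printers cannot both own the final output, and silently picking one would make results depend on plugin order.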

What's Next

We've now traced the full path from transformSync() to a FileResult. We know how config is loaded, how the code becomes an AST, how plugins run against that AST, and how the AST becomes output code. But we've only scratched the surface of each phase.

In the next article, we'll dive deep into @babel/parser — the recursive descent parser that turns source code into an AST. We'll explore its eight-level class inheritance chain, the tokenizer that drives parsing, and the mixin plugin system that lets Babel parse TypeScript, Flow, and JSX without a single runtime hook.