Read OSS

The Linker: Tree Shaking, Code Splitting, and Chunk Generation

Advanced

Prerequisites

  • Article 1: Architecture and Project Layout
  • Article 2: Scanning, Parsing, and Module Graph
  • Understanding of ES module import/export semantics
  • Familiarity with tree shaking concepts

The linker is where the module graph becomes output files. It takes the immutable data produced by the scan phase and transforms it through a carefully ordered sequence of steps: cloning, import resolution, tree shaking, code splitting, and parallel code generation. This is the most complex part of esbuild, and the part where the most subtle correctness issues live.

In this article, we'll walk through each step of the Link() function and understand why the order matters.

Immutable Graph Cloning

The linker's design starts with an immutability contract, documented at the top of internal/graph/graph.go#L1-L14:

// This graph represents the set of files that the linker operates on. Each
// linker has a separate one of these graphs (there is one linker when code
// splitting is on, but one linker per entry point when code splitting is off).
//
// The input data to the linker constructor must be considered immutable because
// it's shared between linker invocations and is also stored in the cache for
// incremental builds.
//
// The linker constructor makes a shallow clone of the input data and is careful
// to pre-clone ahead of time the AST fields that it may modify.

This design enables two things:

  1. Watch mode: Unchanged files don't need to be re-parsed. The cached scan data is reused across rebuilds.
  2. Parallel linking: When code splitting is off, each entry point can be linked independently using its own clone.
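The steps above can be sketched with a minimal example of the "one linker per entry point" pattern. All names here are invented for illustration — this is not esbuild's API, just the shape of the concurrency:

```go
package main

import (
	"fmt"
	"sync"
)

// linkEntryPoints links each entry point concurrently. Each goroutine
// works on its own clone of the shared scan data, so the shared
// (immutable) graph is never mutated by any linker.
func linkEntryPoints(sharedGraph []string, entryPoints []string) []string {
	results := make([]string, len(entryPoints))
	var wg sync.WaitGroup
	wg.Add(len(entryPoints))
	for i, entry := range entryPoints {
		go func(i int, entry string) {
			defer wg.Done()
			// Clone before touching anything; the original slice
			// stays safe for other goroutines and future rebuilds.
			clone := append([]string(nil), sharedGraph...)
			results[i] = fmt.Sprintf("linked %s with %d files", entry, len(clone))
		}(i, entry)
	}
	wg.Wait()
	return results
}

func main() {
	out := linkEntryPoints([]string{"runtime", "utils"}, []string{"a.js", "b.js"})
	fmt.Println(len(out))
}
```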

CloneLinkerGraph() creates the linker's working copy, wrapping each InputFile in a LinkerFile that adds mutable linking metadata:

type LinkerFile struct {
    EntryBits helpers.BitSet // Which entry points reach this file
    InputFile InputFile

    // Shortest path (in import links) from any entry point to this file
    DistanceFromEntryPoint uint32

    // Index of this file's chunk, if this file is an entry point
    EntryPointChunkIndex uint32

    // Whether (and how) this file is an entry point
    entryPointKind entryPointKind

    // Set by tree shaking if this file survives into the bundle
    IsLive bool
}

The EntryBits field is the core data structure for code splitting — we'll explore it in detail below.

flowchart LR
    subgraph "Scan Phase Output (immutable)"
        IF1["InputFile 0 (runtime)"]
        IF2["InputFile 1 (entry)"]
        IF3["InputFile 2 (utils)"]
    end

    subgraph "Linker Graph (mutable clone)"
        LF1["LinkerFile 0"]
        LF2["LinkerFile 1"]
        LF3["LinkerFile 2"]
    end

    IF1 -.->|shallow clone| LF1
    IF2 -.->|shallow clone| LF2
    IF3 -.->|shallow clone| LF3

Tip: Go's type system has no way to enforce immutability (there is no const reference or frozen type), so this contract is enforced purely by convention. If you're contributing to esbuild, be extremely careful not to mutate any AST data from the scan phase — the comment explicitly warns about this.
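To make "shallow clone, but pre-clone what you'll mutate" concrete, here is a minimal sketch. The types and field names are invented for illustration and are not esbuild's actual data structures:

```go
package main

import "fmt"

// Immutable scan output, shared between linker invocations and the
// incremental-build cache.
type inputFile struct {
	Name  string
	Parts []string // stand-in for AST parts
}

// Mutable linker wrapper around an input file.
type linkerFile struct {
	input  inputFile
	isLive bool // linker-only metadata, safe to mutate
}

// cloneGraph shallow-copies each file, but deep-copies only the field
// the linker will mutate (Parts here), leaving the shared scan-phase
// data untouched.
func cloneGraph(files []inputFile) []linkerFile {
	out := make([]linkerFile, len(files))
	for i, f := range files {
		clone := f
		// Pre-clone the slice we plan to modify.
		clone.Parts = append([]string(nil), f.Parts...)
		out[i] = linkerFile{input: clone}
	}
	return out
}

func main() {
	shared := []inputFile{{Name: "entry.js", Parts: []string{"a", "b"}}}
	g := cloneGraph(shared)
	g[0].input.Parts[0] = "mutated" // only touches the clone
	fmt.Println(shared[0].Parts[0]) // prints "a": shared data is unchanged
}
```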

The Link() function executes a strict sequence of steps. Each step depends on the results of the previous one:

flowchart TD
    A["Clone linker graph"] --> B["Look up runtime refs"]
    B --> C["scanImportsAndExports()"]
    C --> D{"Errors?"}
    D -->|Yes| ABORT["Return empty"]
    D -->|No| E["treeShakingAndCodeSplitting()"]
    E --> F["computeChunks()"]
    F --> G["computeCrossChunkDependencies()"]
    G --> H["mangleProps()"]
    H --> I["FollowAllSymbols()"]
    I --> J["generateChunksInParallel()"]
    J --> OUT["[]OutputFile"]

Let's examine the most important steps.

Runtime Ref Lookup

Before any linking begins, the linker resolves references to runtime helpers. From internal/linker/linker.go#L264-L271:

runtimeRepr := c.graph.Files[runtime.SourceIndex].InputFile.Repr.(*graph.JSRepr)
if c.options.ProfilerNames {
    c.cjsRuntimeRef = runtimeRepr.AST.NamedExports["__commonJS"].Ref
    c.esmRuntimeRef = runtimeRepr.AST.NamedExports["__esm"].Ref
} else {
    c.cjsRuntimeRef = runtimeRepr.AST.NamedExports["__commonJSMin"].Ref
    c.esmRuntimeRef = runtimeRepr.AST.NamedExports["__esmMin"].Ref
}

Notice the dual naming: __commonJS vs __commonJSMin. The "Min" variants generate smaller wrapper code by omitting the function's module and exports parameter names, which helps with minification. The profiler names option selects the larger variants that preserve these names for debugging.

Import/Export Binding Resolution

scanImportsAndExports() is the linker's most complex step. It resolves every cross-file import to the symbol it ultimately refers to, handling:

  • Direct imports: import { x } from './mod' → find x in mod's exports
  • Re-exports: export { x } from './other' → trace through the re-export chain
  • Star exports: export * from './utils' → collect all named exports
  • CommonJS interop: Wrapping CJS modules in __commonJS(), auto-inserting __toESM() for ESM consumers
  • Cycle handling: Detecting and handling circular import chains
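The re-export case is worth sketching, because it is the one that forces the linker to chase chains across files. Here is a deliberately simplified, hypothetical model — the real resolution works on symbol refs and handles many more cases — showing how an import is followed through `export { x } from …` chains until it reaches the defining module, with a guard against cycles:

```go
package main

import "fmt"

// An export is either a local symbol or a re-export pointing at
// another module ("export { reName } from reFrom").
type export struct {
	localSymbol string
	reFrom      string
	reName      string
}

type module struct {
	exports map[string]export
}

// resolveImport follows re-export chains until it reaches the module
// that actually defines the symbol, loosely mirroring what
// scanImportsAndExports() does per import.
func resolveImport(mods map[string]*module, from, name string) (string, string, bool) {
	seen := map[string]bool{} // cycle guard
	for {
		if seen[from] {
			return "", "", false // circular re-export chain
		}
		seen[from] = true
		mod, ok := mods[from]
		if !ok {
			return "", "", false
		}
		exp, ok := mod.exports[name]
		if !ok {
			return "", "", false
		}
		if exp.reFrom == "" {
			return from, exp.localSymbol, true // found the defining module
		}
		from, name = exp.reFrom, exp.reName
	}
}

func main() {
	mods := map[string]*module{
		"lib":   {exports: map[string]export{"bar": {localSymbol: "bar@lib"}}},
		"utils": {exports: map[string]export{"bar": {reFrom: "lib", reName: "bar"}}},
	}
	mod, sym, ok := resolveImport(mods, "utils", "bar")
	fmt.Println(mod, sym, ok) // lib bar@lib true
}
```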

The function proceeds through numbered sub-steps, each clearly labeled in the code with comments like "Step 1: Figure out what modules must be CommonJS." From line 1293:

// Step 1: Figure out what modules must be CommonJS
for _, sourceIndex := range c.graph.ReachableFiles {
    // ...
}

This cascading analysis determines the module format for each file, which in turn determines how imports and exports are connected. A file that starts as ESM might be "downgraded" to CommonJS if it's required by another module.
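A one-pass sketch of that "downgrade" decision, with invented types (the real analysis is more involved — it also handles wrapping cascades and import cycles):

```go
package main

import "fmt"

type importKind int

const (
	kindImport  importKind = iota // ESM `import`
	kindRequire                   // CJS `require()`
)

type record struct {
	kind   importKind
	target string
}

type file struct {
	usesCJSFeatures bool // references `exports` or `module`
	imports         []record
}

// classify marks a file as CommonJS if it uses CJS features itself, or
// if any other file reaches it via require(). A require()d file must
// be CommonJS even if it was written as ESM.
func classify(files map[string]*file) map[string]bool {
	isCJS := map[string]bool{}
	for name, f := range files {
		if f.usesCJSFeatures {
			isCJS[name] = true
		}
		for _, r := range f.imports {
			if r.kind == kindRequire {
				isCJS[r.target] = true // "downgraded" by its consumer
			}
		}
	}
	return isCJS
}

func main() {
	files := map[string]*file{
		"entry.js":   {imports: []record{{kindRequire, "esm-lib.js"}}},
		"esm-lib.js": {}, // written as ESM, but require()d
	}
	fmt.Println(classify(files)["esm-lib.js"]) // true
}
```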

sequenceDiagram
    participant Entry as entry.js
    participant Utils as utils.js
    participant Lib as lib.js
    participant Linker as Linker

    Linker->>Entry: Scan exports
    Linker->>Utils: Scan exports
    Linker->>Lib: Scan exports

    Note over Linker: Resolve import bindings
    Linker->>Entry: import {foo} from utils
    Linker->>Utils: export {foo} = find 'foo'
    Linker->>Utils: export {bar} from lib (re-export)
    Linker->>Lib: find 'bar' in lib's exports
    Note over Linker: Merge symbols across files

Tree Shaking via Reachability Analysis

treeShakingAndCodeSplitting() performs two distinct passes:

func (c *linkerContext) treeShakingAndCodeSplitting() {
    // Tree shaking: Each entry point marks all files reachable from itself
    c.timer.Begin("Tree shaking")
    for _, entryPoint := range c.graph.EntryPoints() {
        c.markFileLiveForTreeShaking(entryPoint.SourceIndex)
    }
    c.timer.End("Tree shaking")

    // Code splitting: Determine which entry points can reach which files
    c.timer.Begin("Code splitting")
    for i, entryPoint := range c.graph.EntryPoints() {
        c.markFileReachableForCodeSplitting(entryPoint.SourceIndex, uint(i), 0)
    }
    c.timer.End("Code splitting")
}

Tree shaking works by starting from entry points and walking the import graph, marking files and their parts as "live." A file is dead (removable) if no live entry point imports it and it has no side effects. The sideEffects field from package.json plays a crucial role here — setting "sideEffects": false tells esbuild that importing a file without using its exports has no observable effect.
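The reachability walk itself is a straightforward graph traversal. This file-granular sketch captures the spirit of markFileLiveForTreeShaking() — the real pass works at part granularity within each file and consults "sideEffects" from package.json:

```go
package main

import "fmt"

type tsFile struct {
	deps []int // source indexes of imported files
}

// markLive walks the import graph from an entry point, marking every
// reachable file live. Anything left unmarked afterward is dead and
// can be dropped from the bundle. The visited check also makes import
// cycles terminate.
func markLive(files []tsFile, entry int, live []bool) {
	if live[entry] {
		return // already visited
	}
	live[entry] = true
	for _, dep := range files[entry].deps {
		markLive(files, dep, live)
	}
}

func main() {
	// 0 = entry, 1 = used util, 2 = never-imported module
	files := []tsFile{{deps: []int{1}}, {}, {}}
	live := make([]bool, len(files))
	markLive(files, 0, live)
	fmt.Println(live) // [true true false]
}
```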

Code splitting then computes which entry points can reach each file. This must run after tree shaking: live parts within a file implicitly depend on each other, so all liveness has to be settled before entry-point reachability can be propagated correctly. The comment in the code is explicit about this ordering constraint.

flowchart TD
    E1["Entry A"] --> M1["Module 1"]
    E1 --> M2["Module 2"]
    E2["Entry B"] --> M2
    E2 --> M3["Module 3"]

    M1 -.-> CHUNK_A["Chunk A (only Entry A)"]
    M3 -.-> CHUNK_B["Chunk B (only Entry B)"]
    M2 -.-> CHUNK_SHARED["Shared Chunk (Entry A + B)"]

    style M1 fill:#d4edda
    style M2 fill:#fff3cd
    style M3 fill:#d4edda

Code Splitting with BitSet Entry-Point Membership

The code splitting algorithm is elegant in its simplicity. computeChunks() uses a BitSet (a compact bit array) on each LinkerFile to track which entry points can reach it:

func (c *linkerContext) computeChunks() {
    jsChunks := make(map[string]chunkInfo)
    cssChunks := make(map[string]chunkInfo)

    for i, entryPoint := range c.graph.EntryPoints() {
        entryBits := helpers.NewBitSet(uint(len(c.graph.EntryPoints())))
        entryBits.SetBit(uint(i))
        key := entryBits.String()
        chunk := chunkInfo{
            entryBits:             entryBits,
            isEntryPoint:          true,
            sourceIndex:           entryPoint.SourceIndex,
            // ...
        }
        // ...
    }
}

The algorithm:

  1. Each entry point gets a BitSet with exactly one bit set
  2. markFileReachableForCodeSplitting() propagates entry-point bits to all reachable files
  3. Files with identical BitSet values belong to the same chunk
  4. The BitSet string representation serves as a hash key for grouping

For example, if file X is reachable from Entry A (bit 0) and Entry B (bit 1), its EntryBits is {0, 1}. All files with EntryBits = {0, 1} go into a shared chunk.
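The grouping trick can be shown in a few lines. This sketch uses a bool slice as a stand-in for helpers.BitSet (which packs bits into bytes), but the idea is the same: render the bits as a string and use it as a map key:

```go
package main

import "fmt"

// bitSet is a toy stand-in for helpers.BitSet, keyed by entry-point index.
type bitSet []bool

func (b bitSet) set(i int) { b[i] = true }

// key renders the bits as a string so that identical reachability sets
// compare equal as map keys, like BitSet.String() in esbuild.
func (b bitSet) key() string {
	s := make([]byte, len(b))
	for i, v := range b {
		s[i] = '0'
		if v {
			s[i] = '1'
		}
	}
	return string(s)
}

func main() {
	const numEntries = 2
	// file -> which entry points reach it (entry A = bit 0, B = bit 1)
	reach := map[string][]int{
		"only-a.js": {0},
		"only-b.js": {1},
		"shared.js": {0, 1},
	}
	chunks := map[string][]string{} // bitset key -> files in that chunk
	for file, entries := range reach {
		bits := make(bitSet, numEntries)
		for _, e := range entries {
			bits.set(e)
		}
		chunks[bits.key()] = append(chunks[bits.key()], file)
	}
	fmt.Println(len(chunks)) // 3 distinct chunks: {A}, {B}, {A,B}
}
```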

Cross-Chunk Dependencies

After computing chunks, computeCrossChunkDependencies() determines which symbols need to be exported from one chunk and imported by another. This generates the import/export statements that connect chunks at runtime.

A subtle challenge: chunk paths contain content hashes, but content hashes depend on cross-chunk imports, which depend on chunk paths — a chicken-and-egg problem. esbuild solves this with a uniqueKeyPrefix mechanism: during code generation, cross-chunk references use a temporary placeholder string. After all hashes are computed, a final substitution pass replaces placeholders with real paths.

Parallel Chunk Generation

generateChunksInParallel() is where the output files are actually produced:

func (c *linkerContext) generateChunksInParallel(additionalFiles []graph.OutputFile) []graph.OutputFile {
    generateWaitGroup := sync.WaitGroup{}
    generateWaitGroup.Add(len(c.chunks))
    for chunkIndex := range c.chunks {
        switch c.chunks[chunkIndex].chunkRepr.(type) {
        case *chunkReprJS:
            go c.generateChunkJS(chunkIndex, &generateWaitGroup)
        case *chunkReprCSS:
            go c.generateChunkCSS(chunkIndex, &generateWaitGroup)
        }
    }
    generateWaitGroup.Wait()
    // ...
}

Each chunk is generated independently in its own goroutine. JS and CSS chunks use different code generation paths (generateChunkJS vs generateChunkCSS) but follow the same pattern:

  1. Compute the order of files in the chunk
  2. For each file, print its AST to text using the JS or CSS printer
  3. Generate source maps alongside the output text
  4. Compute a content hash for the chunk
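Step 4 can be sketched briefly. esbuild's actual hash algorithm and output alphabet differ; this only illustrates the shape of "hash the printed chunk, take a short filename-safe prefix":

```go
package main

import (
	"crypto/sha256"
	"encoding/base32"
	"fmt"
)

// chunkHash hashes a chunk's printed output and returns a short,
// filename-safe prefix suitable for a [hash] path template slot.
func chunkHash(contents []byte) string {
	sum := sha256.Sum256(contents)
	enc := base32.StdEncoding.EncodeToString(sum[:])
	return enc[:8]
}

func main() {
	h := chunkHash([]byte(`console.log("hello");`))
	fmt.Println(len(h)) // 8
}
```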

After all chunks complete, a final sequential step:

  1. Compute final hashes (incorporating dependency chain hashes)
  2. Resolve chunk paths from templates
  3. Substitute uniqueKey placeholders with final paths

sequenceDiagram
    participant L as Linker
    participant C1 as Chunk 1 (goroutine)
    participant C2 as Chunk 2 (goroutine)
    participant C3 as Chunk 3 (goroutine)

    L->>C1: generateChunkJS()
    L->>C2: generateChunkJS()
    L->>C3: generateChunkCSS()

    C1-->>L: JS output + source map + hash
    C2-->>L: JS output + source map + hash
    C3-->>L: CSS output + source map + hash

    Note over L: Compute final hashes
    Note over L: Resolve chunk paths
    Note over L: Substitute unique keys
    L->>L: Assemble OutputFiles

The Complete Linking Pipeline

Putting it all together, the linker transforms an immutable module graph into output files through a carefully ordered pipeline. Each step's output feeds the next, and the ordering is not arbitrary — it reflects real data dependencies.

The key insight is the clean separation between the immutable scan data and the mutable linker state. This separation is what makes watch mode and incremental builds possible: when a file changes, only the scan phase needs to re-process that file, and the linker operates on a fresh clone every time.

In Part 4, we'll zoom in on how esbuild represents and manipulates identifiers across this entire pipeline — the Ref type, the SymbolMap, cross-file symbol merging, the renamer for minification, and the JS printer that turns AST nodes back into text with correct operator precedence.