The Linker: Tree Shaking, Code Splitting, and Chunk Generation
Prerequisites
- Article 1: Architecture and Project Layout
- Article 2: Scanning, Parsing, and Module Graph
- Understanding of ES module import/export semantics
- Familiarity with tree shaking concepts
The linker is where the module graph becomes output files. It takes the immutable data produced by the scan phase and transforms it through a carefully ordered sequence of steps: cloning, import resolution, tree shaking, code splitting, and parallel code generation. This is the most complex part of esbuild, and the part where the most subtle correctness issues live.
In this article, we'll walk through each step of the Link() function and understand why the order matters.
Immutable Graph Cloning
The linker's design starts with an immutability contract, documented at the top of internal/graph/graph.go#L1-L14:
// This graph represents the set of files that the linker operates on. Each
// linker has a separate one of these graphs (there is one linker when code
// splitting is on, but one linker per entry point when code splitting is off).
//
// The input data to the linker constructor must be considered immutable because
// it's shared between linker invocations and is also stored in the cache for
// incremental builds.
//
// The linker constructor makes a shallow clone of the input data and is careful
// to pre-clone ahead of time the AST fields that it may modify.
This design enables two things:
- Watch mode: Unchanged files don't need to be re-parsed. The cached scan data is reused across rebuilds.
- Parallel linking: When code splitting is off, each entry point can be linked independently using its own clone.
CloneLinkerGraph() creates the linker's working copy, wrapping each InputFile in a LinkerFile that adds mutable linking metadata:
type LinkerFile struct {
	EntryBits helpers.BitSet // Which entry points reach this file
	InputFile InputFile
	DistanceFromEntryPoint uint32
	EntryPointChunkIndex uint32
	entryPointKind entryPointKind
	IsLive bool
}
The EntryBits field is the core data structure for code splitting — we'll explore it in detail below.
flowchart LR
subgraph "Scan Phase Output (immutable)"
IF1["InputFile 0 (runtime)"]
IF2["InputFile 1 (entry)"]
IF3["InputFile 2 (utils)"]
end
subgraph "Linker Graph (mutable clone)"
LF1["LinkerFile 0"]
LF2["LinkerFile 1"]
LF3["LinkerFile 2"]
end
IF1 -.->|shallow clone| LF1
IF2 -.->|shallow clone| LF2
IF3 -.->|shallow clone| LF3
Tip: The Go language doesn't have type system features for immutability, so this contract is enforced by convention. If you're contributing to esbuild, be extremely careful not to mutate any AST data from the scan phase — the comment explicitly warns about this.
The Link() Pipeline Steps
The Link() function executes a strict sequence of steps. Each step depends on the results of the previous one:
flowchart TD
A["Clone linker graph"] --> B["Look up runtime refs"]
B --> C["scanImportsAndExports()"]
C --> D{"Errors?"}
D -->|Yes| ABORT["Return empty"]
D -->|No| E["treeShakingAndCodeSplitting()"]
E --> F["computeChunks()"]
F --> G["computeCrossChunkDependencies()"]
G --> H["mangleProps()"]
H --> I["FollowAllSymbols()"]
I --> J["generateChunksInParallel()"]
J --> OUT["[]OutputFile"]
Let's examine the most important steps.
Runtime Ref Lookup
Before any linking begins, the linker resolves references to runtime helpers. From internal/linker/linker.go#L264-L271:
runtimeRepr := c.graph.Files[runtime.SourceIndex].InputFile.Repr.(*graph.JSRepr)
if c.options.ProfilerNames {
	c.cjsRuntimeRef = runtimeRepr.AST.NamedExports["__commonJS"].Ref
	c.esmRuntimeRef = runtimeRepr.AST.NamedExports["__esm"].Ref
} else {
	c.cjsRuntimeRef = runtimeRepr.AST.NamedExports["__commonJSMin"].Ref
	c.esmRuntimeRef = runtimeRepr.AST.NamedExports["__esmMin"].Ref
}
Notice the dual naming: __commonJS vs __commonJSMin. The "Min" variants generate smaller wrapper code by omitting the `module` and `exports` parameter names from the wrapper function, which helps with minification. The profiler names option selects the larger variants, which preserve these names so wrappers are identifiable in profiler output.
Import/Export Binding Resolution
scanImportsAndExports() is the linker's most complex step. It resolves every cross-file import to the symbol it ultimately refers to, handling:
- Direct imports: `import { x } from './mod'` → find `x` in mod's exports
- Re-exports: `export { x } from './other'` → trace through the re-export chain
- Star exports: `export * from './utils'` → collect all named exports
- CommonJS interop: wrapping CJS modules in `__commonJS()`, auto-inserting `__toESM()` for ESM consumers
- Cycle handling: detecting and handling circular import chains
The step operates in multiple sub-steps, each clearly labeled in the code with comments like "Step 1: Figure out what modules must be CommonJS." From line 1293:
// Step 1: Figure out what modules must be CommonJS
for _, sourceIndex := range c.graph.ReachableFiles {
// ...
}
This cascading analysis determines the module format for each file, which in turn determines how imports and exports are connected. A file that starts as ESM might be "downgraded" to CommonJS if it's required by another module.
sequenceDiagram
participant Entry as entry.js
participant Utils as utils.js
participant Lib as lib.js
participant Linker as Linker
Linker->>Entry: Scan exports
Linker->>Utils: Scan exports
Linker->>Lib: Scan exports
Note over Linker: Resolve import bindings
Linker->>Entry: import {foo} from utils
Linker->>Utils: export {foo} = find 'foo'
Linker->>Utils: export {bar} from lib (re-export)
Linker->>Lib: find 'bar' in lib's exports
Note over Linker: Merge symbols across files
Tree Shaking via Reachability Analysis
treeShakingAndCodeSplitting() performs two distinct passes:
func (c *linkerContext) treeShakingAndCodeSplitting() {
	// Tree shaking: Each entry point marks all files reachable from itself
	c.timer.Begin("Tree shaking")
	for _, entryPoint := range c.graph.EntryPoints() {
		c.markFileLiveForTreeShaking(entryPoint.SourceIndex)
	}
	c.timer.End("Tree shaking")

	// Code splitting: Determine which entry points can reach which files
	c.timer.Begin("Code splitting")
	for i, entryPoint := range c.graph.EntryPoints() {
		c.markFileReachableForCodeSplitting(entryPoint.SourceIndex, uint(i), 0)
	}
	c.timer.End("Code splitting")
}
Tree shaking works by starting from entry points and walking the import graph, marking files and their parts as "live." A file is dead (removable) if no live entry point imports it and it has no side effects. The sideEffects field from package.json plays a crucial role here — setting "sideEffects": false tells esbuild that importing a file without using its exports has no observable effect.
Code splitting then computes which entry points can reach each file. This must happen after tree shaking because there are implicit dependencies between live parts within the same file. The comment in the code is explicit about this ordering constraint.
flowchart TD
E1["Entry A"] --> M1["Module 1"]
E1 --> M2["Module 2"]
E2["Entry B"] --> M2
E2 --> M3["Module 3"]
M1 -.-> CHUNK_A["Chunk A (only Entry A)"]
M3 -.-> CHUNK_B["Chunk B (only Entry B)"]
M2 -.-> CHUNK_SHARED["Shared Chunk (Entry A + B)"]
style M1 fill:#d4edda
style M2 fill:#fff3cd
style M3 fill:#d4edda
Code Splitting with BitSet Entry-Point Membership
The code splitting algorithm is elegant in its simplicity. computeChunks() uses a BitSet (a compact bit array) on each LinkerFile to track which entry points can reach it:
func (c *linkerContext) computeChunks() {
	jsChunks := make(map[string]chunkInfo)
	cssChunks := make(map[string]chunkInfo)

	for i, entryPoint := range c.graph.EntryPoints() {
		entryBits := helpers.NewBitSet(uint(len(c.graph.EntryPoints())))
		entryBits.SetBit(uint(i))
		key := entryBits.String()
		chunk := chunkInfo{
			entryBits:    entryBits,
			isEntryPoint: true,
			sourceIndex:  entryPoint.SourceIndex,
			// ...
		}
		// ...
	}
}
The algorithm:
- Each entry point gets a `BitSet` with exactly one bit set
- `markFileReachableForCodeSplitting()` propagates entry-point bits to all reachable files
- Files with identical `BitSet` values belong to the same chunk
- The `BitSet` string representation serves as a hash key for grouping
For example, if file X is reachable from Entry A (bit 0) and Entry B (bit 1), its EntryBits is {0, 1}. All files with EntryBits = {0, 1} go into a shared chunk.
Cross-Chunk Dependencies
After computing chunks, computeCrossChunkDependencies() determines which symbols need to be exported from one chunk and imported by another. This generates the import/export statements that connect chunks at runtime.
A subtle challenge: chunk paths contain content hashes, but content hashes depend on cross-chunk imports, which depend on chunk paths — a chicken-and-egg problem. esbuild solves this with a uniqueKeyPrefix mechanism: during code generation, cross-chunk references use a temporary placeholder string. After all hashes are computed, a final substitution pass replaces placeholders with real paths.
Parallel Chunk Generation
generateChunksInParallel() is where the output files are actually produced:
func (c *linkerContext) generateChunksInParallel(additionalFiles []graph.OutputFile) []graph.OutputFile {
	generateWaitGroup := sync.WaitGroup{}
	generateWaitGroup.Add(len(c.chunks))
	for chunkIndex := range c.chunks {
		switch c.chunks[chunkIndex].chunkRepr.(type) {
		case *chunkReprJS:
			go c.generateChunkJS(chunkIndex, &generateWaitGroup)
		case *chunkReprCSS:
			go c.generateChunkCSS(chunkIndex, &generateWaitGroup)
		}
	}
	generateWaitGroup.Wait()
	// ...
}
Each chunk is generated independently in its own goroutine. JS and CSS chunks use different code generation paths (generateChunkJS vs generateChunkCSS) but follow the same pattern:
- Compute the order of files in the chunk
- For each file, print its AST to text using the JS or CSS printer
- Generate source maps alongside the output text
- Compute a content hash for the chunk
After all chunks complete, a final sequential step:
- Compute final hashes (incorporating dependency chain hashes)
- Resolve chunk paths from templates
- Substitute `uniqueKey` placeholders with final paths
sequenceDiagram
participant L as Linker
participant C1 as Chunk 1 (goroutine)
participant C2 as Chunk 2 (goroutine)
participant C3 as Chunk 3 (goroutine)
L->>C1: generateChunkJS()
L->>C2: generateChunkJS()
L->>C3: generateChunkCSS()
C1-->>L: JS output + source map + hash
C2-->>L: JS output + source map + hash
C3-->>L: CSS output + source map + hash
Note over L: Compute final hashes
Note over L: Resolve chunk paths
Note over L: Substitute unique keys
L->>L: Assemble OutputFiles
The Complete Linking Pipeline
Putting it all together, the linker transforms an immutable module graph into output files through a carefully ordered pipeline. Each step's output feeds the next, and the ordering is not arbitrary — it reflects real data dependencies.
The key insight is the clean separation between the immutable scan data and the mutable linker state. This separation is what makes watch mode and incremental builds possible: when a file changes, only the scan phase needs to re-process that file, and the linker operates on a fresh clone every time.
In Part 4, we'll zoom in on how esbuild represents and manipulates identifiers across this entire pipeline — the Ref type, the SymbolMap, cross-file symbol merging, the renamer for minification, and the JS printer that turns AST nodes back into text with correct operator precedence.