Read OSS

The Binder: Building Symbol Tables and Control Flow Graphs

Advanced

Prerequisites

  • Article 1: Architecture & Codebase Map
  • Article 2: Scanner, Parser, and AST
  • Understanding of symbol tables and scope chains in compilers
  • Familiarity with control flow analysis concepts

The Binder: Building Symbol Tables and Control Flow Graphs

After the parser delivers a SourceFile AST, the next phase gives names their meaning. The binder walks the AST depth-first, creating Symbol objects for every declaration, organizing them into hierarchical symbol tables, setting parent pointers on every node, and — crucially — constructing the control flow graph that enables TypeScript's type narrowing. All of this happens in src/compiler/binder.ts — roughly 3,900 lines.

The binder is the bridge between syntax and semantics. Understanding it is essential before tackling the checker, because the checker assumes everything the binder produces is already in place: parent pointers, symbol tables, and flow nodes.

bindSourceFile and the Closure-Based Binder

Like the scanner and parser, the binder uses the closure-based module pattern. createBinder() returns a function that accepts a SourceFile and CompilerOptions. Internally, it closes over dozens of var locals:

function createBinder(): (file: SourceFile, options: CompilerOptions) => void {
    // Why var? It avoids TDZ checks in the runtime which can be costly.
    // See: https://github.com/microsoft/TypeScript/issues/52924
    var file: SourceFile;
    var options: CompilerOptions;
    var parent: Node;
    var container: IsContainer | EntityNameExpression;
    var blockScopeContainer: IsBlockScopedContainer;
    // ... control flow state ...
    var currentFlow: FlowNode;
    var currentBreakTarget: FlowLabel | undefined;
    var currentContinueTarget: FlowLabel | undefined;
    // ...

The binder is instantiated once as a module-level singleton — const binder = createBinder() — and reused for every file. Each call to bindSourceFile() reinitializes the closure state and walks the new file's AST.

flowchart TD
    BSF["bindSourceFile(file, options)"] --> Init["Initialize closure state<br/>Reset containers, flow nodes"]
    Init --> Walk["Depth-first AST walk"]
    Walk --> Bind["bind(node)"]
    Bind --> SetParent["Set node.parent"]
    Bind --> CheckDecl{"Is declaration?"}
    CheckDecl -->|Yes| CreateSym["Create/merge Symbol"]
    CheckDecl -->|No| Skip["Continue walk"]
    CreateSym --> AddToTable["Add to container's symbol table"]
    Bind --> FlowCheck{"Affects control flow?"}
    FlowCheck -->|Yes| UpdateFlow["Create FlowNode, update currentFlow"]
    FlowCheck -->|No| Continue["Continue"]
    Bind --> Children["Visit child nodes"]

The bindSourceFile() wrapper is minimal — it wraps the binder call with performance marks for profiling.

Tip: The binder's closure state is the reason you can't run binding on multiple files concurrently within a single program. The closure locals act as implicit thread-local state — fast for single-threaded execution, but not parallelizable.

Symbols, SymbolFlags, and Symbol Tables

The Symbol Interface

A Symbol represents the semantic identity of a named entity. Its key fields:

  • flags: SymbolFlags — classifies what kind of entity this symbol represents
  • escapedName: __String — the symbol's name (escaped to avoid collisions with built-in properties)
  • declarations: Declaration[] — all AST nodes that declare this symbol (may be multiple for merged declarations)
  • valueDeclaration: Declaration — the first value-producing declaration
  • members: SymbolTable — instance members (for classes, interfaces, object literals)
  • exports: SymbolTable — exported members (for modules and namespaces)

SymbolFlags: Classification

SymbolFlags is a bitflag enum that classifies symbols into categories:

flowchart TD
    SF["SymbolFlags"]
    SF --> Value["Value Space<br/>FunctionScopedVariable | BlockScopedVariable<br/>Property | EnumMember | Function<br/>Class | Method | Accessor"]
    SF --> TypeSpace["Type Space<br/>Class | Interface | Enum<br/>EnumMember | TypeLiteral<br/>TypeParameter | TypeAlias"]
    SF --> Namespace["Namespace Space<br/>ValueModule | NamespaceModule | Enum"]
    SF --> Special["Special Flags<br/>Alias | Transient | Assignment<br/>Optional | ExportStar | Prototype"]

A single symbol can simultaneously exist in the value, type, and namespace "spaces" — this is how class Foo creates both a value (the constructor) and a type (the instance type). The composite flags like Value, Type, and Namespace are unions of the individual flags:

Value = Variable | Property | EnumMember | ObjectLiteral |
        Function | Class | Enum | ValueModule | Method |
        GetAccessor | SetAccessor,
Type  = Class | Interface | Enum | EnumMember |
        TypeLiteral | TypeParameter | TypeAlias,

Symbol Tables and Container Hierarchy

Symbols are stored in SymbolTable maps (essentially Map<__String, Symbol>) attached to container nodes. The binder maintains a stack of containers:

  • Global container — symbols available everywhere
  • Module/SourceFile — top-level declarations, populating the exports symbol table
  • Class/Interface — populating members for instance members
  • Function/Block — populating locals for local declarations

When the binder encounters a declaration, it determines the appropriate container and adds the symbol to that container's symbol table. For var declarations, the container is the enclosing function (function-scoped). For let/const, it's the enclosing block (block-scoped).

Declaration Merging

When two declarations share the same name in the same container, the binder merges them into a single symbol. This is how TypeScript supports:

  • Interface merging: interface Foo { x: number } + interface Foo { y: string }
  • Namespace merging with classes/functions/enums
  • Module augmentation

The merge logic checks SymbolFlags compatibility — for instance, a Class and an Interface can merge (class-interface augmentation), but two Class declarations cannot. The declarations array on the merged symbol accumulates all contributing declaration nodes.

The binder creates Symbols, but it doesn't resolve types — that's the checker's job. The bridge between them is SymbolLinks, a separate interface that extends Symbol at runtime with lazily-computed type information.

classDiagram
    class Symbol {
        +flags: SymbolFlags
        +escapedName: __String
        +declarations: Declaration[]
        +members: SymbolTable
        +exports: SymbolTable
    }
    class SymbolLinks {
        +type: Type
        +writeType: Type
        +declaredType: Type
        +typeParameters: TypeParameter[]
        +target: Symbol
        +aliasTarget: Symbol
        +mapper: TypeMapper
        +resolvedExports: SymbolTable
        +resolvedMembers: SymbolTable
    }
    Symbol ..> SymbolLinks : "checker.getSymbolLinks(symbol)"

Key SymbolLinks fields include:

  • type — the resolved type of the symbol's value
  • declaredType — for classes, interfaces, and type aliases, the declared type
  • typeParameters — generic type parameters (for type aliases and classes)
  • aliasTarget — for import aliases, the resolved target symbol
  • resolvedExports / resolvedMembers — merged symbol tables combining early-bound and late-bound members

The checker accesses SymbolLinks through a getSymbolLinks(symbol) function that creates the links object on first access and caches it. This lazy initialization pattern is critical for performance — the checker only computes type information for symbols that are actually queried, rather than eagerly computing types for every declaration in the program.

Tip: If you're debugging type resolution in the checker and wondering where a symbol's type comes from, look for getTypeOfSymbol()getSymbolLinks(symbol).type. The first call triggers resolution; subsequent calls hit the cache.

Control Flow Graph Construction

The binder's second major responsibility — beyond symbol tables — is constructing the control flow graph. This graph enables TypeScript's type narrowing: when you write if (x !== null) { x.method() }, the checker knows x is non-null inside the if block because the flow graph models the branch.

FlowNode Types

The control flow graph is represented as a linked list of FlowNode objects, classified by FlowFlags:

FlowNode Type FlowFlags Purpose
FlowStart Start Beginning of a function's flow graph
FlowAssignment Assignment Variable assignment (narrows the type)
FlowCondition TrueCondition / FalseCondition Branch from a conditional (if/ternary)
FlowSwitchClause SwitchClause Branch from a switch case
FlowLabel BranchLabel / LoopLabel Junction merging multiple flows
FlowCall Call Potential assertion call (e.g., assert(x))
FlowArrayMutation ArrayMutation Array .push() that might narrow tuple type

Each FlowNode has an antecedent pointer linking it to the preceding flow node(s). The graph is navigated backwards — from a variable use point, the checker walks antecedent links until it reaches an assignment or branch that determines the narrowed type.

How the Binder Builds Flow Graphs

flowchart TB
    Start["FlowStart"] --> A1["x = getSomething()<br/>FlowAssignment"]
    A1 --> Cond{"if (x !== null)"}
    Cond -->|"True"| TrueFlow["FlowCondition<br/>(TrueCondition)"]
    Cond -->|"False"| FalseFlow["FlowCondition<br/>(FalseCondition)"]
    TrueFlow --> Body["x.method()<br/>(use of x)"]
    Body --> Merge["FlowLabel<br/>(BranchLabel)"]
    FalseFlow --> Merge
    Merge --> After["code after if"]

As the binder walks the AST, it maintains currentFlow — the current position in the flow graph. When it encounters:

  • An assignment (x = expr): Creates a FlowAssignment node with currentFlow as antecedent, then updates currentFlow to the new node.
  • An if statement: Saves currentFlow, creates FlowCondition nodes for the true and false branches, processes each branch, then creates a FlowLabel junction that merges both branch endpoints.
  • A loop: Creates a FlowLabel with LoopLabel flag, which gets back-edges as the loop body is processed.
  • A return/throw/break: Sets currentFlow to unreachableFlow, marking subsequent code as dead.
  • A function call: If the function might be an assertion (type predicate returning asserts), creates a FlowCall node.

The binder tracks several flow targets simultaneously — currentBreakTarget, currentContinueTarget, currentReturnTarget, currentTrueTarget, currentFalseTarget, and currentExceptionTarget — to correctly wire up flow edges for complex control flow involving loops, labeled statements, and try/catch blocks.

Connection to Narrowing

The checker (which we'll cover in Part 4) consumes this graph in getFlowTypeOfReference(). Starting from a variable reference, it walks backwards through the flow graph, applying narrowing at each FlowCondition and FlowAssignment. The id field on each FlowNode is used as a cache key — once the narrowed type at a particular flow node is computed, it's cached to avoid exponential re-computation in graphs with many branches.

What's Next

The binder has given every declaration a Symbol, organized symbols into scoped tables, and threaded a control flow graph through the AST. In Part 4, we'll enter the compiler's largest and most complex module — the 54,000-line type checker — to see how it resolves types from symbols, checks assignability, infers generics, narrows types along control flow edges, and reports the errors that make TypeScript useful.