Read OSS

The Type Checker: 54,000 Lines of Type Theory in Practice

Advanced

Prerequisites

  • Articles 1-3: Architecture, AST, Binder
  • Understanding of type theory fundamentals (subtyping, variance, type inference)
  • Familiarity with TypeScript's advanced type features (generics, conditional types, mapped types, template literal types)

The Type Checker: 54,000 Lines of Type Theory in Practice

The type checker is where TypeScript earns its name. At approximately 54,400 lines, src/compiler/checker.ts is one of the largest single files in any open-source project. It implements a structural type system with generics, conditional types, mapped types, template literal types, control-flow narrowing, and dozens of special-case rules — all driven by a demand-based architecture that computes types lazily and caches aggressively.

This article maps the terrain: how the checker initializes, the Type hierarchy it works with, the core algorithms for assignability and inference, how narrowing consumes the binder's flow graph, and how the Program orchestrator ties everything together.

createTypeChecker and the Demand-Driven Architecture

createTypeChecker(host) is the factory function. Like the scanner, parser, and binder, it's a closure over hundreds of var locals:

export function createTypeChecker(host: TypeCheckerHost): TypeChecker {
    // Why var? It avoids TDZ checks in the runtime which can be costly.
    var cancellationToken: CancellationToken | undefined;
    var scanner: Scanner | undefined;
    var Symbol = objectAllocator.getSymbolConstructor();
    var Type = objectAllocator.getTypeConstructor();
    var Signature = objectAllocator.getSignatureConstructor();
    var typeCount = 0;
    var symbolCount = 0;
    var totalInstantiationCount = 0;
    // ... hundreds more ...

The checker reads compiler options from its host and pre-computes commonly-queried flags (strictNullChecks, strictFunctionTypes, noImplicitAny, etc.) into local variables for fast access. It then creates internal infrastructure — the name resolver, the binary expression checker, the node builder — and initializes the global symbol table.

sequenceDiagram
    participant P as Program
    participant TC as TypeChecker
    participant S as Symbol
    participant SL as SymbolLinks

    P->>TC: createTypeChecker(host)
    Note over TC: Initialize 500+ var locals
    P->>TC: getSemanticDiagnostics(sourceFile)
    TC->>TC: checkSourceFile(sourceFile)
    TC->>TC: checkExpression(node)
    TC->>S: getSymbolOfNode(node)
    TC->>SL: getSymbolLinks(symbol)
    alt type not cached
        TC->>TC: getTypeOfSymbol(symbol) → resolve type
        TC->>SL: links.type = resolvedType
    end
    TC-->>P: Diagnostic[]

The demand-driven aspect is crucial. The checker doesn't eagerly compute types for every symbol in the program. Instead, types are resolved only when needed — when checking an expression, when a diagnostic requires a type, or when the language service asks for a completion's type. Each resolution is cached on SymbolLinks, so subsequent queries for the same symbol's type return instantly.

This laziness has a practical implication: changing a single line in a file doesn't require re-checking the entire program. The Program can discard its cached checker and create a new one when files change, and the new checker only pays for the types it actually needs.

Tip: When debugging the checker, the entry point for most type resolution is getTypeOfSymbol(). Set a breakpoint there to watch how types are computed on demand. For expression type checking, start with checkExpression().

The Type Hierarchy

The Type interface is the base, classified by TypeFlags:

classDiagram
    class Type {
        +flags: TypeFlags
        +id: TypeId
        +symbol: Symbol
    }
    class IntrinsicType {
        +intrinsicName: string
        +objectFlags: ObjectFlags
    }
    class LiteralType {
        +value: string | number | PseudoBigInt
    }
    class ObjectType {
        +objectFlags: ObjectFlags
    }
    class UnionType {
        +types: Type[]
    }
    class IntersectionType {
        +types: Type[]
    }
    class TypeParameter {
        +constraint: Type
        +default: Type
    }
    class ConditionalType {
        +checkType: Type
        +extendsType: Type
        +trueType: Type
        +falseType: Type
    }
    Type <|-- IntrinsicType
    Type <|-- LiteralType
    Type <|-- ObjectType
    Type <|-- UnionType
    Type <|-- IntersectionType
    Type <|-- TypeParameter
    Type <|-- ConditionalType
    LiteralType <|-- StringLiteralType
    LiteralType <|-- NumberLiteralType
    LiteralType <|-- BigIntLiteralType

The TypeFlags enum is carefully ordered — the comment in the source explains that ordering determines the position of constituents in union types, and since union processing often bails out early, types are sorted by increasing complexity. Simple primitives come first, recursive types like IndexedAccess and Conditional come last.

Key type categories:

  • Intrinsic types: string, number, boolean, void, undefined, null, never, any, unknown — singletons with no structural content
  • Literal types: "hello", 42, true — types with a specific value
  • Object types: Classes, interfaces, anonymous object types, mapped types, tuples — classified further by ObjectFlags (Class, Interface, Reference, Tuple, Anonymous, Mapped, etc.)
  • Union/Intersection types: T | U and T & U — containing an array of constituent types
  • Type parameters: Generic T with optional constraint and default
  • Conditional types: T extends U ? X : Y — with deferred resolution for unresolved type parameters
  • Template literal types: `hello${T}` — types combining literal string parts with type placeholders
  • Indexed access types: T[K] — resolved lazily based on T and K
  • Substitution types: Internal bookkeeping for instantiated type parameters

Type Inference and Assignability

Assignability: Structural Compatibility

TypeScript uses structural typing — a type is compatible if it has at least the same properties and signatures. The core of this is isTypeAssignableTo(source, target), which flows into isTypeRelatedTo() with a specific relation (assignability, comparability, or subtype).

flowchart TD
    Start["isTypeAssignableTo(source, target)"] --> Identity{"Same type?"}
    Identity -->|Yes| OK["✓ Compatible"]
    Identity -->|No| Special{"Special cases?<br/>any, never, unions"}
    Special -->|"source = any"| OK
    Special -->|"source = never"| OK
    Special -->|"target = any"| OK
    Special -->|"source = union"| EachSource["Each constituent<br/>must be assignable"]
    Special -->|"target = union"| SomeTarget["At least one target<br/>constituent matches"]
    Special -->|No special case| Structural["Structural comparison:<br/>Check each target property<br/>Check call signatures<br/>Check index signatures"]
    Structural --> ExcessCheck{"Excess property<br/>check?"}
    ExcessCheck -->|"Fresh object literal"| Excess["Check no extra<br/>properties"]
    ExcessCheck -->|No| Done["Return result"]

The algorithm handles a remarkable number of edge cases:

  • Unions and intersections: A source union is assignable if every constituent is assignable; a target union is satisfied if any constituent matches.
  • Generic types: Instantiated generics compare their type arguments with appropriate variance (covariant for read positions, contravariant for write positions).
  • Excess property checks: Fresh object literals get stricter checking — properties not in the target type trigger errors.
  • Discriminated unions: For union targets, the checker can use discriminant properties to narrow which constituent to check against.

Type Inference for Generics

When you call a generic function without explicit type arguments, the checker infers them. The algorithm in the checker creates inference contexts with one inference candidate per type parameter, then walks the argument types against the parameter types, collecting candidate types at each position where a type parameter appears.

For example, in function map<T, U>(arr: T[], fn: (x: T) => U): U[] called with map([1, 2], x => x.toString()), the checker infers T = number from the array argument and U = string from the callback's return type. The inference proceeds in priority order — direct mappings from arguments beat indirect inferences from return types.

Control Flow Narrowing in the Checker

As we saw in Part 3, the binder constructs a FlowNode graph modeling the program's control flow. The checker consumes this graph in getFlowTypeOfReference() — the function that computes the narrowed type of a variable at a specific point in code.

flowchart BT
    Use["Reference to x"] --> FA["FlowAssignment<br/>x = getValue()"]
    FA --> FC["FlowCondition<br/>(TrueCondition)<br/>x !== null"]
    FC --> FL["FlowLabel<br/>(merge point)"]
    FL --> Start["FlowStart"]
    
    style Use fill:#f9f,stroke:#333
    style FC fill:#bbf,stroke:#333

The algorithm walks backward from the reference's flow node:

  1. FlowAssignment: The assigned type becomes the new type of x
  2. FlowCondition (TrueCondition): Apply the type guard. x !== null narrows away null. typeof x === "string" narrows to string. x instanceof Foo narrows to Foo.
  3. FlowCondition (FalseCondition): Apply the inverse guard.
  4. FlowLabel (BranchLabel): Compute the narrowed type along each antecedent and take the union of all results.
  5. FlowLabel (LoopLabel): Similar, but with iteration to a fixed point to handle loops.
  6. FlowCall: For assertion functions (asserts x is T), narrow accordingly.

The critical performance optimization: each FlowNode has an id used as a cache key. The checker maintains a flow type cache keyed by (flowNodeId, referenceSymbolId). Without this cache, narrowing in a function with many branches could easily become exponential.

Tip: If you're seeing surprising narrowing behavior, the flow graph is the first place to look. Use node.flowNode (on FlowContainer nodes) and walk the antecedent chain to understand what the checker sees.

The Program Orchestrator

The Program is the top-level object that ties everything together. createProgram() takes root file names and compiler options, then:

  1. Resolves files: Starting from root files, it discovers transitive imports via module resolution, building the complete set of SourceFile objects.
  2. Manages the checker: The checker is created lazily (on first call to getSemanticDiagnostics or getTypeChecker) and cached.
  3. Provides emit: program.emit() invokes the emitter pipeline.
  4. Reports diagnostics: Options diagnostics, syntactic diagnostics (from the parser), semantic diagnostics (from the checker), and declaration diagnostics (from declaration emit).

The Program interface exposes:

export interface Program extends ScriptReferenceHost {
    getRootFileNames(): readonly string[];
    getSourceFiles(): readonly SourceFile[];
    emit(...): EmitResult;
    getOptionsDiagnostics(): readonly Diagnostic[];
    getSyntacticDiagnostics(sourceFile?): readonly DiagnosticWithLocation[];
    getSemanticDiagnostics(sourceFile?): readonly Diagnostic[];
    getTypeChecker(): TypeChecker;
    // ...
}

The CompilerOptions interface at src/compiler/types.ts#L7403 defines every compiler flag. Many of these directly affect checker behavior — strictNullChecks controls whether null and undefined are distinct types, strictFunctionTypes controls function parameter variance, exactOptionalPropertyTypes controls whether optional properties include undefined, and so on.

The Program also handles the crucial task of module resolution — given an import specifier, it resolves to a file path using the configured resolution strategy (Node.js, classic, bundler). Resolved modules are cached per-file in the resolvedModules map, and the module resolution cache can be reused across Program instances for incremental performance.

The expressionToTypeNode Bridge

One interesting piece of infrastructure sits at the boundary between the checker and the emitter: src/compiler/expressionToTypeNode.ts. This module converts internal Type objects back into AST TypeNode syntax — the reverse of what the checker normally does. It's essential for declaration file generation (Part 5), where inferred types that have no explicit annotation in source code must be materialized as type syntax in the .d.ts output.

What's Next

With the checker producing types, diagnostics, and resolved type information, the compiler is ready to produce output. In Part 5, we'll follow the emitter pipeline — how a chain of AST-to-AST transformers progressively lowers TypeScript syntax to JavaScript, how declaration files are generated, and how source maps maintain the connection between input and output.