
Architecture of the Zig Compiler: A Map for the Codebase

Intermediate

Prerequisites

  • Basic Zig language knowledge (comptime, @import, error unions, packed structs)
  • General familiarity with compiler concepts (lexing, parsing, IRs, code generation)

The Zig compiler is a self-hosted, multi-stage compiler that lives in a single monorepo. At the time of writing, the src/ directory alone contains over 300K lines of Zig, with the x86_64 backend contributing another 190K. Before reading a single function, you need a mental map — which files matter, how they connect, and where data flows. This article provides that map.

We'll walk through the repository layout, trace the full IR chain from source to binary, meet the three central data structures, and understand the bootstrap process that makes self-hosting possible.

Repository Layout and Directory Structure

The Zig monorepo packs the compiler, standard library, build tool, and bundled linkers into a single tree. Here's the high-level layout:

| Directory | Purpose |
| --- | --- |
| src/ | The compiler itself: Sema, codegen, linkers, CLI |
| lib/std/ | Standard library, including the compiler frontend |
| lib/std/zig/ | Tokenizer, parser, AstGen, ZIR (shared with tools) |
| stage1/ | Bootstrap artifacts: zig1.wasm, wasm2c.c, wasi.c |
| build.zig | Build tool definition for the compiler |
| bootstrap.c | Pure C program that chains zig1 → zig2 → zig3 |
| test/ | Compiler test suite |
| lib/compiler/ | Bundled compiler-rt, aro (C parser), etc. |

The most surprising thing here is that the compiler frontend — tokenizer, parser, and AstGen — lives in lib/std/zig/, not in src/. This is a deliberate architectural decision we'll explore in Article 2.

graph TD
    subgraph "Repository Root"
        A["src/"] --> B["Compiler Core"]
        C["lib/std/zig/"] --> D["Frontend (shared)"]
        E["stage1/"] --> F["Bootstrap Artifacts"]
        G["build.zig"] --> H["Build System"]
        I["bootstrap.c"] --> J["C Bootstrap Chain"]
    end
    D -->|"used by"| B
    D -->|"used by"| K["zig fmt, ZLS"]

Tip: When navigating the codebase, remember that @import("std") in compiler code pulls from lib/std/. So std.zig.Zir resolves to lib/std/zig/Zir.zig, not anything in src/.
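To make that resolution concrete, here is a minimal sketch that drives the tokenizer straight from the standard library, with nothing from src/ involved. It assumes std.zig.Tokenizer exposes init() and next() as in recent Zig releases; countTokens is a name invented for this example.

```zig
const std = @import("std");

/// Counts the tokens in `source`, not including the final end-of-file token.
/// `countTokens` is a hypothetical helper, not part of the compiler.
fn countTokens(source: [:0]const u8) usize {
    // Resolves to lib/std/zig/tokenizer.zig via the std.zig namespace.
    var tokenizer = std.zig.Tokenizer.init(source);
    var count: usize = 0;
    while (true) {
        const token = tokenizer.next();
        if (token.tag == .eof) break;
        count += 1;
    }
    return count;
}

pub fn main() void {
    // Tokens: `const`, `x`, `=`, `1`, `;`
    std.debug.print("{d} tokens\n", .{countTokens("const x = 1;")});
}
```

The same call works from any Zig program, which is exactly what lets formatters and language servers share the compiler's frontend.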

The Compilation Pipeline at a Glance

Every .zig source file passes through a chain of intermediate representations before becoming machine code. The pipeline has six stages:

flowchart LR
    A["Source\nBytes"] --> B["Tokens"]
    B --> C["AST"]
    C --> D["ZIR"]
    D --> E["AIR"]
    E --> F["MIR"]
    F --> G["Machine Code\n/ Binary"]

    style A fill:#e8f5e9
    style D fill:#fff3e0
    style E fill:#e3f2fd
    style G fill:#fce4ec

| Stage | File | Location |
| --- | --- | --- |
| Tokenization | tokenizer.zig | lib/std/zig/ |
| Parsing | Parse.zig | lib/std/zig/ |
| AstGen (AST → ZIR) | AstGen.zig | lib/std/zig/ |
| Sema (ZIR → AIR) | Sema.zig | src/ |
| Codegen (AIR → MIR) | codegen.zig | src/ |
| Linking (MIR → Binary) | link.zig | src/ |

The boundary between lib/std/zig/ and src/ marks the boundary between untyped and typed representations. ZIR is the last untyped IR; Sema transforms it into AIR (Analyzed IR), which carries full type information. This is also the boundary between code that tools like zig fmt and ZLS can reuse, and compiler-only code.
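The untyped side of that boundary can be exercised from ordinary user code. The sketch below assumes std.zig.Ast.parse keeps its (allocator, source, mode) signature and that parse errors are collected in the errors field; isSyntacticallyValid is a hypothetical helper.

```zig
const std = @import("std");

/// Parses `source` and reports whether the parser recorded any errors.
/// No type checking happens here; that is Sema's job, on the src/ side.
fn isSyntacticallyValid(gpa: std.mem.Allocator, source: [:0]const u8) !bool {
    var ast = try std.zig.Ast.parse(gpa, source, .zig);
    defer ast.deinit(gpa);
    return ast.errors.len == 0;
}

pub fn main() !void {
    const gpa = std.heap.page_allocator;
    // 300 does not fit in a u8, but that is a type-level problem,
    // so the parser alone has nothing to complain about.
    const ok = try isSyntacticallyValid(gpa, "const x: u8 = 300;");
    std.debug.print("parses cleanly: {}\n", .{ok});
}
```

A declaration such as `const x: u8 = 300;` is accepted at this level because the overflow only becomes visible once types are resolved.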

Entry Point: main.zig and Command Dispatch

The compiler starts at src/main.zig#L166, where pub fn main() sets up the global allocator — choosing between a debug allocator, libc allocator, or the SMP allocator depending on the build mode.

After allocator setup, control flows to mainArgs(), which is a large if-else chain performing command dispatch by string comparison:

flowchart TD
    M["main()"] --> MA["mainArgs()"]
    MA -->|"build-exe"| BOT["buildOutputType()"]
    MA -->|"build-lib"| BOT
    MA -->|"build-obj"| BOT
    MA -->|"test"| BOT
    MA -->|"run"| BOT
    MA -->|"cc / c++"| BOT
    MA -->|"fmt"| FMT["fmt.zig"]
    MA -->|"build"| CMD["cmdBuild()"]
    MA -->|"fetch"| FETCH["cmdFetch()"]

The core compilation path runs through buildOutputType(), a massive function that parses CLI flags, constructs a Compilation object, and calls update() on it. This single function handles build-exe, build-lib, build-obj, test, run, and even cc/c++ invocations.
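The shape of that dispatch can be sketched in a few lines. This is an illustration only: the Handler enum and dispatch function are invented for this example, and the real mainArgs() calls the handlers directly rather than returning a tag. The command-to-handler mapping follows the diagram above.

```zig
const std = @import("std");

/// Hypothetical tags standing in for the functions mainArgs() jumps to.
const Handler = enum { build_output_type, fmt, cmd_build, cmd_fetch, unknown };

/// Dispatch by plain string comparison, in the style of an if-else chain.
fn dispatch(cmd: []const u8) Handler {
    if (std.mem.eql(u8, cmd, "build-exe") or
        std.mem.eql(u8, cmd, "build-lib") or
        std.mem.eql(u8, cmd, "build-obj") or
        std.mem.eql(u8, cmd, "test") or
        std.mem.eql(u8, cmd, "run") or
        std.mem.eql(u8, cmd, "cc") or
        std.mem.eql(u8, cmd, "c++"))
    {
        // Many commands funnel into the same large function.
        return .build_output_type;
    } else if (std.mem.eql(u8, cmd, "fmt")) {
        return .fmt;
    } else if (std.mem.eql(u8, cmd, "build")) {
        return .cmd_build;
    } else if (std.mem.eql(u8, cmd, "fetch")) {
        return .cmd_fetch;
    } else {
        return .unknown;
    }
}

pub fn main() void {
    for ([_][]const u8{ "build-exe", "fmt", "frobnicate" }) |cmd| {
        std.debug.print("{s} -> {s}\n", .{ cmd, @tagName(dispatch(cmd)) });
    }
}
```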

Notice the dev.check() calls before each command — these are compile-time feature gates that we'll explore in the bootstrap section. For example, dev.check(.build_exe_command) at line 254 ensures the build-exe command is available in the current build environment.

The Three Core Structs: Compilation, Zcu, InternPool

The entire compilation process is orchestrated by three interconnected data structures. Understanding their relationships is essential to reading any part of the codebase.

flowchart TD
    COMP["Compilation\n(top-level orchestrator)"]
    ZCU["Zcu\n(Zig Compilation Unit)"]
    IP["InternPool\n(types + values store)"]

    COMP -->|"zcu: ?*Zcu"| ZCU
    ZCU -->|"comp: *Compilation"| COMP
    ZCU -->|"intern_pool"| IP
    COMP -->|"config: Config"| CFG["Config"]
    COMP -->|"bin_file: ?*link.File"| LNK["Linker Output"]
    COMP -->|"work_queues"| WQ["Job Queues"]

Compilation (src/Compilation.zig) is the top-level orchestrator. It owns the configuration, job queues, link output, C object compilation, and threading infrastructure. Not every Compilation involves Zig code — you can do zig build-exe foo.o — so it holds a zcu: ?*Zcu that may be null.

Zcu (src/Zcu.zig) is the Zig Compilation Unit. It exists when there's Zig source code to compile. It owns the module graph, file tracking, exports, and the InternPool. It points back to its parent Compilation via comp: *Compilation.
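The mutual pointers between these two structs can be sketched as follows. Both types here are heavily reduced, hypothetical stand-ins that keep only the zcu and comp fields named above; hasZigCode is invented for the example.

```zig
const std = @import("std");

/// Reduced stand-in for the real Compilation struct.
const Compilation = struct {
    /// Null when there is no Zig source, e.g. when only linking object files.
    zcu: ?*Zcu,

    fn hasZigCode(comp: *const Compilation) bool {
        return comp.zcu != null;
    }
};

/// Reduced stand-in for the real Zcu struct.
const Zcu = struct {
    /// Back-pointer to the owning Compilation.
    comp: *Compilation,
};

pub fn main() void {
    // A compilation with Zig code: the two structs point at each other.
    var comp: Compilation = .{ .zcu = null };
    var zcu: Zcu = .{ .comp = &comp };
    comp.zcu = &zcu;

    // A compilation without Zig code: no Zcu exists at all.
    const objects_only: Compilation = .{ .zcu = null };

    std.debug.print("with zig: {}, objects only: {}\n", .{
        comp.hasZigCode(),
        objects_only.hasZigCode(),
    });
}
```

Code that needs the Zcu has to unwrap the optional first, which makes the "no Zig code" case explicit at every use site.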

InternPool (src/InternPool.zig) is the universal store for all types and values. Both are represented as u32 indices into this structure. It's sharded for concurrent access, with per-thread Local storage and shared Shard arrays. The InternPool also houses the dependency tracking infrastructure that powers incremental compilation.
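The core idea of interning, that structurally equal keys always map to the same small index, can be shown with a toy pool. This is a sketch under loud assumptions: TinyPool, its fixed capacity, and the linear search are all invented here, whereas the real InternPool uses hashing, sharding, and a far richer Key type.

```zig
const std = @import("std");

/// A type or value is referred to by a 32-bit index, never by pointer.
const Index = enum(u32) { _ };

const IntType = struct { signed: bool, bits: u16 };

/// Toy key: either an integer type or a pointer to another interned type.
const Key = union(enum) {
    int_type: IntType,
    ptr_to: Index,
};

/// Hypothetical fixed-capacity pool; linear search stands in for hashing.
const TinyPool = struct {
    items: [64]Key = undefined,
    len: u32 = 0,

    /// Returns the existing index for `key`, or stores it and returns a new one.
    fn get(pool: *TinyPool, key: Key) error{PoolFull}!Index {
        for (pool.items[0..pool.len], 0..) |existing, i| {
            if (std.meta.eql(existing, key)) {
                return @enumFromInt(@as(u32, @intCast(i)));
            }
        }
        if (pool.len == pool.items.len) return error.PoolFull;
        pool.items[pool.len] = key;
        pool.len += 1;
        return @enumFromInt(pool.len - 1);
    }

    fn indexToKey(pool: *const TinyPool, index: Index) Key {
        return pool.items[@intFromEnum(index)];
    }
};

pub fn main() !void {
    var pool: TinyPool = .{};
    const a = try pool.get(.{ .int_type = .{ .signed = false, .bits = 8 } });
    const b = try pool.get(.{ .int_type = .{ .signed = false, .bits = 8 } });
    const ptr = try pool.get(.{ .ptr_to = a });
    // Equal keys yield equal indices, so type equality is an integer compare.
    std.debug.print("same index: {}\n", .{a == b});
    const pointee = pool.indexToKey(pool.indexToKey(ptr).ptr_to);
    std.debug.print("pointee bits: {d}\n", .{pointee.int_type.bits});
}
```

Because keys can contain other indices (as ptr_to does), compound types stay compact and comparisons never need to walk a structure.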

Tip: When you see a function taking pt: Zcu.PerThread, that's a thread-safe wrapper around *Zcu that carries a thread ID for InternPool shard selection. It's the most common parameter type in the compiler.

Bootstrap and Feature Gating with dev.zig

How do you compile a self-hosted compiler for the first time? Zig solves this with a three-stage bootstrap and a clever feature-gating system.

The process starts with bootstrap.c, a pure C program that orchestrates the chain:

sequenceDiagram
    participant CC as System C Compiler
    participant W2C as wasm2c
    participant Z1 as zig1 (bootstrap)
    participant Z2 as zig2 (core)
    participant Z3 as zig3 (full)

    CC->>W2C: Compile stage1/wasm2c.c
    W2C->>Z1: Convert zig1.wasm → zig1.c
    CC->>Z1: Compile zig1.c + wasi.c
    Z1->>Z2: Build zig2.c (-ofmt=c, C backend only)
    CC->>Z2: Compile zig2.c + compiler_rt.c
    Z2->>Z3: Build full compiler (all backends)

Stage 1 (zig1): A pre-compiled zig1.wasm in stage1/ gets converted to C via wasm2c, then compiled with the system C compiler. This zig1 runs in the bootstrap environment — it can only emit C code (-ofmt=c).

Stage 2 (zig2): zig1 compiles the compiler source to zig2.c. The bootstrap.c program writes a config.zig that sets pub const dev = .core;, enabling the core environment. The system C compiler then compiles zig2.c into a native binary.

Stage 3 (zig3): zig2 builds the full compiler with all backends and features enabled.

The piece that makes this work is src/dev.zig. It defines an Env enum with variants bootstrap, core, and full (plus several development-focused variants). Each variant declares which Features it supports via the supports() function.

The check() function at line 297 is where the gating happens: it is declared inline, and when a feature isn't supported its return type is noreturn. Code that follows a noreturn call is unreachable, so the compiler neither analyzes nor emits it, and entire subsystems drop out of the build at comptime. The bootstrap environment supports only 6 features (build-exe, build-obj, ast_gen, sema, c_backend, c_linker), so the resulting zig1 binary is dramatically smaller than the full compiler.
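The gating pattern can be approximated in a self-contained sketch. Everything here is a simplified assumption rather than the contents of src/dev.zig: the Feature list is shortened, only two environments exist, and current_env is hard-coded instead of being derived from build options.

```zig
const std = @import("std");

/// Shortened, hypothetical feature list.
const Feature = enum { build_exe_command, fmt_command, c_backend, x86_64_backend };

const Env = enum {
    bootstrap,
    full,

    /// Comptime query: does this environment include `feature`?
    fn supports(comptime env: Env, comptime feature: Feature) bool {
        return switch (env) {
            .full => true,
            .bootstrap => switch (feature) {
                .build_exe_command, .c_backend => true,
                else => false,
            },
        };
    }
};

/// Hard-coded for the sketch; the real value comes from build options.
const current_env: Env = .bootstrap;

/// The return type depends on the feature: `void` when supported,
/// `noreturn` when not, which marks everything after the call unreachable.
inline fn check(comptime feature: Feature) if (current_env.supports(feature)) void else noreturn {
    if (current_env.supports(feature)) return;
    @panic("feature not supported in this environment: " ++ @tagName(feature));
}

pub fn main() void {
    check(.c_backend); // supported in bootstrap, so this returns normally
    std.debug.print("x86_64 backend available: {}\n", .{
        current_env.supports(.x86_64_backend),
    });
}
```

Placing a call like check(.x86_64_backend) at the top of a backend's entry point is what would let the rest of that function be skipped in a restricted environment.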

The current environment is determined at line 320:

pub const env: Env = if (@hasDecl(build_options, "dev"))
    @field(Env, @tagName(build_options.dev))
else if (@hasDecl(build_options, "only_c") and build_options.only_c)
    .bootstrap
else ...
    .full;

The build system at build.zig declares version 0.16.0 and imports DevEnv from src/dev.zig, threading the environment through to the compiler build.

What's Next

With this map in hand, we're ready to dive into the first half of the pipeline. In Article 2, we'll explore the compiler frontend — the tokenizer, parser, and AstGen phases that live in lib/std/zig/. We'll trace how source bytes become ZIR, examining the flat instruction-array data structure that makes ZIR so efficient for sequential processing by Sema.