Architecture of the Zig Compiler: A Map for the Codebase
Prerequisites
- Basic Zig language knowledge (comptime, @import, error unions, packed structs)
- General familiarity with compiler concepts (lexing, parsing, IRs, code generation)
The Zig compiler is a self-hosted, multi-stage compiler that lives in a single monorepo. At the time of writing, the src/ directory alone contains over 300K lines of Zig, with the x86_64 backend contributing another 190K. Before reading a single function, you need a mental map — which files matter, how they connect, and where data flows. This article provides that map.
We'll walk through the repository layout, trace the full IR chain from source to binary, meet the three central data structures, and understand the bootstrap process that makes self-hosting possible.
Repository Layout and Directory Structure
The Zig monorepo packs the compiler, standard library, build tool, and bundled linkers into a single tree. Here's the high-level layout:
| Directory | Purpose |
|---|---|
| src/ | The compiler itself: Sema, codegen, linkers, CLI |
| lib/std/ | Standard library, including the compiler frontend |
| lib/std/zig/ | Tokenizer, parser, AstGen, ZIR (shared with tools) |
| stage1/ | Bootstrap artifacts: zig1.wasm, wasm2c.c, wasi.c |
| build.zig | Build tool definition for the compiler |
| bootstrap.c | Pure C program that chains zig1 → zig2 → zig3 |
| test/ | Compiler test suite |
| lib/compiler/ | Bundled compiler-rt, aro (C parser), etc. |
The most surprising thing here is that the compiler frontend — tokenizer, parser, and AstGen — lives in lib/std/zig/, not in src/. This is a deliberate architectural decision we'll explore in Article 2.
graph TD
subgraph "Repository Root"
A["src/"] --> B["Compiler Core"]
C["lib/std/zig/"] --> D["Frontend (shared)"]
E["stage1/"] --> F["Bootstrap Artifacts"]
G["build.zig"] --> H["Build System"]
I["bootstrap.c"] --> J["C Bootstrap Chain"]
end
D -->|"used by"| B
D -->|"used by"| K["zig fmt, ZLS"]
Tip: When navigating the codebase, remember that @import("std") in compiler code pulls from lib/std/. So std.zig.Zir resolves to lib/std/zig/Zir.zig, not anything in src/.
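To make the tip concrete, here is a minimal sketch that reaches the shared frontend through the standard library. It assumes std.zig.Tokenizer exposes init() and next() as in recent Zig releases; the helper countTokens is a hypothetical name introduced only for this illustration.

```zig
const std = @import("std");

/// Hypothetical helper: counts the tokens in `source`, excluding the
/// trailing end-of-file token.
fn countTokens(source: [:0]const u8) usize {
    // std.zig.Tokenizer is the tokenizer that lives under lib/std/zig/.
    var tokenizer = std.zig.Tokenizer.init(source);
    var count: usize = 0;
    while (true) {
        const token = tokenizer.next();
        if (token.tag == .eof) break;
        count += 1;
    }
    return count;
}

test "the frontend tokenizer is reachable through std" {
    // Tokens: `const`, `x`, `=`, `1`, `;`
    try std.testing.expectEqual(@as(usize, 5), countTokens("const x = 1;"));
}
```

Because nothing here imports from src/, the same few lines would work in any external tool, which is the point of keeping the frontend in the standard library.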
The Compilation Pipeline at a Glance
Every .zig source file passes through a chain of intermediate representations before becoming machine code. The pipeline has six stages:
flowchart LR
A["Source\nBytes"] --> B["Tokens"]
B --> C["AST"]
C --> D["ZIR"]
D --> E["AIR"]
E --> F["MIR"]
F --> G["Machine Code\n/ Binary"]
style A fill:#e8f5e9
style D fill:#fff3e0
style E fill:#e3f2fd
style G fill:#fce4ec
| Stage | File | Location |
|---|---|---|
| Tokenization | tokenizer.zig | lib/std/zig/ |
| Parsing | Parse.zig | lib/std/zig/ |
| AstGen (AST → ZIR) | AstGen.zig | lib/std/zig/ |
| Sema (ZIR → AIR) | Sema.zig | src/ |
| Codegen (AIR → MIR) | codegen.zig | src/ |
| Linking (MIR → Binary) | link.zig | src/ |
The split between lib/std/zig/ and src/ coincides with the split between untyped and typed representations. ZIR is the last untyped IR; Sema transforms it into AIR (Analyzed IR), which carries full type information. The same line separates code that tools like zig fmt and ZLS can reuse from compiler-only code.
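As an illustration of that reuse, the sketch below runs the first two stages (source bytes to tokens to AST) without touching src/. It assumes std.zig.Ast.parse takes an allocator, a null-terminated source, and a mode, as in recent Zig releases; parsesCleanly is a hypothetical helper, not something the compiler defines.

```zig
const std = @import("std");

/// Hypothetical helper: parses `source` and reports whether the parser
/// recorded any syntax errors. No type checking happens at this stage.
fn parsesCleanly(gpa: std.mem.Allocator, source: [:0]const u8) !bool {
    var tree = try std.zig.Ast.parse(gpa, source, .zig);
    defer tree.deinit(gpa);
    return tree.errors.len == 0;
}

test "the untyped frontend accepts code that Sema would reject" {
    const gpa = std.testing.allocator;
    // Syntactically valid, even though `undeclared` does not exist:
    // name resolution and typing belong to later stages.
    try std.testing.expect(try parsesCleanly(gpa, "const x: u8 = undeclared;"));
    // Missing initializer expression: a pure syntax error.
    try std.testing.expect(!(try parsesCleanly(gpa, "const x = ;")));
}
```

A formatter or language server can stop here; only the compiler proper needs to continue into Sema.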
Entry Point: main.zig and Command Dispatch
The compiler starts at src/main.zig#L166, where pub fn main() sets up the global allocator — choosing between a debug allocator, libc allocator, or the SMP allocator depending on the build mode.
After allocator setup, control flows to mainArgs(), which is a large if-else chain performing command dispatch by string comparison:
flowchart TD
M["main()"] --> MA["mainArgs()"]
MA -->|"build-exe"| BOT["buildOutputType()"]
MA -->|"build-lib"| BOT
MA -->|"build-obj"| BOT
MA -->|"test"| BOT
MA -->|"run"| BOT
MA -->|"cc / c++"| BOT
MA -->|"fmt"| FMT["fmt.zig"]
MA -->|"build"| CMD["cmdBuild()"]
MA -->|"fetch"| FETCH["cmdFetch()"]
The core compilation path runs through buildOutputType(), a massive function that parses CLI flags, constructs a Compilation object, and calls update() on it. This single function handles build-exe, build-lib, build-obj, test, run, and even cc/c++ invocations.
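The dispatch pattern can be sketched in a few lines. This is a simplified model, not the code in main.zig: the Handler enum and the isAny helper are hypothetical, and the real mainArgs() calls the handler functions directly rather than returning a tag.

```zig
const std = @import("std");

/// Hypothetical tags standing in for the functions that handle each command.
const Handler = enum { build_output_type, fmt, cmd_build, cmd_fetch, unknown };

/// Hypothetical helper: true if `cmd` equals any entry in `names`.
fn isAny(cmd: []const u8, names: []const []const u8) bool {
    for (names) |name| {
        if (std.mem.eql(u8, cmd, name)) return true;
    }
    return false;
}

/// Command dispatch by plain string comparison, as an if-else chain.
fn dispatch(cmd: []const u8) Handler {
    if (isAny(cmd, &.{ "build-exe", "build-lib", "build-obj", "test", "run", "cc", "c++" })) {
        // All of these funnel into one large function.
        return .build_output_type;
    } else if (std.mem.eql(u8, cmd, "fmt")) {
        return .fmt;
    } else if (std.mem.eql(u8, cmd, "build")) {
        return .cmd_build;
    } else if (std.mem.eql(u8, cmd, "fetch")) {
        return .cmd_fetch;
    } else {
        return .unknown;
    }
}

test "several commands share one handler" {
    try std.testing.expectEqual(Handler.build_output_type, dispatch("build-exe"));
    try std.testing.expectEqual(Handler.build_output_type, dispatch("cc"));
    try std.testing.expectEqual(Handler.fmt, dispatch("fmt"));
}
```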
Notice the dev.check() calls before each command — these are compile-time feature gates that we'll explore in the bootstrap section. For example, dev.check(.build_exe_command) at line 254 ensures the build-exe command is available in the current build environment.
The Three Core Structs: Compilation, Zcu, InternPool
The entire compilation process is orchestrated by three interconnected data structures. Understanding their relationships is essential to reading any part of the codebase.
flowchart TD
COMP["Compilation\n(top-level orchestrator)"]
ZCU["Zcu\n(Zig Compilation Unit)"]
IP["InternPool\n(types + values store)"]
COMP -->|"zcu: ?*Zcu"| ZCU
ZCU -->|"comp: *Compilation"| COMP
ZCU -->|"intern_pool"| IP
COMP -->|"config: Config"| CFG["Config"]
COMP -->|"bin_file: ?*link.File"| LNK["Linker Output"]
COMP -->|"work_queues"| WQ["Job Queues"]
Compilation (src/Compilation.zig) is the top-level orchestrator. It owns the configuration, job queues, link output, C object compilation, and threading infrastructure. Not every Compilation involves Zig code — you can do zig build-exe foo.o — so it holds a zcu: ?*Zcu that may be null.
Zcu (src/Zcu.zig) is the Zig Compilation Unit. It exists when there's Zig source code to compile. It owns the module graph, file tracking, exports, and the InternPool. It points back to its parent Compilation via comp: *Compilation.
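The mutual references between the two structs can be modeled as follows. Only the field names zcu and comp come from the text; everything else, including the hasZigCode method, is a hypothetical reduction for illustration.

```zig
const std = @import("std");

/// Reduced stand-in for the real Zcu: just the back-pointer.
const Zcu = struct {
    comp: *Compilation,
};

/// Reduced stand-in for the real Compilation: just the optional Zcu.
const Compilation = struct {
    /// Null when no Zig source is involved, e.g. linking object files only.
    zcu: ?*Zcu = null,

    /// Hypothetical convenience method, not part of the real struct.
    fn hasZigCode(comp: *const Compilation) bool {
        return comp.zcu != null;
    }
};

test "a Compilation may or may not own a Zcu" {
    const objects_only: Compilation = .{};
    try std.testing.expect(!objects_only.hasZigCode());

    var comp: Compilation = .{};
    var zcu: Zcu = .{ .comp = &comp };
    comp.zcu = &zcu;
    try std.testing.expect(comp.hasZigCode());
    // The back-pointer leads to the owning Compilation.
    try std.testing.expect(comp.zcu.?.comp == &comp);
}
```

Code that works on a Compilation therefore has to unwrap the optional before touching anything Zig-specific.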
InternPool (src/InternPool.zig) is the universal store for all types and values. Both are represented as u32 indices into this structure. It's sharded for concurrent access, with per-thread Local storage and shared Shard arrays. The InternPool also houses the dependency tracking infrastructure that powers incremental compilation.
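The core idea of interning, equal keys mapping to the same u32 index, can be shown with a deliberately tiny model. TinyPool below is hypothetical: it stores plain u64 keys in a fixed array with linear search, and omits the sharding, per-thread storage, and dependency tracking the real InternPool provides.

```zig
const std = @import("std");

/// Hypothetical, single-threaded toy pool for illustration only.
const TinyPool = struct {
    /// Opaque handle: a u32 index into `items`.
    const Index = enum(u32) { _ };

    items: [16]u64 = undefined,
    len: u32 = 0,

    /// Returns the index of `key`, adding it only if it is not present yet.
    fn intern(pool: *TinyPool, key: u64) error{PoolFull}!Index {
        for (pool.items[0..pool.len], 0..) |existing, i| {
            if (existing == key) {
                const found: Index = @enumFromInt(@as(u32, @intCast(i)));
                return found;
            }
        }
        if (pool.len == pool.items.len) return error.PoolFull;
        pool.items[pool.len] = key;
        const added: Index = @enumFromInt(pool.len);
        pool.len += 1;
        return added;
    }

    /// Looks the key back up from its index.
    fn get(pool: *const TinyPool, index: Index) u64 {
        return pool.items[@intFromEnum(index)];
    }
};

test "equal keys intern to the same index" {
    var pool: TinyPool = .{};
    const a = try pool.intern(42);
    const b = try pool.intern(7);
    const c = try pool.intern(42);
    try std.testing.expect(a == c);
    try std.testing.expect(a != b);
    try std.testing.expectEqual(@as(u64, 42), pool.get(a));
}
```

With this scheme, comparing two interned items is a single integer comparison, which is one reason to represent types and values as indices.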
Tip: When you see a function taking pt: Zcu.PerThread, that's a thread-safe wrapper around *Zcu that carries a thread ID for InternPool shard selection. It's the most common parameter type in the compiler.
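The shape of such a wrapper might look like the sketch below. It is a loose model under stated assumptions: the per_thread_work array and the recordWork method are hypothetical, the thread ID is a plain u8, and the real PerThread routes into InternPool storage rather than a counter array.

```zig
const std = @import("std");

/// Reduced stand-in for Zcu with one hypothetical slot per worker thread.
const Zcu = struct {
    per_thread_work: [4]u32 = [_]u32{0} ** 4,

    /// A pointer to the shared state plus the ID of the calling thread.
    const PerThread = struct {
        zcu: *Zcu,
        tid: u8,

        /// Hypothetical method: touches only the slot owned by this thread,
        /// so two threads never write to the same location.
        fn recordWork(pt: PerThread) void {
            pt.zcu.per_thread_work[pt.tid] += 1;
        }
    };
};

test "each PerThread handle writes to its own slot" {
    var zcu: Zcu = .{};
    const pt0: Zcu.PerThread = .{ .zcu = &zcu, .tid = 0 };
    const pt2: Zcu.PerThread = .{ .zcu = &zcu, .tid = 2 };
    pt0.recordWork();
    pt2.recordWork();
    pt2.recordWork();
    try std.testing.expectEqual(@as(u32, 1), zcu.per_thread_work[0]);
    try std.testing.expectEqual(@as(u32, 2), zcu.per_thread_work[2]);
}
```

Passing the pair by value keeps call sites short: one parameter carries both the shared state and the identity needed to pick thread-local storage.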
Bootstrap and Feature Gating with dev.zig
How do you compile a self-hosted compiler for the first time? Zig solves this with a three-stage bootstrap and a clever feature-gating system.
The process starts with bootstrap.c, a pure C program that orchestrates the chain:
sequenceDiagram
participant CC as System C Compiler
participant W2C as wasm2c
participant Z1 as zig1 (bootstrap)
participant Z2 as zig2 (core)
participant Z3 as zig3 (full)
CC->>W2C: Compile stage1/wasm2c.c
W2C->>Z1: Convert zig1.wasm → zig1.c
CC->>Z1: Compile zig1.c + wasi.c
Z1->>Z2: Build zig2.c (-ofmt=c, C backend only)
CC->>Z2: Compile zig2.c + compiler_rt.c
Z2->>Z3: Build full compiler (all backends)
Stage 1 (zig1): A pre-compiled zig1.wasm in stage1/ gets converted to C via wasm2c, then compiled with the system C compiler. This zig1 runs in the bootstrap environment — it can only emit C code (-ofmt=c).
Stage 2 (zig2): zig1 compiles the compiler source to zig2.c. The bootstrap.c program writes a config.zig that sets pub const dev = .core;, selecting the core environment. The system C compiler then compiles zig2.c into a native binary.
Stage 3 (zig3): zig2 builds the full compiler with all backends and features enabled.
The piece that makes this work is src/dev.zig. It defines an Env enum with variants bootstrap, core, and full (plus several development-focused variants). Each variant declares which Features it supports via the supports() function.
The check() function at line 297 does the gating. It is declared inline, and when a feature isn't supported, its return type is noreturn, so everything after the call is unreachable. The compiler therefore eliminates entire subsystems as dead code at compile time. The bootstrap environment supports only 6 features (build_exe_command, build_obj_command, ast_gen, sema, c_backend, c_linker), so the resulting zig1 binary is dramatically smaller than the full compiler.
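A reduced model of this gate fits in a few lines. The sketch keeps the names Env, Feature, supports, and check from the text, but the two-variant enums, the current_env constant, and the pickBackend function are hypothetical simplifications rather than the contents of src/dev.zig.

```zig
const std = @import("std");

/// Reduced to two features for illustration.
const Feature = enum { c_backend, x86_64_backend };

/// Reduced to two environments for illustration.
const Env = enum {
    bootstrap,
    full,

    fn supports(comptime env: Env, comptime feature: Feature) bool {
        return switch (env) {
            .full => true,
            .bootstrap => switch (feature) {
                .c_backend => true,
                .x86_64_backend => false,
            },
        };
    }
};

/// Hypothetical stand-in for the environment chosen at build time.
const current_env: Env = .bootstrap;

/// The return type is void for supported features and noreturn otherwise,
/// so code following an unsupported check is never analyzed or emitted.
inline fn check(comptime feature: Feature) if (current_env.supports(feature)) void else noreturn {
    if (comptime current_env.supports(feature)) return;
    @panic("feature not supported in this environment");
}

/// Hypothetical caller showing both styles of gating.
fn pickBackend() []const u8 {
    if (comptime current_env.supports(.x86_64_backend)) {
        // In the bootstrap environment this branch is discarded at comptime.
        return "x86_64";
    }
    check(.c_backend);
    return "c";
}

test "the bootstrap environment falls back to the C backend" {
    try std.testing.expectEqualStrings("c", pickBackend());
}
```

Because every argument is comptime-known, the decision costs nothing at run time; the unsupported path simply does not exist in the compiled binary.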
The current environment is determined at line 320:
    pub const env: Env = if (@hasDecl(build_options, "dev"))
        @field(Env, @tagName(build_options.dev))
    else if (@hasDecl(build_options, "only_c") and build_options.only_c)
        .bootstrap
    else ...
        .full;
The build system at build.zig declares version 0.16.0 and imports DevEnv from src/dev.zig, threading the environment through to the compiler build.
What's Next
With this map in hand, we're ready to dive into the first half of the pipeline. In Article 2, we'll explore the compiler frontend — the tokenizer, parser, and AstGen phases that live in lib/std/zig/. We'll trace how source bytes become ZIR, examining the flat instruction-array data structure that makes ZIR so efficient for sequential processing by Sema.