Read OSS

Architecture Overview and How to Navigate 75 Compiler Crates

Intermediate

Prerequisites

  • Basic Rust (ownership, lifetimes, traits, generics)
  • General familiarity with compiler concepts (lexing, parsing, type checking, code generation)

The Rust compiler is one of the most ambitious open-source compiler projects in existence. At roughly 75 crates, 2 million lines of code, and a demand-driven query architecture that replaced the traditional sequential pipeline, rustc can be intimidating to approach. This article gives you the mental map you need to navigate it confidently — from the repository's top-level directories, through the rustc binary entry point, down into the driver that orchestrates parsing, analysis, and codegen.

By the end, you'll understand where any given piece of compiler logic lives, how to trace a compilation from main() to a linked binary, and how the bootstrap build system compiles the compiler itself.

Repository Layout: compiler/, library/, src/, tests/

The rust-lang/rust repository is a Cargo workspace with hundreds of crates organized under a handful of top-level directories. Understanding these four is the key to not getting lost:

Directory | Purpose
compiler/ | The ~75 crates that make up rustc itself
library/ | The standard library (core, alloc, std, proc_macro, test)
src/ | Non-compiler tools: bootstrap (the build system), rustdoc, rustfmt, clippy, etc.
tests/ | The massive test suite (ui/, run-make/, codegen/, etc.)

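The workspace root Cargo.toml ties everything together as a single Cargo workspace, which lets Cargo resolve all inter-crate dependencies in one pass. In practice you build and check compiler crates through x.py (for example ./x.py check) rather than with bare cargo, because the bootstrap build system supplies the toolchain and environment those crates expect.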

graph TD
    ROOT["rust-lang/rust"]
    ROOT --> COMPILER["compiler/<br/>~75 rustc crates"]
    ROOT --> LIBRARY["library/<br/>core, alloc, std"]
    ROOT --> SRC["src/<br/>bootstrap, tools"]
    ROOT --> TESTS["tests/<br/>ui, codegen, run-make"]
    
    COMPILER --> RUSTC_BIN["rustc (binary)"]
    COMPILER --> RUSTC_DRIVER["rustc_driver_impl"]
    COMPILER --> RUSTC_INTERFACE["rustc_interface"]
    COMPILER --> RUSTC_MIDDLE["rustc_middle"]
    COMPILER --> MORE["...~70 more"]

Tip: When you're lost looking for where something lives, remember this heuristic: if it's about compiling Rust, look in compiler/. If it's about the runtime behavior of Rust programs, look in library/. If it's about building the compiler, look in src/bootstrap/.

The rustc Binary Entry Point

Every rustc invocation starts in a surprisingly tiny file. The entire binary entry point is under 50 lines:

compiler/rustc/src/main.rs#L43-L45

fn main() -> ExitCode {
    rustc_driver::main()
}

That's it. The real work happens elsewhere. But before that one-liner, there's a critical piece of infrastructure: jemalloc integration. Lines 8–41 contain a detailed comment explaining why rustc doesn't use #[global_allocator] and instead relies on a jemalloc-sys feature that overrides malloc/free at the libc level. As a result, rustc and LLVM share one allocator, so memory allocated on one side of the rustc ↔ LLVM boundary can safely be freed on the other.

The rustc_driver crate itself is equally minimal — a re-export shim:

compiler/rustc_driver/src/lib.rs#L1-L4

// This crate is intentionally empty and a re-export of `rustc_driver_impl`
pub use rustc_driver_impl::*;

This split exists for a practical reason: it allows rustc_driver_impl to compile in parallel with other crates that depend on the rustc_driver interface, improving build times for the compiler itself.
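
The shim pattern itself can be reproduced in a few lines. The sketch below uses two modules in place of two crates; `driver_impl`, `driver`, and `version()` are hypothetical stand-ins chosen for illustration, not the real crate contents.

```rust
use std::process::ExitCode;

// Hypothetical stand-in for `rustc_driver_impl`: the place where the logic lives.
mod driver_impl {
    use std::process::ExitCode;

    pub fn main() -> ExitCode {
        // A real driver would parse arguments and run the compiler here.
        ExitCode::SUCCESS
    }

    pub fn version() -> &'static str {
        "sketch-0.0"
    }
}

// Hypothetical stand-in for `rustc_driver`: no logic, only a glob re-export.
mod driver {
    pub use super::driver_impl::*;
}

fn main() -> ExitCode {
    // Callers only ever name the facade; the implementation stays swappable.
    println!("driver version: {}", driver::version());
    driver::main()
}
```

Every public item of the implementation module is reachable through the facade path, which is why the binary can call `rustc_driver::main()` even though that crate defines nothing itself.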

sequenceDiagram
    participant Binary as rustc (binary)
    participant Driver as rustc_driver
    participant DriverImpl as rustc_driver_impl
    participant Interface as rustc_interface
    
    Binary->>Driver: rustc_driver::main()
    Driver->>DriverImpl: (re-export)
    DriverImpl->>DriverImpl: run_compiler()
    DriverImpl->>Interface: interface::run_compiler(config, closure)
    Interface->>Interface: Create Session, Compiler
    Interface-->>DriverImpl: Calls closure with &Compiler

The Driver: Callbacks, run_compiler(), and the Pipeline

The heart of the compiler driver lives in rustc_driver_impl. Two constructs define its public API: the Callbacks trait and the run_compiler() function.

The Callbacks Trait

compiler/rustc_driver_impl/src/lib.rs#L118-L149

The Callbacks trait offers four hooks into the compilation pipeline:

  1. config() — Called before the Compiler is created. Lets you mutate the Config struct.
  2. after_crate_root_parsing() — Called after parsing the crate root (before submodules). Can halt compilation.
  3. after_expansion() — Called after macro expansion and name resolution. Receives TyCtxt.
  4. after_analysis() — Called after type checking, borrow checking, and all analysis passes.

Each callback (except config) returns a Compilation enum, either Continue or Stop. This is how tools like Clippy and Miri hook into the compiler: they implement Callbacks, adjust the Config in config() (Clippy registers its lints there), and inspect the analyzed program in after_analysis(). rust-analyzer, by contrast, is a separate front end and does not go through this driver.
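
To make the Continue/Stop protocol concrete, here is a toy model of the four hooks. It is a sketch only: `Config`, `Report`, `LintTool`, and this `run_compiler` are simplified stand-ins invented for illustration, and the real trait methods take a `Compiler`, an AST, or a `TyCtxt` as arguments.

```rust
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum Compilation {
    Continue,
    Stop,
}

// Simplified stand-in for the real Config struct.
#[derive(Default)]
struct Config {
    extra_lints: Vec<String>,
}

// Same four hooks as described above, with default no-op implementations.
trait Callbacks {
    fn config(&mut self, _config: &mut Config) {}
    fn after_crate_root_parsing(&mut self) -> Compilation {
        Compilation::Continue
    }
    fn after_expansion(&mut self) -> Compilation {
        Compilation::Continue
    }
    fn after_analysis(&mut self) -> Compilation {
        Compilation::Continue
    }
}

struct DefaultCallbacks;
impl Callbacks for DefaultCallbacks {}

// Hypothetical lint tool: registers a lint, then stops before codegen.
struct LintTool;
impl Callbacks for LintTool {
    fn config(&mut self, config: &mut Config) {
        config.extra_lints.push("my_custom_lint".to_string());
    }
    fn after_analysis(&mut self) -> Compilation {
        Compilation::Stop
    }
}

struct Report {
    lints: Vec<String>,
    phases: Vec<&'static str>,
}

// Toy driver: records which phases ran before a callback asked to stop.
fn run_compiler(callbacks: &mut dyn Callbacks) -> Report {
    let mut config = Config::default();
    callbacks.config(&mut config);
    let mut phases = vec!["parse"];
    if callbacks.after_crate_root_parsing() == Compilation::Continue {
        phases.push("expand");
        if callbacks.after_expansion() == Compilation::Continue {
            phases.push("analysis");
            if callbacks.after_analysis() == Compilation::Continue {
                phases.push("codegen");
                phases.push("link");
            }
        }
    }
    Report { lints: config.extra_lints, phases }
}

fn main() {
    let full = run_compiler(&mut DefaultCallbacks);
    let lint_only = run_compiler(&mut LintTool);
    println!("default: {:?}", full.phases);
    println!("lint tool: {:?} with lints {:?}", lint_only.phases, lint_only.lints);
}
```

A tool that only needs analysis results returns Stop from after_analysis() and never pays for codegen or linking.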

The run_compiler() Orchestrator

compiler/rustc_driver_impl/src/lib.rs#L170-L354

run_compiler() is the primary entry point for rustc. It orchestrates the entire compilation pipeline:

flowchart TD
    A["run_compiler(args, callbacks)"] --> B[Parse CLI arguments]
    B --> C[Build Config struct]
    C --> D["callbacks.config(&mut config)"]
    D --> E["interface::run_compiler(config, closure)"]
    E --> F["passes::parse(sess) → AST"]
    F --> G["callbacks.after_crate_root_parsing()"]
    G --> H["create_and_enter_global_ctxt()"]
    H --> I["resolver_for_lowering() → expansion + name resolution"]
    I --> J["callbacks.after_expansion()"]
    J --> K["tcx.ensure_ok().analysis(())"]
    K --> L["callbacks.after_analysis()"]
    L --> M["Linker::codegen_and_build_linker()"]
    M --> N["linker.link(sess, codegen_backend)"]

A particularly interesting design choice is that linking happens outside the global context closure (lines 350–352). The closure returns only what the linker needs, so the GlobalCtxt (which holds all the arenas and interned data) can be freed before linking begins, reducing peak memory usage.
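
The ordering can be illustrated with a small ownership sketch. Everything here is a hypothetical model: `GlobalCtxt`, `Linker`, and `with_global_ctxt` are stand-ins that mimic only the shape of the real code, and a shared log records when the context is dropped relative to the link step.

```rust
use std::cell::RefCell;

// Stand-in for GlobalCtxt: owns a large arena and logs when it is freed.
struct GlobalCtxt<'a> {
    arena: Vec<u8>,
    log: &'a RefCell<Vec<&'static str>>,
}

impl Drop for GlobalCtxt<'_> {
    fn drop(&mut self) {
        self.log.borrow_mut().push("global context freed");
    }
}

// Small value that outlives the context and carries only what linking needs.
struct Linker {
    object_kib: usize,
}

// Creates the context, lends it to the closure, and drops it before returning.
// `R` cannot borrow from the context, so the result is safe to keep afterwards.
fn with_global_ctxt<'a, R>(
    log: &'a RefCell<Vec<&'static str>>,
    f: impl FnOnce(&GlobalCtxt<'a>) -> R,
) -> R {
    let gcx = GlobalCtxt { arena: vec![0; 1 << 20], log };
    let result = f(&gcx);
    result
}

fn compile(log: &RefCell<Vec<&'static str>>) -> usize {
    let linker = with_global_ctxt(log, |gcx| {
        log.borrow_mut().push("analysis + codegen");
        Linker { object_kib: gcx.arena.len() / 1024 }
    });
    // The 1 MiB arena is already gone by the time linking starts.
    log.borrow_mut().push("link");
    linker.object_kib
}

fn main() {
    let log = RefCell::new(Vec::new());
    let kib = compile(&log);
    println!("linked {kib} KiB; order: {:?}", log.borrow());
}
```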

The Compiler and Config Structs

Two key structs meet in interface::run_compiler(): it takes a Config (the input) and uses it to create a Compiler (the session).

The Config Struct

compiler/rustc_interface/src/interface.rs#L315-L375

Config is the master configuration object. Its most important fields:

Field | Purpose
opts | Parsed command-line options (-O, --edition, -C, -Z flags)
input | The source input, either a file path or a string
register_lints | Callback for custom lint registration (used by Clippy)
override_queries | Callback to replace query implementations (used by custom drivers that need to intercept specific queries)
make_codegen_backend | Factory for custom codegen backends

The Compiler Struct

compiler/rustc_interface/src/interface.rs#L38-L48

pub struct Compiler {
    pub sess: Session,
    pub codegen_backend: Box<dyn CodegenBackend>,
    pub(crate) override_queries: Option<fn(&Session, &mut Providers)>,
    pub(crate) current_gcx: CurrentGcx,
    pub(crate) jobserver_proxy: Arc<Proxy>,
}

Compiler is the runtime representation of a compilation session. It holds the Session (options, diagnostics, file I/O), the codegen backend, and the override_queries hook. The current_gcx field is a handle to the GlobalCtxt that gets created later — it's the bridge between the Compiler and the query system.

Tip: If you're building a custom driver (like clippy-driver or miri), your entry point is implementing Callbacks and calling run_compiler(). Use config() to register custom lints and override_queries to intercept specific queries.
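
The override hook is just an optional function pointer that gets a chance to rewrite the provider table. The sketch below models that idea under loud assumptions: `Session`, `Providers`, and the two toy queries are invented for illustration, and only the `Option<fn(&Session, &mut Providers)>` shape is taken from the Compiler struct shown above.

```rust
// Simplified stand-ins; the real Session and Providers are far larger.
struct Session {
    crate_name: String,
}

struct Providers {
    type_of: fn(&str) -> String,
    lint_level: fn(&str) -> u8,
}

fn default_type_of(item: &str) -> String {
    format!("type of {item}")
}

fn default_lint_level(_lint: &str) -> u8 {
    1
}

fn strict_lint_level(_lint: &str) -> u8 {
    2
}

struct Config {
    // Same shape as the field on the Compiler struct above.
    override_queries: Option<fn(&Session, &mut Providers)>,
}

// Hypothetical tool hook: replaces one provider and leaves the rest alone.
fn make_strict(sess: &Session, providers: &mut Providers) {
    if sess.crate_name != "std" {
        providers.lint_level = strict_lint_level;
    }
}

// Builds the default table, then lets the hook (if any) patch it.
fn build_providers(sess: &Session, config: &Config) -> Providers {
    let mut providers = Providers {
        type_of: default_type_of,
        lint_level: default_lint_level,
    };
    if let Some(hook) = config.override_queries {
        hook(sess, &mut providers);
    }
    providers
}

fn main() {
    let sess = Session { crate_name: "demo".to_string() };
    let config = Config { override_queries: Some(make_strict) };
    let providers = build_providers(&sess, &config);
    println!("{}", (providers.type_of)("main"));
    println!("lint level: {}", (providers.lint_level)("unused"));
}
```

Because the hook runs after the defaults are installed, a tool can capture or replace a single query without reimplementing the others.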

The Crate Map: A Guided Tour of 75 Compiler Crates

The compiler crates under compiler/ can be organized by which stage of compilation they serve. Here is a categorized overview of the most important ones:

flowchart LR
    subgraph Parsing
        rustc_parse
        rustc_ast
        rustc_lexer
    end
    subgraph Expansion
        rustc_expand
        rustc_resolve
        rustc_builtin_macros
    end
    subgraph Lowering
        rustc_ast_lowering
        rustc_hir
    end
    subgraph Analysis
        rustc_hir_analysis
        rustc_hir_typeck
        rustc_borrowck
    end
    subgraph MIR
        rustc_mir_build
        rustc_mir_transform
        rustc_const_eval
    end
    subgraph Codegen
        rustc_codegen_ssa
        rustc_codegen_llvm
        rustc_monomorphize
    end
    subgraph Infrastructure
        rustc_middle
        rustc_data_structures
        rustc_errors
        rustc_session
        rustc_span
    end
    
    Parsing --> Expansion --> Lowering --> Analysis --> MIR --> Codegen

Stage | Key Crates | Role
Parsing | rustc_lexer, rustc_parse, rustc_ast | Source text → tokens → AST
Expansion | rustc_expand, rustc_resolve, rustc_builtin_macros | Macro expansion, name resolution
Lowering | rustc_ast_lowering, rustc_hir | AST → HIR (desugar for, ?, async)
Analysis | rustc_hir_analysis, rustc_hir_typeck, rustc_borrowck | Type checking, borrow checking
MIR | rustc_mir_build, rustc_mir_transform, rustc_const_eval | MIR construction, optimization, CTFE
Codegen | rustc_codegen_ssa, rustc_codegen_llvm, rustc_monomorphize | Monomorphization, LLVM IR generation
Infrastructure | rustc_middle, rustc_data_structures, rustc_errors, rustc_span | Shared types, data structures, diagnostics

The most important crate to understand is rustc_middle. It defines the shared data types used across all stages: TyCtxt, Ty, Body (MIR), and the query system definitions. (DefId, which most of those types are keyed on, lives lower in the stack in rustc_span.) Nearly every other compiler crate depends on rustc_middle.

Provider registration happens in DEFAULT_QUERY_PROVIDERS, where ~20 crates each register their query implementations:

compiler/rustc_interface/src/passes.rs#L880-L919
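
The registration pattern is easy to model: a table of function pointers starts out with placeholders, and each crate fills in the entries it owns. This is a sketch under stated assumptions: the two modules, the two queries, and the `unregistered` placeholder are hypothetical, and only the name DEFAULT_QUERY_PROVIDERS and the per-crate registration idea come from the text above.

```rust
use std::sync::LazyLock;

// Toy provider table with two queries; the real one has hundreds.
struct Providers {
    typeck: fn(&str) -> String,
    borrowck: fn(&str) -> String,
}

// Placeholder used until some crate registers a real implementation.
fn unregistered(item: &str) -> String {
    panic!("no provider registered for query on `{item}`")
}

impl Default for Providers {
    fn default() -> Self {
        Providers { typeck: unregistered, borrowck: unregistered }
    }
}

// Each module plays the role of one compiler crate exposing a `provide` function.
mod hir_typeck {
    fn typeck(item: &str) -> String {
        format!("typeck({item})")
    }

    pub fn provide(providers: &mut super::Providers) {
        providers.typeck = typeck;
    }
}

mod borrowck {
    fn borrowck(item: &str) -> String {
        format!("borrowck({item})")
    }

    pub fn provide(providers: &mut super::Providers) {
        providers.borrowck = borrowck;
    }
}

// Built once, on first use, by running every crate's `provide` in turn.
static DEFAULT_QUERY_PROVIDERS: LazyLock<Providers> = LazyLock::new(|| {
    let mut providers = Providers::default();
    hir_typeck::provide(&mut providers);
    borrowck::provide(&mut providers);
    providers
});

fn main() {
    println!("{}", (DEFAULT_QUERY_PROVIDERS.typeck)("main"));
    println!("{}", (DEFAULT_QUERY_PROVIDERS.borrowck)("main"));
}
```

The design keeps the table's definition in one shared crate while the implementations stay in the crates that own them, which is what lets so many crates depend on rustc_middle without it depending on them.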

The Bootstrap Build System

You can't compile rustc with a normal cargo build: rustc is written in Rust, so building it requires an existing Rust compiler. This chicken-and-egg problem is solved by the bootstrap build system.

The entry point is x.py, a thin Python wrapper. The real build logic lives in src/bootstrap/, a Rust program that builds in multiple stages:

flowchart TD
    S0["Stage 0<br/>Download beta compiler"] --> S1["Stage 1<br/>Compile rustc with beta"]
    S1 --> S2["Stage 2<br/>Compile rustc with stage-1 rustc"]
    S2 --> DIST["Distribution artifacts"]
    
    S0 -.->|"uses"| BETA["Downloaded beta rustc"]
    S1 -.->|"produces"| STAGE1["Stage 1 rustc"]
    S2 -.->|"produces"| STAGE2["Stage 2 rustc (final)"]
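
The staging rule in the diagram is that each stage's compiler is built by the compiler from the stage before it. A minimal model of that chain follows; the `Compiler` struct and the function names are hypothetical and ignore the standard library and tool builds that the real bootstrap also performs.

```rust
// Toy model: a compiler is identified by its stage and by who built it.
#[derive(Debug, Clone, PartialEq)]
struct Compiler {
    stage: u32,
    // Stage of the compiler that built this one; None for the downloaded beta.
    built_by: Option<u32>,
}

// Stage 0 is not built locally; it is a prebuilt beta compiler.
fn download_beta() -> Compiler {
    Compiler { stage: 0, built_by: None }
}

// Compiling the rustc sources with `host` yields the next stage.
fn build_next(host: &Compiler) -> Compiler {
    Compiler { stage: host.stage + 1, built_by: Some(host.stage) }
}

// Returns every compiler produced on the way to `target_stage`.
fn bootstrap(target_stage: u32) -> Vec<Compiler> {
    let mut chain = vec![download_beta()];
    for _ in 0..target_stage {
        let next = build_next(chain.last().expect("chain starts non-empty"));
        chain.push(next);
    }
    chain
}

fn main() {
    for compiler in bootstrap(2) {
        match compiler.built_by {
            None => println!("stage {}: downloaded", compiler.stage),
            Some(host) => println!("stage {}: built by stage {}", compiler.stage, host),
        }
    }
}
```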

The build steps are organized as modules in src/bootstrap/src/core/build_steps/mod.rs:

Module | Responsibility
compile | Building the compiler and standard library
llvm | Building LLVM from source (or downloading CI artifacts)
test | Running the test suite
dist | Creating release tarballs
tool | Building tools (rustdoc, clippy, rustfmt, etc.)
doc | Generating documentation

For day-to-day development, the most common commands are:

  • ./x.py build — Build the compiler (stage 1 by default)
  • ./x.py check — Type-check only (much faster)
  • ./x.py test tests/ui — Run UI tests

Configuration lives in bootstrap.toml (copy from bootstrap.example.toml). Key settings include [llvm].download-ci-llvm = true (skip building LLVM) and [rust].debug = true (debug assertions).

Tip: For your first build, set download-ci-llvm = true in your bootstrap.toml. Building LLVM from source can take 30+ minutes; downloading CI artifacts takes seconds.

What's Next

We've established the foundation: repository layout, the entry point chain from main() through the driver, the Callbacks trait for custom tooling, and the ~75-crate landscape. But we've only scratched the surface of how compilation actually happens.

In Part 2, we'll dive into the architectural innovation that makes rustc fundamentally different from traditional compilers: the query system. Instead of running passes in a fixed sequence, rustc uses demand-driven, memoized computations that enable incremental compilation — and understanding this system is the key to understanding everything else in the compiler.