Architecture Overview and How to Navigate 75 Compiler Crates
Prerequisites
- Basic Rust (ownership, lifetimes, traits, generics)
- General familiarity with compiler concepts (lexing, parsing, type checking, code generation)
The Rust compiler is one of the most ambitious open-source compiler projects in existence. At roughly 75 crates, 2 million lines of code, and a demand-driven query architecture that replaced the traditional sequential pipeline, rustc can be intimidating to approach. This article gives you the mental map you need to navigate it confidently — from the repository's top-level directories, through the rustc binary entry point, down into the driver that orchestrates parsing, analysis, and codegen.
By the end, you'll understand where any given piece of compiler logic lives, how to trace a compilation from main() to a linked binary, and how the bootstrap build system compiles the compiler itself.
Repository Layout: compiler/, library/, src/, tests/
The rust-lang/rust repository is a Cargo workspace with hundreds of crates organized under a handful of top-level directories. Understanding these four is the key to not getting lost:
| Directory | Purpose |
|---|---|
| compiler/ | The ~75 crates that make up rustc itself |
| library/ | The standard library (core, alloc, std, proc_macro, test) |
| src/ | Non-compiler tools: bootstrap (the build system), rustdoc, rustfmt, clippy, etc. |
| tests/ | The massive test suite (ui/, codegen/, run-make/, etc.) |
The workspace root Cargo.toml ties the crates together as a Cargo workspace, which lets Cargo resolve inter-crate dependencies in one pass. In practice, though, day-to-day builds and checks go through the x.py bootstrap tool described below rather than plain cargo invocations from the repo root.
```mermaid
graph TD
ROOT["rust-lang/rust"]
ROOT --> COMPILER["compiler/<br/>~75 rustc crates"]
ROOT --> LIBRARY["library/<br/>core, alloc, std"]
ROOT --> SRC["src/<br/>bootstrap, tools"]
ROOT --> TESTS["tests/<br/>ui, codegen, run-make"]
COMPILER --> RUSTC_BIN["rustc (binary)"]
COMPILER --> RUSTC_DRIVER["rustc_driver_impl"]
COMPILER --> RUSTC_INTERFACE["rustc_interface"]
COMPILER --> RUSTC_MIDDLE["rustc_middle"]
COMPILER --> MORE["...~70 more"]
```
Tip: When you're lost looking for where something lives, remember this heuristic: if it's about compiling Rust, look in `compiler/`. If it's about the runtime behavior of Rust programs, look in `library/`. If it's about building the compiler, look in `src/bootstrap/`.
The rustc Binary Entry Point
Every rustc invocation starts in a surprisingly tiny file. The entire binary entry point is under 50 lines:
compiler/rustc/src/main.rs#L43-L45
```rust
fn main() -> ExitCode {
    rustc_driver::main()
}
```
That's it. The real work happens elsewhere. But before that one-liner, there's a critical piece of infrastructure: jemalloc integration. Lines 8–41 contain a detailed comment explaining why rustc doesn't use #[global_allocator] and instead relies on a jemalloc-sys feature that overrides malloc/free at the libc level. A #[global_allocator] only redirects allocations made by Rust code, while LLVM's C++ code calls malloc directly. Overriding the libc symbols gives both sides of the rustc ↔ LLVM boundary the same allocator.
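For contrast, here is roughly what the `#[global_allocator]` route looks like. This is a minimal sketch, not rustc's code. `CountingAlloc` and `ALLOCATIONS` are hypothetical names for an allocator that wraps `std::alloc::System` and counts allocations. Because the attribute hooks into Rust's own allocation API, it sees every `Box`, `Vec`, and `String`. It cannot intercept a C++ library that calls `malloc` itself.

```rust
use std::alloc::{GlobalAlloc, Layout, System};
use std::sync::atomic::{AtomicUsize, Ordering};

/// Hypothetical allocator that counts Rust-side allocations and forwards
/// the actual work to the system allocator.
struct CountingAlloc;

/// Number of allocations observed so far.
static ALLOCATIONS: AtomicUsize = AtomicUsize::new(0);

unsafe impl GlobalAlloc for CountingAlloc {
    unsafe fn alloc(&self, layout: Layout) -> *mut u8 {
        ALLOCATIONS.fetch_add(1, Ordering::Relaxed);
        // SAFETY: the caller upholds `GlobalAlloc::alloc`'s contract,
        // which is passed through unchanged.
        unsafe { System.alloc(layout) }
    }

    unsafe fn dealloc(&self, ptr: *mut u8, layout: Layout) {
        // SAFETY: `ptr` was allocated by `System` with this `layout`.
        unsafe { System.dealloc(ptr, layout) }
    }
}

// From here on, every Box, Vec, and String in this program is allocated
// through CountingAlloc. Memory that C or C++ code (such as LLVM) obtains
// by calling malloc directly never passes through it.
#[global_allocator]
static GLOBAL: CountingAlloc = CountingAlloc;

/// Returns how many Rust-side allocations have happened so far.
fn allocation_count() -> usize {
    ALLOCATIONS.load(Ordering::Relaxed)
}
```

If you call `allocation_count()` before and after creating a `Vec`, the counter increases. A foreign C function that allocates with `malloc` would leave it unchanged, and that gap is exactly the one rustc's libc-level override closes.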
The rustc_driver crate itself is equally minimal — a re-export shim:
compiler/rustc_driver/src/lib.rs#L1-L4
```rust
// This crate is intentionally empty and a re-export of `rustc_driver_impl`
pub use rustc_driver_impl::*;
```
This split exists for a practical reason: it allows rustc_driver_impl to be compiled in parallel with other compiler crates, improving build times for the compiler itself.
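The facade pattern itself is ordinary Rust, and inline modules can stand in for the two crates. The `driver_impl` and `driver` module names and the `main_entry` function below are hypothetical. Callers only ever name the facade, while the code they run lives in the implementation. Cargo-level build parallelism can't be shown in a single file, so this illustrates only the re-export.

```rust
/// Stand-in for the `rustc_driver_impl` crate, where the real logic lives.
mod driver_impl {
    /// Hypothetical entry point: returns a process exit code.
    pub fn main_entry(args: &[&str]) -> i32 {
        // Pretend compilation fails when no input file is given.
        if args.is_empty() { 1 } else { 0 }
    }
}

/// Stand-in for the `rustc_driver` crate: an empty facade that re-exports
/// the implementation's entire public API.
mod driver {
    pub use super::driver_impl::*;
}

/// Stand-in for `compiler/rustc/src/main.rs`, which names only the facade.
fn binary_main(args: &[&str]) -> i32 {
    driver::main_entry(args)
}
```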
```mermaid
sequenceDiagram
participant Binary as rustc (binary)
participant Driver as rustc_driver
participant DriverImpl as rustc_driver_impl
participant Interface as rustc_interface
Binary->>Driver: rustc_driver::main()
Driver->>DriverImpl: (re-export)
DriverImpl->>DriverImpl: run_compiler()
DriverImpl->>Interface: interface::run_compiler(config, closure)
Interface->>Interface: Create Session, Compiler
Interface-->>DriverImpl: Calls closure with &Compiler
```
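The last step in the diagram is where `interface::run_compiler` calls back into the driver with a `&Compiler`. It follows a common Rust pattern: the callee owns setup and teardown and lends a borrowed value to a closure. Here is a minimal model of that shape. `Config`, `Session`, and `Compiler` are heavily simplified stand-ins, not rustc's real definitions.

```rust
/// Simplified stand-ins for rustc_interface types.
struct Config {
    input: String,
}

struct Session {
    input: String,
}

struct Compiler {
    sess: Session,
}

/// Model of `interface::run_compiler`: build the session from the config,
/// lend `&Compiler` to the caller's closure, and return its result.
fn run_compiler<R>(config: Config, f: impl FnOnce(&Compiler) -> R) -> R {
    let compiler = Compiler {
        sess: Session { input: config.input },
    };
    // The closure only borrows `compiler`; it cannot keep the reference
    // after this call returns, so teardown stays under our control.
    f(&compiler)
}
```

Because the closure receives only a borrow, nothing the driver does can keep the `Compiler` alive past `run_compiler`'s return. That keeps resource cleanup in one place.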
The Driver: Callbacks, run_compiler(), and the Pipeline
The heart of the compiler driver lives in rustc_driver_impl. Two constructs define its public API: the Callbacks trait and the run_compiler() function.
The Callbacks Trait
compiler/rustc_driver_impl/src/lib.rs#L118-L149
The Callbacks trait offers four hooks into the compilation pipeline:
- `config()`: Called before the `Compiler` is created. Lets you mutate the `Config` struct.
- `after_crate_root_parsing()`: Called after parsing the crate root (before submodules). Can halt compilation.
- `after_expansion()`: Called after macro expansion and name resolution. Receives `TyCtxt`.
- `after_analysis()`: Called after type checking, borrow checking, and all analysis passes.
Each callback (except config) returns a Compilation enum: either Continue or Stop. This is how tools like Clippy and Miri hook into the compiler. They implement Callbacks and adjust the Config in config() (Clippy registers its custom lints there). They then act on the analyzed program in after_analysis() (Miri starts its interpreter there).
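To make the control flow concrete, here is a self-contained model of how a driver consults such a trait. It mirrors the shape of `Callbacks` and `Compilation`, where hooks default to `Continue` so a tool overrides only what it needs. The simplified signatures, the `run_pipeline` function, and the `StopAfterParsing` and `LintTool` types are hypothetical. The real hooks also receive compiler state such as `&Compiler` and `TyCtxt`.

```rust
/// Mirrors the shape of rustc_driver_impl's `Compilation` enum.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum Compilation {
    Stop,
    Continue,
}

/// Hypothetical, simplified stand-in for `rustc_interface::Config`.
#[derive(Debug, Default)]
struct Config {
    extra_lints: Vec<String>,
}

/// Simplified model of `Callbacks`: every hook has a default.
trait Callbacks {
    fn config(&mut self, _config: &mut Config) {}
    fn after_crate_root_parsing(&mut self) -> Compilation {
        Compilation::Continue
    }
    fn after_expansion(&mut self) -> Compilation {
        Compilation::Continue
    }
    fn after_analysis(&mut self) -> Compilation {
        Compilation::Continue
    }
}

/// Runs the phases in order, consulting the callbacks between them.
/// Returns the final config and the names of the phases that ran.
fn run_pipeline(cb: &mut dyn Callbacks) -> (Config, Vec<&'static str>) {
    let mut config = Config::default();
    cb.config(&mut config);

    let mut ran = vec!["parse"];
    if cb.after_crate_root_parsing() == Compilation::Stop {
        return (config, ran);
    }
    ran.push("expand");
    if cb.after_expansion() == Compilation::Stop {
        return (config, ran);
    }
    ran.push("analysis");
    if cb.after_analysis() == Compilation::Stop {
        return (config, ran);
    }
    ran.push("codegen");
    (config, ran)
}

/// Hypothetical tool that only needs the parsed crate root.
struct StopAfterParsing;

impl Callbacks for StopAfterParsing {
    fn after_crate_root_parsing(&mut self) -> Compilation {
        Compilation::Stop
    }
}

/// Hypothetical lint tool: registers a lint, then lets compilation finish.
struct LintTool;

impl Callbacks for LintTool {
    fn config(&mut self, config: &mut Config) {
        config.extra_lints.push(String::from("my_lint"));
    }
}
```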
The run_compiler() Orchestrator
compiler/rustc_driver_impl/src/lib.rs#L170-L354
run_compiler() is the primary entry point for rustc. It orchestrates the entire compilation pipeline:
```mermaid
flowchart TD
A["run_compiler(args, callbacks)"] --> B[Parse CLI arguments]
B --> C[Build Config struct]
C --> D["callbacks.config(&mut config)"]
D --> E["interface::run_compiler(config, closure)"]
E --> F["passes::parse(sess) → AST"]
F --> G["callbacks.after_crate_root_parsing()"]
G --> H["create_and_enter_global_ctxt()"]
H --> I["resolver_for_lowering() → expansion + name resolution"]
I --> J["callbacks.after_expansion()"]
J --> K["tcx.ensure_ok().analysis(())"]
K --> L["callbacks.after_analysis()"]
L --> M["Linker::codegen_and_build_linker()"]
M --> N["linker.link(sess, codegen_backend)"]
```
A particularly interesting design choice is that linking happens outside the global context closure (lines 350–352). This allows the GlobalCtxt (which holds all the arenas and interned data) to be freed before linking begins, reducing peak memory usage.
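The memory benefit comes from ordinary Rust scoping: anything the closure does not return is dropped when its owner goes out of scope. The model below uses hypothetical `GlobalCtxt`, `Linker`, and `enter_global_ctxt` stand-ins, and it records events so the ordering can be checked. Only the small `Linker` value escapes the closure, so the large context is already gone when `link()` runs.

```rust
use std::cell::RefCell;

thread_local! {
    /// Records the order of events so the drop/link ordering is observable.
    static EVENTS: RefCell<Vec<&'static str>> = RefCell::new(Vec::new());
}

fn log(event: &'static str) {
    EVENTS.with(|e| e.borrow_mut().push(event));
}

/// Stand-in for the GlobalCtxt: large, arena-backed state.
struct GlobalCtxt {
    _arena: Vec<u8>,
}

impl Drop for GlobalCtxt {
    fn drop(&mut self) {
        log("gcx dropped");
    }
}

/// Stand-in for the Linker: only the small amount of data linking needs.
struct Linker {
    output: String,
}

impl Linker {
    fn link(self) -> String {
        log("linked");
        self.output
    }
}

/// Creates the context, lends it to `f`, and frees it before returning.
fn enter_global_ctxt<R>(f: impl FnOnce(&GlobalCtxt) -> R) -> R {
    let gcx = GlobalCtxt { _arena: vec![0; 1 << 20] };
    let result = f(&gcx);
    drop(gcx); // Free the large context before handing control back.
    result
}

/// Models the end of run_compiler: codegen inside the closure, link outside.
fn compile_and_link() -> (String, Vec<&'static str>) {
    let linker = enter_global_ctxt(|_gcx| {
        log("codegen");
        Linker { output: String::from("a.out") }
    });
    let artifact = linker.link();
    let events = EVENTS.with(|e| e.borrow().clone());
    (artifact, events)
}
```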
The Compiler and Config Structs
When interface::run_compiler() is called, it receives a Config (the input) and uses it to create a Compiler (the session).
The Config Struct
compiler/rustc_interface/src/interface.rs#L315-L375
Config is the master configuration object. Its most important fields:
| Field | Purpose |
|---|---|
| opts | Parsed command-line options (-O, --edition, -C, -Z flags) |
| input | The source input: either a file path or a string |
| register_lints | Callback for custom lint registration (used by Clippy) |
| override_queries | Callback to replace query implementations (used by tools such as Miri and rustdoc) |
| make_codegen_backend | Factory for custom codegen backends |
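The override_queries hook works because providers are plain function pointers. The compiler fills in defaults, and then the hook can overwrite individual entries. Below is a self-contained sketch of that mechanism. The single `crate_name` query, the `Providers` table, and the `build_providers` and `tool_overrides` functions are hypothetical simplifications. Only the `Option<fn(&Session, &mut Providers)>` shape of the hook is taken from rustc.

```rust
/// Hypothetical, drastically simplified session.
struct Session {
    crate_name: String,
}

/// Simplified provider table: one function pointer per query.
#[derive(Clone, Copy)]
struct Providers {
    crate_name: fn(&Session) -> String,
}

/// Default implementation, as a compiler crate would register it.
fn default_crate_name(sess: &Session) -> String {
    sess.crate_name.clone()
}

/// Simplified Config holding only the hook discussed here.
struct Config {
    override_queries: Option<fn(&Session, &mut Providers)>,
}

/// Fills in the defaults first, then lets the tool's hook overwrite entries.
fn build_providers(config: &Config, sess: &Session) -> Providers {
    let mut providers = Providers { crate_name: default_crate_name };
    if let Some(override_fn) = config.override_queries {
        override_fn(sess, &mut providers);
    }
    providers
}

/// Replacement provider a tool might install.
fn fixed_crate_name(_sess: &Session) -> String {
    String::from("overridden")
}

/// A tool's hook: swap in its own implementation of one query.
fn tool_overrides(_sess: &Session, providers: &mut Providers) {
    providers.crate_name = fixed_crate_name;
}
```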
The Compiler Struct
compiler/rustc_interface/src/interface.rs#L38-L48
```rust
pub struct Compiler {
    pub sess: Session,
    pub codegen_backend: Box<dyn CodegenBackend>,
    pub(crate) override_queries: Option<fn(&Session, &mut Providers)>,
    pub(crate) current_gcx: CurrentGcx,
    pub(crate) jobserver_proxy: Arc<Proxy>,
}
```
Compiler is the runtime representation of a compilation session. It holds the Session (options, diagnostics, file I/O), the codegen backend, and the override_queries hook. The current_gcx field is a handle to the GlobalCtxt that gets created later — it's the bridge between the Compiler and the query system.
Tip: If you're building a custom driver (like `clippy-driver` or `miri`), your entry point is implementing `Callbacks` and calling `run_compiler()`. Inside `config()`, set `register_lints` to add custom lints and `override_queries` to intercept specific queries.
The Crate Map: A Guided Tour of 75 Compiler Crates
The compiler crates under compiler/ can be organized by which stage of compilation they serve. Here is a categorized overview of the most important ones:
```mermaid
flowchart LR
subgraph Parsing
rustc_parse
rustc_ast
rustc_lexer
end
subgraph Expansion
rustc_expand
rustc_resolve
rustc_builtin_macros
end
subgraph Lowering
rustc_ast_lowering
rustc_hir
end
subgraph Analysis
rustc_hir_analysis
rustc_hir_typeck
rustc_borrowck
end
subgraph MIR
rustc_mir_build
rustc_mir_transform
rustc_const_eval
end
subgraph Codegen
rustc_codegen_ssa
rustc_codegen_llvm
rustc_monomorphize
end
subgraph Infrastructure
rustc_middle
rustc_data_structures
rustc_errors
rustc_session
rustc_span
end
Parsing --> Expansion --> Lowering --> Analysis --> MIR --> Codegen
```
| Stage | Key Crates | Role |
|---|---|---|
| Parsing | rustc_lexer, rustc_parse, rustc_ast | Source text → tokens → AST |
| Expansion | rustc_expand, rustc_resolve, rustc_builtin_macros | Macro expansion, name resolution |
| Lowering | rustc_ast_lowering, rustc_hir | AST → HIR (desugar `for`, `?`, `async`) |
| Analysis | rustc_hir_analysis, rustc_hir_typeck, rustc_borrowck | Type checking, borrow checking |
| MIR | rustc_mir_build, rustc_mir_transform, rustc_const_eval | MIR construction, optimization, CTFE |
| Codegen | rustc_codegen_ssa, rustc_codegen_llvm, rustc_monomorphize | Monomorphization, LLVM IR generation |
| Infrastructure | rustc_middle, rustc_data_structures, rustc_errors, rustc_span | Shared types, data structures, diagnostics |
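The Lowering row mentions desugaring `for` and `?`. Here are source-level approximations of those two rewrites, written as ordinary Rust you can run. This is a hedged sketch. The real lowering builds HIR nodes that refer to lang items, and `?` is defined through the unstable `Try` trait, so the hand-written `match` forms below are only approximations (for `?`, only for `Result`). The function names are hypothetical.

```rust
use std::num::ParseIntError;

/// Sums a slice with an ordinary `for` loop.
fn sum_for(values: &[i32]) -> i32 {
    let mut total = 0;
    for v in values {
        total += v;
    }
    total
}

/// The same loop written the way lowering expands `for`:
/// call `into_iter` once, then `loop` + `match` on `next()`.
fn sum_desugared(values: &[i32]) -> i32 {
    let mut total = 0;
    match IntoIterator::into_iter(values) {
        mut iter => loop {
            match Iterator::next(&mut iter) {
                None => break,
                Some(v) => {
                    total += v;
                }
            }
        },
    }
    total
}

/// Parses and doubles a number using `?`.
fn double_try(s: &str) -> Result<i32, ParseIntError> {
    let n: i32 = s.parse()?;
    Ok(n * 2)
}

/// The same function with `?` expanded by hand (Result-only approximation):
/// early-return the error after converting it with `From::from`.
fn double_desugared(s: &str) -> Result<i32, ParseIntError> {
    let n: i32 = match s.parse::<i32>() {
        Ok(value) => value,
        Err(err) => return Err(From::from(err)),
    };
    Ok(n * 2)
}
```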
The most important crate to understand is rustc_middle. It defines the shared data types used by the later stages: TyCtxt, Ty, Body (MIR), and the query system definitions, building on lower-level identifiers such as DefId from rustc_span. Nearly every crate from AST lowering onward depends on it.
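One reason `Ty` values are cheap to pass around is interning. The global context stores each distinct type once, and a `Ty` is essentially a pointer to that unique copy, so equality can be checked by identity. The sketch below models the idea with indices instead of arena pointers. `TyKind`, `Ty`, and `Interner` here are simplified, hypothetical stand-ins for the real `rustc_middle` types.

```rust
use std::collections::HashMap;

/// Simplified type structure; the real `TyKind` has many more variants.
#[derive(Clone, PartialEq, Eq, Hash, Debug)]
enum TyKind {
    Bool,
    Int,
    Ref(Ty),
    Tuple(Vec<Ty>),
}

/// A cheap, copyable handle to an interned type.
/// Equal handles mean structurally equal types, and vice versa.
#[derive(Clone, Copy, PartialEq, Eq, Hash, Debug)]
struct Ty(u32);

/// Stores each distinct `TyKind` exactly once.
#[derive(Default)]
struct Interner {
    kinds: Vec<TyKind>,
    lookup: HashMap<TyKind, Ty>,
}

impl Interner {
    /// Returns the existing handle for `kind`, or allocates a new one.
    fn intern(&mut self, kind: TyKind) -> Ty {
        if let Some(&ty) = self.lookup.get(&kind) {
            return ty;
        }
        let ty = Ty(self.kinds.len() as u32);
        self.kinds.push(kind.clone());
        self.lookup.insert(kind, ty);
        ty
    }

    /// Looks the structure back up from a handle.
    fn kind(&self, ty: Ty) -> &TyKind {
        &self.kinds[ty.0 as usize]
    }
}
```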
Provider registration happens in DEFAULT_QUERY_PROVIDERS, where ~20 crates each register their query implementations:
compiler/rustc_interface/src/passes.rs#L880-L919
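The pattern behind that registration is easy to model. Each compiler crate exposes a `provide(&mut Providers)` function that writes its query implementations into a shared table, and the interface crate calls them all once to build the default table. In this sketch, modules stand in for crates. The two queries (`type_of_item`, `mir_built`) are hypothetical, and a `OnceLock` plays the role of the static.

```rust
use std::sync::OnceLock;

/// Simplified provider table: one function pointer per query.
#[derive(Clone, Copy)]
struct Providers {
    type_of_item: fn(u32) -> &'static str,
    mir_built: fn(u32) -> String,
}

// In this sketch, unregistered queries panic so a missing `provide` call fails loudly.
fn unregistered_type_of(_: u32) -> &'static str {
    panic!("no provider registered for `type_of_item`")
}

fn unregistered_mir(_: u32) -> String {
    panic!("no provider registered for `mir_built`")
}

impl Default for Providers {
    fn default() -> Self {
        Providers { type_of_item: unregistered_type_of, mir_built: unregistered_mir }
    }
}

/// Stand-in for an analysis crate and its `provide` function.
mod analysis_crate {
    use super::Providers;

    fn type_of_item(id: u32) -> &'static str {
        if id % 2 == 0 { "i32" } else { "bool" }
    }

    pub fn provide(providers: &mut Providers) {
        providers.type_of_item = type_of_item;
    }
}

/// Stand-in for a MIR-building crate and its `provide` function.
mod mir_build_crate {
    use super::Providers;

    fn mir_built(id: u32) -> String {
        format!("mir for item {id}")
    }

    pub fn provide(providers: &mut Providers) {
        providers.mir_built = mir_built;
    }
}

/// Builds the table once, on first use, by calling every crate's `provide`.
fn default_query_providers() -> &'static Providers {
    static PROVIDERS: OnceLock<Providers> = OnceLock::new();
    PROVIDERS.get_or_init(|| {
        let mut providers = Providers::default();
        analysis_crate::provide(&mut providers);
        mir_build_crate::provide(&mut providers);
        providers
    })
}
```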
The Bootstrap Build System
You can't compile rustc with a normal cargo build — the compiler needs itself to compile. This chicken-and-egg problem is solved by the bootstrap build system.
The entry point is x.py, a thin Python wrapper. The real build logic lives in src/bootstrap/, a Rust program that builds in multiple stages:
```mermaid
flowchart TD
S0["Stage 0<br/>Download beta compiler"] --> S1["Stage 1<br/>Compile rustc with beta"]
S1 --> S2["Stage 2<br/>Compile rustc with stage-1 rustc"]
S2 --> DIST["Distribution artifacts"]
S0 -.->|"uses"| BETA["Downloaded beta rustc"]
S1 -.->|"produces"| STAGE1["Stage 1 rustc"]
S2 -.->|"produces"| STAGE2["Stage 2 rustc (final)"]
```
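The staging in the diagram is essentially a recursion. Stage N's compiler is produced by running stage N-1's compiler on the compiler sources, bottoming out at the downloaded beta. Here is a toy model of just that relationship. The `Rustc` type and the `build_stage` and `lineage` functions are hypothetical, and the model ignores the standard library and every other detail of real bootstrap.

```rust
/// Hypothetical record of a compiler binary: its stage and which
/// compiler built it.
#[derive(Debug, Clone, PartialEq)]
struct Rustc {
    stage: u32,
    built_by: Option<Box<Rustc>>,
}

/// Toy model: stage 0 is downloaded, stage N is compiled by stage N-1.
fn build_stage(stage: u32) -> Rustc {
    if stage == 0 {
        // The prebuilt beta compiler: downloaded, not compiled locally.
        Rustc { stage: 0, built_by: None }
    } else {
        // Compile the in-tree compiler sources with the previous stage.
        let previous = build_stage(stage - 1);
        Rustc { stage, built_by: Some(Box::new(previous)) }
    }
}

/// Walks the chain of compilers, newest first (e.g. [2, 1, 0]).
fn lineage(rustc: &Rustc) -> Vec<u32> {
    let mut stages = vec![rustc.stage];
    let mut current = rustc;
    while let Some(parent) = current.built_by.as_deref() {
        stages.push(parent.stage);
        current = parent;
    }
    stages
}
```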
The build steps are organized as modules in src/bootstrap/src/core/build_steps/mod.rs:
| Module | Responsibility |
|---|---|
| compile | Building the compiler and standard library |
| llvm | Building LLVM from source (or downloading CI artifacts) |
| test | Running the test suite |
| dist | Creating release tarballs |
| tool | Building tools (rustdoc, clippy, rustfmt, etc.) |
| doc | Generating documentation |
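Under the hood, bootstrap models units of work like these as *steps* that can request other steps. Results are cached so each step runs at most once per invocation. The sketch below captures that idea in miniature. The trait and method names echo bootstrap's `Step` trait and `Builder::ensure`, but the signatures here are simplified assumptions, and the `BuildLlvm` and `BuildRustc` steps are hypothetical.

```rust
use std::cell::RefCell;
use std::collections::HashMap;

/// Simplified build step: a stable cache key plus a `run` that may request
/// other steps. (Bootstrap's real `Step` trait has an associated `Output`.)
trait Step {
    fn key(&self) -> String;
    fn run(&self, builder: &Builder) -> String;
}

/// Runs steps on demand and memoizes their outputs.
#[derive(Default)]
struct Builder {
    cache: RefCell<HashMap<String, String>>,
    runs: RefCell<Vec<String>>,
}

impl Builder {
    /// Returns the cached output for `step`, running it only the first time.
    fn ensure<S: Step>(&self, step: S) -> String {
        let key = step.key();
        let cached = self.cache.borrow().get(&key).cloned();
        if let Some(out) = cached {
            return out;
        }
        self.runs.borrow_mut().push(key.clone());
        let out = step.run(self);
        self.cache.borrow_mut().insert(key, out.clone());
        out
    }
}

struct BuildLlvm;

struct BuildRustc {
    stage: u32,
}

impl Step for BuildLlvm {
    fn key(&self) -> String {
        String::from("llvm")
    }
    fn run(&self, _builder: &Builder) -> String {
        String::from("libLLVM")
    }
}

impl Step for BuildRustc {
    fn key(&self) -> String {
        format!("rustc-stage{}", self.stage)
    }
    fn run(&self, builder: &Builder) -> String {
        // Request LLVM; `ensure` runs that step only the first time.
        let llvm = builder.ensure(BuildLlvm);
        format!("rustc stage {} + {}", self.stage, llvm)
    }
}
```

Building two compiler stages then runs the LLVM step once, and asking for an already-built stage again is a cache hit.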
For day-to-day development, the most common commands are:
- `./x.py build`: Build the compiler (stage 1 by default)
- `./x.py check`: Type-check only (much faster)
- `./x.py test tests/ui`: Run UI tests
Configuration lives in bootstrap.toml (copy from bootstrap.example.toml). Key settings include [llvm].download-ci-llvm = true (skip building LLVM) and [rust].debug = true (debug assertions).
Tip: For your first build, set `download-ci-llvm = true` in your `bootstrap.toml`. Building LLVM from source can take 30+ minutes; downloading CI artifacts takes seconds.
What's Next
We've established the foundation: repository layout, the entry point chain from main() through the driver, the Callbacks trait for custom tooling, and the ~75-crate landscape. But we've only scratched the surface of how compilation actually happens.
In Part 2, we'll dive into the architectural innovation that makes rustc fundamentally different from traditional compilers: the query system. Instead of running passes in a fixed sequence, rustc uses demand-driven, memoized computations that enable incremental compilation — and understanding this system is the key to understanding everything else in the compiler.