# Linux Kernel Architecture: A Map for Source Code Explorers

## Prerequisites

- Basic C programming knowledge
- General understanding of what an operating system does
The Linux kernel is roughly 30 million lines of code, contributed by thousands of developers over three decades. Opening the repository for the first time is overwhelming — not because any single file is impenetrable, but because there is no obvious front door. Unlike a web framework with a main() and a request handler, the kernel is a sprawling, conditionally compiled system where half the code in the tree may not end up in any given build. This article is your map.
We'll walk through the top-level directory layout, the layered architecture that separates hardware-specific code from portable logic, the critical boundary between kernel-internal and userspace-facing headers, and the Kconfig/Kbuild system that decides what actually gets compiled. By the end, you'll know where to look when exploring any kernel subsystem.
## The 30-Million-Line Challenge
The first thing to understand about reading the kernel is that you are never reading the whole kernel. A typical x86-64 server build compiles perhaps 10–15% of the source tree. The rest is code for other architectures, disabled features, and drivers for hardware you don't have. The build system exists precisely to manage this selection.
The MAINTAINERS file — over 29,000 lines — is your Rosetta Stone. Every subsystem, driver, and file pattern is mapped to a human maintainer, a mailing list, and a status. When you encounter an unfamiliar directory, MAINTAINERS tells you who owns it and where discussions happen.
Tip: Use `scripts/get_maintainer.pl` on any file to instantly find the right people and mailing lists. It parses `MAINTAINERS` for you: `./scripts/get_maintainer.pl fs/ext4/super.c`.
## Top-Level Directory Taxonomy
The top-level layout reflects a deliberate decomposition of the kernel into subsystems. Here's what each directory contains:
| Directory | Purpose |
|---|---|
| `init/` | Boot-time initialization, including `start_kernel()` |
| `kernel/` | Core kernel: scheduling, signals, locking, timers, tracing |
| `mm/` | Memory management: page allocator, slab, virtual memory |
| `fs/` | Virtual Filesystem Switch (VFS) and all filesystem implementations |
| `net/` | Networking stack: TCP/IP, sockets, netfilter |
| `drivers/` | Device drivers — by far the largest directory (~60% of the tree) |
| `arch/` | Architecture-specific code: x86, arm64, riscv, etc. |
| `include/` | Header files: both kernel-internal and UAPI |
| `block/` | Block I/O layer between filesystems and disk drivers |
| `io_uring/` | Async I/O subsystem (promoted to top-level in 2022) |
| `security/` | LSM framework: SELinux, AppArmor, etc. |
| `crypto/` | Cryptographic algorithms and API |
| `rust/` | Rust language support and the `kernel` crate |
| `virt/` | Virtualization support (KVM lives under `arch/` but `virt/` has shared pieces) |
| `sound/` | ALSA audio subsystem |
| `scripts/` | Build scripts, code-checking tools, maintainer helpers |
| `tools/` | Userspace tools: perf, bpftool, selftests |
| `lib/` | Kernel-internal library routines: sort, decompress, string functions |
| `ipc/` | System V IPC: shared memory, semaphores, message queues |
| `certs/` | Signing certificates for module verification |
| `usr/` | initramfs creation |
```mermaid
graph TD
    subgraph "Core Kernel"
        init["init/"]
        kernel["kernel/"]
        mm["mm/"]
        lib["lib/"]
        ipc["ipc/"]
    end
    subgraph "Subsystems"
        fs["fs/"]
        net["net/"]
        block["block/"]
        io_uring["io_uring/"]
        security["security/"]
        crypto["crypto/"]
        sound["sound/"]
        virt["virt/"]
    end
    subgraph "Hardware Abstraction"
        arch["arch/"]
        drivers["drivers/"]
    end
    subgraph "Newer Additions"
        rust["rust/"]
    end
    init --> kernel
    kernel --> mm
    kernel --> arch
    fs --> block
    block --> drivers
    io_uring --> fs
```
The build order is specified explicitly in the top-level Kbuild file. This is not alphabetical — it is deliberate:
```makefile
obj-y += init/
obj-y += usr/
obj-y += arch/$(SRCARCH)/
...
obj-y += kernel/
obj-y += mm/
obj-y += fs/
...
obj-$(CONFIG_BLOCK) += block/
obj-$(CONFIG_IO_URING) += io_uring/
obj-$(CONFIG_RUST) += rust/
...
obj-y += drivers/
```
Notice the pattern: obj-y means "always compiled in," while obj-$(CONFIG_*) means "only if this config symbol is set." The block layer, io_uring, Rust support, networking, and sample code are all conditional. This is the Kconfig system at work, and we'll explore it in detail shortly.
## The Three-Layer Architecture
The kernel's code follows a three-layer architecture, though it's enforced by convention rather than a compiler:
```mermaid
flowchart TB
    subgraph Layer1["Layer 1: Architecture-Specific"]
        direction LR
        x86["arch/x86/"]
        arm64["arch/arm64/"]
        riscv["arch/riscv/"]
    end
    subgraph Layer2["Layer 2: Core Kernel"]
        direction LR
        sched["kernel/sched/"]
        mm2["mm/"]
        vfs["fs/ (VFS)"]
        netcore["net/core/"]
    end
    subgraph Layer3["Layer 3: Drivers & Filesystems"]
        direction LR
        dri["drivers/gpu/"]
        ext4["fs/ext4/"]
        tcp["net/ipv4/"]
        nvme["drivers/nvme/"]
    end
    Layer1 -->|"defined interfaces<br/>(asm/ headers)"| Layer2
    Layer2 -->|"operations structs<br/>(vtables)"| Layer3
```
Layer 1 (Architecture-specific) contains the lowest-level code that differs per CPU architecture: boot assembly, interrupt handling, page table formats, syscall entry points. Each architecture lives under arch/<name>/ and exports a well-defined interface via arch/<name>/include/asm/ headers.
Layer 2 (Core kernel) contains the portable subsystems: the scheduler in kernel/sched/, memory management in mm/, the VFS in fs/, and networking in net/. This code calls into Layer 1 through arch-specific function prototypes, and provides extension points for Layer 3 through C struct-based "vtables" — a pattern we'll explore deeply in articles 3 and 4.
Layer 3 (Drivers and filesystem implementations) is where the bulk of the code lives. A filesystem like ext4 implements struct file_operations, struct inode_operations, and struct super_operations to plug into the VFS. A device driver implements operations structures specific to its bus (PCI, USB, platform). This layer never touches architecture-specific code directly — it always goes through Layer 2 abstractions.
Tip: When reading an unfamiliar driver, start by finding its operations structures. They tell you exactly which Layer 2 interfaces the driver implements.
## Kernel vs Userspace API Headers
The include/ directory has a critically important subdivision that many newcomers miss:
- `include/linux/` — Kernel-internal headers. These can change between releases with no notice. Only kernel code includes these.
- `include/uapi/linux/` — Userspace-facing API headers. These define the stable ABI between the kernel and userspace applications. Changing these is a serious matter that requires backward compatibility.
This separation was introduced by the "UAPI disintegration" effort around 2012. Before that, kernel-internal and userspace definitions were mixed together in the same headers, with #ifdef __KERNEL__ guards to hide internal bits. The current layout makes the boundary explicit in the directory structure.
For example, include/uapi/linux/io_uring.h defines the submission queue entry (struct io_uring_sqe) that userspace programs write to. It carries a dual license (GPL OR MIT) so userspace libraries like liburing can include it. Meanwhile, include/linux/io_uring_types.h defines the kernel-internal struct io_ring_ctx that userspace never sees.
The rule is simple: if it's under uapi/, it's a promise to userspace programs. Everything else is an implementation detail.
## The MAINTAINERS Ownership Model
The kernel is not a monolith managed by a single team. It's a federation of subsystems, each with designated maintainers. The MAINTAINERS file encodes this structure with entries like:
```
FILESYSTEMS (VFS and infrastructure)
M:	Alexander Viro <viro@zeniv.linux.org.uk>
M:	Christian Brauner <brauner@kernel.org>
L:	linux-fsdevel@vger.kernel.org
S:	Maintained
F:	fs/*
F:	include/linux/fs.h
```
Each entry maps file patterns (F:) to maintainers (M:), reviewers (R:), mailing lists (L:), and a status (S:). The get_maintainer.pl script uses these patterns to determine who should review any given patch.
This ownership model means that code quality standards, naming conventions, and review rigor vary by subsystem. The networking stack, for instance, has notably strict review practices. Understanding who owns a subsystem helps you understand its code culture.
## Kconfig and Kbuild: Conditional Compilation at Scale
The kernel's build system has two halves: Kconfig defines what can be configured, and Kbuild compiles what was selected.
### Kconfig: The Configuration Language
The root Kconfig file sources subsystem configs in a defined order:
```kconfig
mainmenu "Linux/$(ARCH) $(KERNELVERSION) Kernel Configuration"

source "scripts/Kconfig.include"
source "init/Kconfig"
source "kernel/Kconfig.freezer"
source "fs/Kconfig.binfmt"
source "mm/Kconfig"
source "net/Kconfig"
source "drivers/Kconfig"
...
source "io_uring/Kconfig"
```
Each sourced file defines config entries with types (bool, tristate, int, string), default values, dependencies, and help text. A tristate option can be y (built-in), m (module), or n (disabled). The total number of CONFIG_* symbols exceeds 20,000.
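A typical entry looks like the following sketch. The `DEMO_FS` symbol is hypothetical, invented for illustration; real entries follow this same shape:

```kconfig
config DEMO_FS
	tristate "Demo filesystem support"
	depends on BLOCK
	default n
	help
	  Example of a tristate option: y builds the code into vmlinux,
	  m builds it as a loadable module, and n skips it entirely.
```

The `depends on` line means the option is only offered when `CONFIG_BLOCK` is enabled, which is how Kconfig keeps the 20,000+ symbols mutually consistent.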
```mermaid
flowchart LR
    A["make menuconfig"] --> B[".config file<br/>(CONFIG_* = y/m/n)"]
    B --> C["include/config/auto.conf<br/>(processed config)"]
    C --> D["Kbuild Makefiles<br/>obj-$(CONFIG_*) rules"]
    D --> E["vmlinux binary"]
```
### Kbuild: Conditional Object Inclusion
Every directory in the kernel has a Makefile (or Kbuild file) that uses obj-y and obj-$(CONFIG_*) to control what gets compiled. The pattern is beautifully simple:
```makefile
obj-$(CONFIG_EXT4_FS)	+= ext4/
obj-$(CONFIG_XFS_FS)	+= xfs/
obj-$(CONFIG_BTRFS_FS)	+= btrfs/
```
When CONFIG_EXT4_FS=y, the expression expands to obj-y += ext4/, and ext4 is compiled into the kernel. When CONFIG_EXT4_FS=m, it expands to obj-m += ext4/, building a loadable module. When CONFIG_EXT4_FS=n (or unset), the entire ext4 directory is skipped.
The top-level Makefile orchestrates the final link, collecting all built-in objects into vmlinux.a and linking them into the vmlinux binary:
```makefile
targets += vmlinux.a
vmlinux.a: $(KBUILD_VMLINUX_OBJS) scripts/head-object-list.txt FORCE
	$(call if_changed,ar_vmlinux.a)
```
```mermaid
flowchart TD
    subgraph "Build Phases"
        K["Kconfig<br/>Configuration"] --> P["Preprocessing<br/>bounds.h, offsets.h"]
        P --> C["Compilation<br/>Per-directory Makefiles"]
        C --> A["Archive<br/>vmlinux.a"]
        A --> L["Link<br/>vmlinux"]
        L --> Z["Compress<br/>bzImage/zImage"]
    end
This system means that the kernel is not just one program — it's a framework for generating thousands of different programs, each tailored to specific hardware and feature requirements. An embedded IoT kernel might compile 2,000 files. A distribution kernel might compile 10,000. The source tree supports them all.
## What's Next
With this map in hand, you know where things live and why. In the next article, we'll follow the kernel from the very first instruction after power-on through the start_kernel() initialization sequence, watching as these subsystems come alive in a carefully choreographed boot sequence.