Read OSS

Navigating php-src: Architecture, Layers, and the Request Lifecycle

Intermediate

Prerequisites

  • Basic C programming (structs, pointers, function pointers)
  • General familiarity with how interpreted languages work
  • Understanding of process lifecycle concepts


PHP powers roughly 77% of all sites with a known server-side language. Yet most PHP developers never look at the engine beneath their code. The php-src repository is over two million lines of C — a number that intimidates even experienced systems programmers. This article gives you the mental map you need to navigate it. We'll break the codebase into its four architectural layers, examine the contract that lets PHP run inside Apache, Nginx, or a CLI terminal with identical behavior, and trace the complete lifecycle of a PHP request from the first main() call to the final shutdown.

By the end of this series, you'll be able to open any file in php-src and understand where it fits in the bigger picture.

Top-Level Directory Map

Before diving into architecture, let's orient ourselves with the repository's top-level layout. Each directory has a clear responsibility:

| Directory | Purpose |
|---|---|
| Zend/ | The Zend Engine: lexer, parser, compiler, VM, memory allocator, GC, type system |
| main/ | PHP runtime glue: lifecycle orchestration, INI system, streams, SAPI bridge |
| sapi/ | Server API entry points: CLI, FPM, CGI, Apache module, Embed, phpdbg |
| ext/ | 72+ bundled extensions: standard, JSON, OPcache, PDO, cURL, etc. |
| TSRM/ | Thread Safe Resource Manager: thread-local storage abstraction |
| build/ | Build tool scripts (autoconf, libtool helpers) |
| win32/ | Windows-specific build configuration and compatibility layer |
| tests/ | General .phpt tests (engine tests live in Zend/tests/, extension tests in each ext/*/tests/) |
| Zend/Optimizer/ | SSA-based optimizer (lives inside Zend/ but is a distinct subsystem) |

graph TD
    subgraph "php-src repository"
        SAPI["sapi/<br/>CLI, FPM, CGI, Apache, Embed"]
        MAIN["main/<br/>Runtime, INI, Streams, SAPI bridge"]
        EXT["ext/<br/>72+ bundled extensions"]
        ZEND["Zend/<br/>Engine: lexer, parser, compiler, VM, GC"]
        OPT["Zend/Optimizer/<br/>SSA optimizer"]
        TSRM_DIR["TSRM/<br/>Thread safety"]
        BUILD["build/, win32/<br/>Build system"]
        TESTS["tests/<br/>.phpt test suite"]
    end

    SAPI --> MAIN
    MAIN --> ZEND
    EXT --> ZEND
    OPT --> ZEND
    ZEND --> TSRM_DIR

Tip: When you're hunting for a feature, start with ext/ for PHP-facing functions, Zend/ for language semantics, main/ for runtime behavior, and sapi/ for host-environment integration.

The Four Architectural Layers

php-src is organized into four stacked layers. Each layer depends only on the layers below it, and each has a distinct responsibility:

flowchart TB
    subgraph L4["Layer 4: SAPIs"]
        CLI["CLI"]
        FPM["FPM"]
        CGI["CGI"]
        APACHE["Apache"]
        EMBED["Embed"]
    end

    subgraph L3["Layer 3: PHP Runtime (main/)"]
        LIFECYCLE["Lifecycle Orchestration"]
        INI["INI System"]
        STREAMS["Streams I/O"]
        SAPI_BRIDGE["SAPI Bridge"]
    end

    subgraph L2["Layer 2: Zend Engine"]
        COMPILER["Compiler"]
        VM["Virtual Machine"]
        MM["Memory Manager"]
        GC["Garbage Collector"]
        TYPES["Type System"]
    end

    subgraph L1["Layer 1: TSRM"]
        TLS["Thread-Local Storage"]
    end

    L4 --> L3
    L3 --> L2
    L2 --> L1

Layer 1 — TSRM sits at the bottom, providing thread-safe access to global state. In the common non-ZTS (non–thread-safe) builds, TSRM is compiled out, and global accessor macros like PG(), EG(), CG(), and SG() resolve directly to struct field access. In ZTS builds, which are needed when PHP runs inside a multi-threaded host (for example Apache with a threaded MPM, or an application embedding PHP across several threads), the same macros go through thread-local storage lookups.

Layer 2 — The Zend Engine is the language core. It contains the lexer, parser, AST, compiler, virtual machine, memory allocator, garbage collector, and the fundamental type system (zval, HashTable, zend_string, zend_object). The engine knows nothing about HTTP. It reaches output, file opening, and configuration values through callbacks that main/ hands to zend_startup() in a zend_utility_functions struct, so its own job is to compile and execute PHP opcodes.

Layer 3 — PHP Runtime (main/) bridges the engine to the outside world. It orchestrates the startup/shutdown lifecycle, parses INI files, manages the streams I/O abstraction, and provides the SAPI bridge that decouples the engine from its host environment.

Layer 4 — SAPIs are the entry points. Each SAPI implements a contract (a vtable of function pointers) that tells the runtime how to read input, write output, send headers, and log errors for a specific host environment.

The global state accessor macros deserve special attention. Each layer has its own globals struct, accessed through a dedicated macro:

| Macro | Struct | Layer | Contents |
|---|---|---|---|
| PG() | php_core_globals | Runtime | INI settings, error handling, file upload state |
| EG() | zend_executor_globals | Engine | Current execute_data, symbol tables, exception state |
| CG() | zend_compiler_globals | Engine | Active op_array, AST, compilation state |
| SG() | sapi_globals_struct | Runtime | Request info, response headers, server context |

These macros are everywhere in php-src. Recognizing them instantly is key to reading the code.

The SAPI Contract

The SAPI (Server API) contract is one of PHP's most elegant design decisions. It's a single struct, sapi_module_struct, with roughly 30 fields (most of them function pointers) that abstract every interaction between PHP and its host environment.

You can find the definition in main/SAPI.h. The key callbacks include:

classDiagram
    class sapi_module_struct {
        +char *name
        +char *pretty_name
        +startup(sapi_module_struct*) int
        +shutdown(sapi_module_struct*) int
        +activate() int
        +deactivate() int
        +ub_write(char*, size_t) size_t
        +flush(void*) void
        +header_handler(sapi_header_struct*, ...) int
        +send_headers(sapi_headers_struct*) int
        +send_header(sapi_header_struct*, void*) void
        +read_post(char*, size_t) size_t
        +read_cookies() char*
        +register_server_variables(zval*) void
        +log_message(char*, int) void
        +get_fd(int*) int
        +ini_defaults(HashTable*) void
    }

The CLI SAPI provides a concrete example. In sapi/cli/php_cli.c, the module definition wires up CLI-specific implementations:

  • ub_write → writes to standard output
  • read_post → returns nothing (CLI has no POST body)
  • read_cookies → returns NULL (no cookies in CLI)
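  • register_server_variables → imports the environment into $_SERVER, then adds entries such as PHP_SELF, SCRIPT_NAME, and SCRIPT_FILENAME (argv and argc are added separately by the runtime)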
  • log_message → writes to stderr

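This contract means the Zend Engine never calls write() or fwrite() directly. Engine output goes through zend_write(), which main/ points at its output layer (main/output.c). After any output buffering, that layer hands the bytes to sapi_module.ub_write(), which does the right thing whether PHP is running as an Apache module, a FastCGI worker, or an embedded scripting engine.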

SAPI Entry Points Compared

Each SAPI ships its own main() function, but they all converge on the same lifecycle calls. Here's how the major SAPIs differ:

| SAPI | Entry File | Process Model | Request Loop |
|---|---|---|---|
| CLI | sapi/cli/php_cli.c | Single process, single request | Execute script and exit |
| FPM | sapi/fpm/fpm/fpm_main.c | Master + worker pool | accept() loop in each worker |
| CGI | sapi/cgi/cgi_main.c | Spawned per-request by web server | Single request, then exit |
| Apache | sapi/apache2handler/sapi_apache2.c | Loaded as .so module | Called by Apache's request handler |
| Embed | sapi/embed/php_embed.c | Embedded in host application | Host controls lifecycle |

The CLI SAPI is the simplest: its main() parses command-line arguments, calls php_module_startup(), runs a single request, and shuts down. FPM is the most complex: it forks worker processes, manages pools with configurable sizing, and each worker loops through accept() → php_request_startup() → execute → php_request_shutdown().

Despite these differences, every SAPI eventually calls the same four lifecycle functions from main/main.c. This is the convergence point.

The Request Lifecycle

The lifecycle is the backbone of PHP's execution model. Every PHP process, whether CLI, FPM, or Apache, follows the same five-phase pattern:

sequenceDiagram
    participant SAPI as SAPI main()
    participant Runtime as main/main.c
    participant Zend as Zend Engine
    participant Ext as Extensions

    Note over SAPI,Ext: Phase 1: Module Startup (once per process)
    SAPI->>Runtime: php_module_startup()
    Runtime->>Zend: zend_startup()
    Zend->>Zend: Init memory manager, scanner, compiler, VM
    Runtime->>Runtime: Parse php.ini
    Runtime->>Ext: Call each extension's MINIT()

    Note over SAPI,Ext: Phase 2: Request Startup (once per request)
    SAPI->>Runtime: php_request_startup()
    Runtime->>Zend: zend_activate()
    Zend->>Zend: Init compiler and executor state, symbol tables
    Runtime->>Ext: Call each extension's RINIT()

    Note over SAPI,Ext: Phase 3: Execution
    SAPI->>Runtime: php_execute_script()
    Runtime->>Zend: zend_execute_scripts()
    Zend->>Zend: Compile source → opcodes
    Zend->>Zend: Execute opcodes in VM

    Note over SAPI,Ext: Phase 4: Request Shutdown
    SAPI->>Runtime: php_request_shutdown()
    Runtime->>Ext: Call each extension's RSHUTDOWN()
    Runtime->>Zend: zend_deactivate()
    Zend->>Zend: Free request memory, destroy symbol tables

    Note over SAPI,Ext: Phase 5: Module Shutdown (once per process)
    SAPI->>Runtime: php_module_shutdown()
    Runtime->>Ext: Call each extension's MSHUTDOWN()
    Runtime->>Zend: zend_shutdown()

Phase 1: Module Startup happens once when the process starts (or once when the Apache module loads). The key function is php_module_startup() in main/main.c. It calls zend_startup() to initialize the engine: memory manager, scanner, compiler, executor, and built-in functions. Then it parses php.ini, registers core INI settings, and walks the extension list calling each extension's MINIT (Module Init) hook. This is where extensions register their classes, constants, and INI entries; their internal functions are registered from the function table in each extension's zend_module_entry when the module itself is registered.

Phase 2: Request Startup runs before each request. php_request_startup() in main/main.c calls zend_activate(), which resets the garbage collector, initializes fresh compiler and executor state (including the symbol tables), and starts the scanner. Then it calls each extension's RINIT (Request Init) hook. This is where, for example, the session extension sets up its per-request state (and starts a session if session.auto_start is enabled) and OPcache prepares its shared-memory cache for the request.

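Phase 3: Execution is where your PHP code actually runs. The SAPI calls php_execute_script() in main/main.c, which in turn calls zend_execute_scripts() with the main script plus any auto_prepend_file and auto_append_file. That function compiles each file to an op_array (or retrieves a cached one from OPcache) and feeds it to the VM.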

Phase 4: Request Shutdown mirrors startup. php_request_shutdown() calls each extension's RSHUTDOWN, then zend_deactivate() destroys all per-request data. In FPM and Apache, the process loops back to Phase 2 for the next request.

Phase 5: Module Shutdown runs when the process exits. Extensions get their MSHUTDOWN call, and zend_shutdown() tears down the engine.

Tip: The clean separation between module startup (once) and request startup (per-request) is why PHP's "shared nothing" architecture works so well. Each request starts with a clean slate, with no leaked state from previous requests. This is also why PHP never needs to be "restarted" to pick up code changes. With OPcache enabled, that holds as long as opcache.validate_timestamps is on (the default), in which case changes are noticed within opcache.revalidate_freq seconds.

Configuration: The INI System

PHP's configuration system is intimately tied to the lifecycle. INI files are parsed during module startup, and the change-mode system controls which settings can be modified at which lifecycle phase.

flowchart TD
    A["Process Start"] --> B["Scan for php.ini"]
    B --> C["Parse php.ini directives"]
    C --> D["Apply PHP_INI_SYSTEM settings"]
    D --> E["Extensions register INI entries in MINIT"]
    E --> F["Per-request: scan .user.ini"]
    F --> G["Apply PHP_INI_PERDIR settings"]
    G --> H["Runtime: ini_set() calls"]
    H --> I["Apply PHP_INI_USER / PHP_INI_ALL settings"]

Every INI directive has a change mode that determines when it can be modified:

| Mode | Value | Where it can be set |
|---|---|---|
| PHP_INI_SYSTEM | 4 | php.ini or httpd.conf; changes take effect only after a restart |
| PHP_INI_PERDIR | 2 | php.ini, .htaccess, httpd.conf, or .user.ini |
| PHP_INI_USER | 1 | User scripts via ini_set() |
| PHP_INI_ALL | 7 | Anywhere (the bitwise OR of USER, PERDIR, and SYSTEM) |

The INI infrastructure (the zend_ini_entry structures and the registration macros) lives in Zend/zend_ini.h. Each extension declares its own entries with macros like STD_PHP_INI_ENTRY and registers them during MINIT with REGISTER_INI_ENTRIES(). The php.ini file itself is located and parsed earlier, inside php_module_startup(), by php_init_config().

The .user.ini feature (controlled by user_ini.filename) allows per-directory overrides in the CGI/FastCGI SAPIs, including FPM; under the Apache module, .htaccess plays that role. The files are looked up during request handling, but the result is cached for user_ini.cache_ttl seconds (300 by default), so the directory scan is not repeated on every request.

What's Next

We now have the map. We know the four layers, the SAPI contract, and the lifecycle that governs every PHP request. In the next article, we'll zoom into the Zend Engine's core data structures — the 16-byte zval that represents every PHP value, the dual-mode HashTable that powers PHP arrays, and the memory allocator that makes PHP's "allocate everything, free it all at once" model remarkably fast. Understanding these structures is essential for reading any part of the engine code.