The Extension System, OPcache, and JIT: How PHP is Extended and Optimized
Prerequisites
- Articles 1-4: Complete engine understanding (architecture, types, compilation, VM)
- Understanding of shared memory and inter-process communication
- Basic knowledge of JIT compilation concepts
Over the previous four articles, we've traced PHP from its SAPI entry points through the type system, the compilation pipeline, and the virtual machine. But the Zend Engine alone isn't enough to build a web application. The extension system is what transforms a language engine into a complete runtime — connecting PHP to databases, HTTP, JSON, cryptography, file systems, and everything else developers need.
In this final article, we'll examine the extension API that 72+ bundled extensions use, then focus on the three most architecturally significant subsystems: OPcache (which keeps PHP fast by caching compiled code in shared memory), the JIT compiler (which takes it further by generating native machine code), and Fibers (which bring cooperative concurrency to PHP). We'll close with the streams I/O abstraction and TSRM thread safety layer.
The Extension API
Every PHP extension — whether bundled in ext/ or installed via PECL — registers itself with a zend_module_entry struct defined in Zend/zend_modules.h:
classDiagram
class zend_module_entry {
+char *name
+zend_function_entry *functions
+module_startup_func MINIT
+module_shutdown_func MSHUTDOWN
+request_startup_func RINIT
+request_shutdown_func RSHUTDOWN
+info_func MINFO
+char *version
+globals_size
+globals_ctor GINIT
+globals_dtor GDTOR
+post_deactivate_func
+deps: zend_module_dep*
}
The JSON extension in ext/json/json.c is an excellent minimal example. Its module entry registers a MINIT function, a function table, and an MINFO function for phpinfo() output. It's simple enough to read in five minutes, yet it demonstrates the complete registration pattern.
The functions field points to an array of zend_function_entry structs, each mapping a PHP function name to a C handler function and argument info. Since PHP 8.0, argument info is generated from .stub.php files — PHP-syntax function declarations that a build tool converts to _arginfo.h headers. For example, ext/json/json_arginfo.h is generated from ext/json/json.stub.php.
Tip: If you're writing a new extension, start by copying a simple one like ext/json and modifying it. The stub file system (*.stub.php → *_arginfo.h) eliminates the most error-prone part of extension development: manually writing argument info structs.
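To make the registration pattern concrete, here is a minimal standalone sketch. It does not use the real Zend headers: the `demo_` structs and functions are hypothetical stand-ins for zend_function_entry and zend_module_entry, reduced to a name, a version, and a sentinel-terminated function table.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical, simplified stand-ins for zend_function_entry and
 * zend_module_entry; the real structs carry many more fields. */
typedef int (*demo_handler)(int arg);

typedef struct {
    const char  *fname;    /* name visible to scripts */
    demo_handler handler;  /* C function that implements it */
} demo_function_entry;

typedef struct {
    const char                *name;
    const char                *version;
    const demo_function_entry *functions; /* ends with a NULL fname */
} demo_module_entry;

static int demo_double(int arg) { return arg * 2; }
static int demo_negate(int arg) { return -arg; }

static const demo_function_entry demo_functions[] = {
    { "demo_double", demo_double },
    { "demo_negate", demo_negate },
    { NULL, NULL }  /* sentinel marking the end of the table */
};

static const demo_module_entry demo_module = {
    "demo", "0.1.0", demo_functions
};

/* Walk the function table and return the handler registered under
 * `name`, or NULL when the module does not export it. */
static demo_handler demo_find_function(const demo_module_entry *module,
                                       const char *name)
{
    assert(module != NULL && name != NULL);
    for (const demo_function_entry *fe = module->functions; fe->fname; fe++) {
        if (strcmp(fe->fname, name) == 0) {
            return fe->handler;
        }
    }
    return NULL;
}
```

Calling `demo_find_function(&demo_module, "demo_double")` yields a handler that can be invoked directly. The engine does something comparable at startup, except that it copies each entry into a global function hash table instead of scanning the array on every call.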
Extension Lifecycle Hooks
As we saw in Article 1, the PHP lifecycle has module-level and request-level phases. Extensions hook into these phases through their zend_module_entry callbacks:
flowchart TD
START["Process Start"] --> GINIT["GINIT<br/>Initialize extension globals struct<br/>(called once per thread in ZTS)"]
GINIT --> MINIT["MINIT<br/>Module initialization:<br/>register classes, constants,<br/>INI entries, resources"]
MINIT --> LOOP["Request Loop"]
LOOP --> RINIT["RINIT<br/>Per-request setup:<br/>reset counters, open connections"]
RINIT --> EXEC["Script Execution"]
EXEC --> RSHUTDOWN["RSHUTDOWN<br/>Per-request teardown:<br/>close connections, flush buffers"]
RSHUTDOWN --> POST["post_deactivate_func<br/>Late cleanup after output sent"]
POST --> LOOP
LOOP -->|"Process exit"| MSHUTDOWN["MSHUTDOWN<br/>Module teardown:<br/>unregister, free persistent memory"]
MSHUTDOWN --> GDTOR["GDTOR<br/>Destroy extension globals struct"]
GDTOR --> END["Process End"]
The separation matters for correctness:
- GINIT/GDTOR: Initialize and destroy the extension's globals struct. In ZTS builds, this runs once per thread. The struct itself outlives individual requests, so any per-request fields it contains must be reset in RINIT or RSHUTDOWN.
- MINIT: Register everything that persists across requests: classes, functions, constants, INI entries. Use pemalloc() (persistent allocation) here, not emalloc(), because memory obtained from emalloc() is released at the end of the request.
- RINIT: Set up per-request state. For example, the session extension starts the session here when session.auto_start is enabled.
- RSHUTDOWN: Clean up per-request state. Called during request shutdown, after registered shutdown functions and object destructors have run.
- MSHUTDOWN: Clean up module-level resources. Free persistent allocations, unregister handlers.
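The ordering of these hooks can be sketched with a small standalone simulation. Everything here is hypothetical (`demo_hooks`, `demo_run_process`, the trace buffer); it only models the sequence in which a single-threaded process would invoke the callbacks, and it leaves out post_deactivate_func.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical trace buffer: each hook appends its name so the call
 * order can be inspected afterwards. */
static char demo_trace[256];

static void demo_log(const char *hook)
{
    /* Room for an optional separator, the hook name, and the NUL. */
    assert(strlen(demo_trace) + strlen(hook) + 2 <= sizeof(demo_trace));
    if (demo_trace[0] != '\0') {
        strcat(demo_trace, " ");
    }
    strcat(demo_trace, hook);
}

/* Simplified stand-in for the lifecycle callbacks in zend_module_entry. */
typedef struct {
    void (*ginit)(void);
    void (*minit)(void);
    void (*rinit)(void);
    void (*rshutdown)(void);
    void (*mshutdown)(void);
    void (*gdtor)(void);
} demo_hooks;

static void demo_on_ginit(void)     { demo_log("GINIT"); }
static void demo_on_minit(void)     { demo_log("MINIT"); }
static void demo_on_rinit(void)     { demo_log("RINIT"); }
static void demo_on_rshutdown(void) { demo_log("RSHUTDOWN"); }
static void demo_on_mshutdown(void) { demo_log("MSHUTDOWN"); }
static void demo_on_gdtor(void)     { demo_log("GDTOR"); }

static const demo_hooks demo_ext = {
    demo_on_ginit, demo_on_minit, demo_on_rinit,
    demo_on_rshutdown, demo_on_mshutdown, demo_on_gdtor
};

/* Drive one simulated process that serves `requests` requests and
 * return the recorded hook order. */
static const char *demo_run_process(const demo_hooks *h, int requests)
{
    demo_trace[0] = '\0';
    h->ginit();
    h->minit();
    for (int i = 0; i < requests; i++) {
        h->rinit();
        /* script execution would happen here */
        h->rshutdown();
    }
    h->mshutdown();
    h->gdtor();
    return demo_trace;
}
```

With two requests, `demo_run_process(&demo_ext, 2)` records the module-level hooks once and the request-level pair twice, which is the property that makes MINIT the right place for expensive one-time setup.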
Registering Internal Functions
Every internal (C-implemented) PHP function receives its arguments through the INTERNAL_FUNCTION_PARAMETERS macro, defined in Zend/zend.h:
#define INTERNAL_FUNCTION_PARAMETERS zend_execute_data *execute_data, zval *return_value
After macro expansion, every internal function therefore has the signature void fn_name(zend_execute_data *execute_data, zval *return_value). The function parses its arguments from execute_data using zend_parse_parameters() or the faster ZEND_PARSE_PARAMETERS_START()/ZEND_PARSE_PARAMETERS_END() macros, and writes its return value into return_value.
flowchart LR
STUB["json.stub.php<br/>PHP-syntax declarations:<br/>function json_encode(mixed $value, int $flags = 0, int $depth = 512): string|false"]
GEN["gen_stub.php<br/>Build-time generator"]
ARGINFO["json_arginfo.h<br/>Generated zend_function_entry[]<br/>+ ZEND_ARG_INFO structs"]
IMPL["json.c<br/>PHP_FUNCTION(json_encode)<br/>{ ... C implementation ... }"]
STUB --> GEN
GEN --> ARGINFO
ARGINFO --> |"Linked at compile"| IMPL
At the zend_function union level (as we saw in Article 3), internal functions use zend_internal_function instead of zend_op_array. The common header is identical, so the VM can handle both uniformly. When it dispatches an internal function, though, it calls the C handler directly (or through the zend_execute_internal hook, if one is installed) rather than entering the opcode interpreter.
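The shared-header dispatch can be illustrated with a standalone sketch. The types below are hypothetical reductions of zend_function, zend_internal_function, and zend_op_array: a "user" function is just a list of increments to interpret, and an internal function is a C handler with the two-parameter signature described above.

```c
#include <assert.h>
#include <stddef.h>

enum { DEMO_INTERNAL_FUNCTION = 1, DEMO_USER_FUNCTION = 2 };

typedef struct { long value; } demo_zval;          /* stand-in for zval */
typedef struct { long arg; } demo_execute_data;    /* stand-in for a call frame */

/* Same shape as INTERNAL_FUNCTION_PARAMETERS, with simplified types. */
typedef void (*demo_internal_handler)(demo_execute_data *execute_data,
                                      demo_zval *return_value);

typedef struct {
    unsigned char         type;   /* common header */
    const char           *name;   /* common header */
    demo_internal_handler handler;
} demo_internal_function;

typedef struct {
    unsigned char type;           /* common header */
    const char   *name;           /* common header */
    const long   *increments;     /* toy "opcodes": values to add */
    size_t        count;
} demo_op_array;

typedef union {
    unsigned char          type;  /* readable whichever member is active */
    demo_internal_function internal_function;
    demo_op_array          op_array;
} demo_function;

static void demo_square(demo_execute_data *execute_data, demo_zval *return_value)
{
    return_value->value = execute_data->arg * execute_data->arg;
}

static const long demo_increments[] = { 1, 2, 3 };

static demo_function demo_fn_square = {
    .internal_function = { DEMO_INTERNAL_FUNCTION, "square", demo_square }
};
static demo_function demo_fn_add_six = {
    .op_array = { DEMO_USER_FUNCTION, "add_six", demo_increments, 3 }
};

/* Inspect the shared header, then either call the C handler directly
 * or run the toy interpreter loop. */
static void demo_call(const demo_function *fn, demo_execute_data *ex,
                      demo_zval *ret)
{
    if (fn->type == DEMO_INTERNAL_FUNCTION) {
        fn->internal_function.handler(ex, ret);
        return;
    }
    assert(fn->type == DEMO_USER_FUNCTION);
    long acc = ex->arg;
    for (size_t i = 0; i < fn->op_array.count; i++) {
        acc += fn->op_array.increments[i];
    }
    ret->value = acc;
}
```

The caller never needs to know which kind of function it holds; `demo_call` branches on the `type` field that both layouts share, which is the same idea the VM relies on.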
OPcache: Shared Memory Opcode Cache
OPcache is PHP's most important performance extension. Without it, every request recompiles every PHP file from source. With it, compiled op_arrays are stored in shared memory and reused across all worker processes.
The core of OPcache lives in ext/opcache/ZendAccelerator.c. During engine startup, it replaces the zend_compile_file function pointer (the hookable pointer pattern from Article 4) with its own persistent_compile_file. The interception logic:
- Request comes in, PHP calls zend_compile_file("script.php")
- OPcache's replacement function checks: is this file in the shared memory cache?
- Cache hit: Return a pointer to the cached zend_op_array. No compilation at all.
- Cache miss: Call the original compile_file, then persist the result to shared memory.
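The interception steps above can be sketched in a few lines of standalone C. This is an illustration under simplifying assumptions, not OPcache's code: the `demo_` names are hypothetical, the "compiler" only stamps an ID, and the cache is a small in-process array, not shared memory.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define DEMO_MAX_SCRIPTS 8

typedef struct { const char *filename; int id; } demo_op_array;

static int demo_compile_count = 0;
static demo_op_array demo_storage[DEMO_MAX_SCRIPTS]; /* compiler output */

/* Stand-in for the engine's real compiler: "compiles" by stamping an ID. */
static demo_op_array *demo_default_compile(const char *filename)
{
    assert(demo_compile_count < DEMO_MAX_SCRIPTS);
    demo_op_array *op = &demo_storage[demo_compile_count];
    op->filename = filename;
    op->id = ++demo_compile_count;
    return op;
}

/* The hookable pointer: callers always go through this. */
static demo_op_array *(*demo_compile_file)(const char *filename) =
    demo_default_compile;

/* Cache state owned by the hypothetical caching extension. */
static demo_op_array *(*demo_orig_compile_file)(const char *filename);
static demo_op_array *demo_cache[DEMO_MAX_SCRIPTS];
static size_t demo_cache_used = 0;

static demo_op_array *demo_cached_compile(const char *filename)
{
    /* Cache hit: return the stored op_array without compiling. */
    for (size_t i = 0; i < demo_cache_used; i++) {
        if (strcmp(demo_cache[i]->filename, filename) == 0) {
            return demo_cache[i];
        }
    }
    /* Cache miss: delegate to the saved original, then remember it. */
    demo_op_array *op = demo_orig_compile_file(filename);
    assert(demo_cache_used < DEMO_MAX_SCRIPTS);
    demo_cache[demo_cache_used++] = op;
    return op;
}

/* Install the hook: save the old pointer, swap in the caching wrapper. */
static void demo_cache_startup(void)
{
    demo_orig_compile_file = demo_compile_file;
    demo_compile_file = demo_cached_compile;
}
```

After `demo_cache_startup()`, compiling the same filename twice through `demo_compile_file` returns the same pointer and bumps `demo_compile_count` only once. Saving the previous pointer before overwriting it is what lets several hooks chain.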
flowchart TD
subgraph shm["Shared Memory (SHM)"]
direction TB
ALLOC["SHM Allocator<br/>(mmap / shm / posix)"]
STRINGS["Interned Strings Table<br/>(shared across processes)"]
CACHE["Opcode Cache<br/>filename → cached_script"]
CACHED["Cached Script:<br/>op_array, class_table,<br/>function_table"]
end
subgraph workers["FPM Worker Processes"]
W1["Worker 1"]
W2["Worker 2"]
W3["Worker 3"]
end
W1 -->|"Read-only access"| CACHE
W2 -->|"Read-only access"| CACHE
W3 -->|"Read-only access"| CACHE
W1 -->|"First compile"| CACHED
The persistence layer in ext/opcache/zend_persist.c is the tricky part. Op_arrays contain pointers — to strings, literals, class entries, other op_arrays. When copying to shared memory, all these pointers must be adjusted to point to the shared memory versions. Strings are interned into the shared interned string table. Nested structures (function tables within class entries) are recursively persisted.
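A toy version of the pointer-adjustment problem is sketched below. It assumes a hypothetical `demo_script` with two owned pointers and uses a static byte array as a stand-in for the shared memory segment; string interning and recursive persistence of nested tables are left out.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical script record holding pointers into process-local memory. */
typedef struct {
    char  *name;
    long  *literals;
    size_t literal_count;
} demo_script;

/* Static arena standing in for the shared memory segment. */
static _Alignas(max_align_t) unsigned char demo_arena[1024];
static size_t demo_arena_used = 0;

/* Bump-allocate `size` bytes in the arena and copy `src` into them. */
static void *demo_arena_copy(const void *src, size_t size)
{
    size_t align = _Alignof(max_align_t);
    size_t aligned = (size + align - 1) & ~(align - 1);
    assert(demo_arena_used + aligned <= sizeof(demo_arena));
    void *dst = demo_arena + demo_arena_used;
    memcpy(dst, src, size);
    demo_arena_used += aligned;
    return dst;
}

/* Copy the record, then redirect every pointer it contains so that it
 * refers to the arena copy and no longer to the original heap or stack data. */
static demo_script *demo_persist(const demo_script *src)
{
    demo_script *dst = demo_arena_copy(src, sizeof(*src));
    dst->name = demo_arena_copy(src->name, strlen(src->name) + 1);
    dst->literals = demo_arena_copy(src->literals,
                                    src->literal_count * sizeof(long));
    return dst;
}

/* True when `p` points inside the arena. */
static int demo_in_arena(const void *p)
{
    uintptr_t addr = (uintptr_t)p;
    uintptr_t base = (uintptr_t)demo_arena;
    return addr >= base && addr < base + sizeof(demo_arena);
}
```

Once persisted, the copy no longer depends on the original: mutating or freeing the source data leaves the arena version intact, and every pointer reachable from it satisfies `demo_in_arena`. A shallow memcpy alone would leave `name` and `literals` pointing at process-local memory, which is exactly the bug the persistence pass exists to prevent.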
Shared memory allocation is handled by ext/opcache/zend_shared_alloc.c, which supports multiple backends: mmap (the default on Linux), shm (System V shared memory), and posix (POSIX shared memory). The allocated region is sized by opcache.memory_consumption (default: 128MB).
OPcache also supports preloading (opcache.preload): a PHP script that runs once at server startup and loads classes/functions into shared memory for the lifetime of the server. Preloaded code is never invalidated or recompiled until the server restarts, providing the fastest possible access.
The JIT Compiler
PHP 8.0 introduced a JIT (Just-In-Time) compiler that translates hot opcodes to native machine code. The JIT sits inside OPcache and builds on the SSA-based optimizer's type inference results (from Article 4).
The JIT entry point is ext/opcache/jit/zend_jit.c. It operates in two modes:
Function JIT compiles entire functions to native code. When a function's execution count exceeds a threshold, the JIT compiles it and patches the op_array's handler pointers to jump directly to native code.
Tracing JIT (in ext/opcache/jit/zend_jit_trace.c) is more sophisticated. It records execution traces — linear sequences of opcodes through hot loops — and compiles those traces to native code. Traces can span function boundaries, capturing the actual hot path through inlined function calls.
flowchart TD
OPCODE["Opcode execution"] --> COUNT{"Execution count<br/>> threshold?"}
COUNT -->|"No"| INTERP["Continue interpreting"]
COUNT -->|"Yes"| MODE{"JIT mode?"}
MODE -->|"Function JIT"| FJIT["Compile entire function<br/>to native code"]
MODE -->|"Tracing JIT"| RECORD["Record trace<br/>(linear opcode path)"]
RECORD --> TRACE_END{"Loop back or<br/>return?"}
TRACE_END -->|"Loop"| COMPILE["Compile trace"]
TRACE_END -->|"Side exit"| LINK["Link to other traces<br/>or back to interpreter"]
FJIT --> PATCH["Patch handler<br/>to native entry point"]
COMPILE --> PATCH
PATCH --> NATIVE["Execute native code<br/>Type guards inline"]
NATIVE --> DEOPT{"Type guard<br/>fails?"}
DEOPT -->|"Yes"| INTERP
DEOPT -->|"No"| NATIVE
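The counting-and-patching step in the diagram can be modelled with a standalone sketch. The names and the threshold of 3 are hypothetical, and the "native" version is ordinary C standing in for generated machine code.

```c
#include <assert.h>
#include <stddef.h>

typedef long (*demo_handler)(long x);

typedef struct {
    demo_handler handler;  /* current entry point */
    int          calls;    /* hot counter */
    int          compiled; /* set once the handler has been patched */
} demo_function;

#define DEMO_HOT_THRESHOLD 3

/* Slow path: stands in for the opcode interpreter. */
static long demo_interpret(long x)
{
    long result = 0;
    for (int i = 0; i < 2; i++) {
        result += x;
    }
    return result;
}

/* Fast path: stands in for JIT-generated native code. */
static long demo_native(long x)
{
    return x * 2;
}

/* Count calls; once the function is hot, patch its handler so that
 * later calls skip the interpreter entirely. */
static long demo_call(demo_function *fn, long x)
{
    assert(fn->handler != NULL);
    if (!fn->compiled && ++fn->calls > DEMO_HOT_THRESHOLD) {
        fn->handler = demo_native;
        fn->compiled = 1;
    }
    return fn->handler(x);
}
```

The first three calls run `demo_interpret`; the fourth crosses the threshold and every call from then on runs `demo_native`. Both paths return the same result, which is the invariant any JIT has to preserve.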
Since PHP 8.4, the native code is generated using the IR framework in ext/opcache/jit/ir/; PHP 8.0 through 8.3 used DynASM-based code templates instead. IR (Intermediate Representation) is a custom compiler framework that provides SSA-based IR construction, optimization passes (such as constant folding and copy propagation), register allocation, and code emission for x86_64 and AArch64.
The JIT IR pipeline in ext/opcache/jit/zend_jit_ir.c translates Zend opcodes to IR instructions. Type guards are inserted based on the SSA type inference from the optimizer — if a variable was inferred as IS_LONG, the JIT emits a type check at trace entry and integer-only arithmetic in the trace body. If the type guard fails at runtime, the trace deoptimizes back to the interpreter.
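A type guard with a fallback path can be sketched as follows. This is a hand-written illustration, not JIT output: `demo_zval` is a hypothetical two-type value, `demo_add_specialized` plays the role of the compiled trace, and `demo_add_generic` plays the role of the interpreter. Integer overflow handling, which real PHP performs by promoting to float, is deliberately left out.

```c
#include <assert.h>

typedef enum { DEMO_IS_LONG, DEMO_IS_DOUBLE } demo_type;

typedef struct {
    demo_type type;
    union { long lval; double dval; } value;
} demo_zval;

static demo_zval demo_long(long v)
{
    demo_zval z;
    z.type = DEMO_IS_LONG;
    z.value.lval = v;
    return z;
}

static demo_zval demo_double(double v)
{
    demo_zval z;
    z.type = DEMO_IS_DOUBLE;
    z.value.dval = v;
    return z;
}

static double demo_to_double(demo_zval z)
{
    assert(z.type == DEMO_IS_LONG || z.type == DEMO_IS_DOUBLE);
    return z.type == DEMO_IS_LONG ? (double)z.value.lval : z.value.dval;
}

/* Generic path: handles every type combination, like the interpreter. */
static demo_zval demo_add_generic(demo_zval a, demo_zval b)
{
    if (a.type == DEMO_IS_LONG && b.type == DEMO_IS_LONG) {
        return demo_long(a.value.lval + b.value.lval);
    }
    return demo_double(demo_to_double(a) + demo_to_double(b));
}

static int demo_deopt_count = 0;

/* Specialized path: assumes both operands were inferred as integers.
 * The guard checks that assumption and bails out when it does not hold. */
static demo_zval demo_add_specialized(demo_zval a, demo_zval b)
{
    if (a.type != DEMO_IS_LONG || b.type != DEMO_IS_LONG) {
        demo_deopt_count++;              /* guard failed: deoptimize */
        return demo_add_generic(a, b);
    }
    return demo_long(a.value.lval + b.value.lval); /* integer-only add */
}
```

Calling `demo_add_specialized(demo_long(2), demo_long(3))` stays on the fast path, while passing a double trips the guard and increments `demo_deopt_count` before falling back. The guard is cheap relative to the generic dispatch it replaces, which is where the speedup comes from when the inferred type is usually right.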
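JIT configuration is controlled by opcache.jit and opcache.jit_buffer_size. opcache.jit accepts a named mode (such as disable, off, tracing, or function) or a four-digit CRTO value whose digits select CPU-specific optimizations, register allocation, the JIT trigger, and the optimization level; it is not a bitmask. The JIT is off by default: before PHP 8.4 the defaults were opcache.jit=tracing with opcache.jit_buffer_size=0, and PHP 8.4 changed them to opcache.jit=disable with opcache.jit_buffer_size=64M.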
Tip: The JIT provides the biggest speedups for CPU-bound code (mathematical computation, data processing). For typical I/O-bound web applications, OPcache alone provides most of the performance benefit. Profile before enabling the JIT — it uses additional memory and can increase startup time.
Fibers: Cooperative Concurrency
PHP 8.1 introduced Fibers for cooperative multitasking. A Fiber is a lightweight execution context with its own call stack that can be suspended and resumed. The implementation is in Zend/zend_fibers.h and Zend/zend_fibers.c.
sequenceDiagram
participant Main as Main Context
participant Fiber as Fiber Context
Main->>Fiber: $fiber->start()
Note over Fiber: Execute callback<br/>on Fiber's own C stack
Fiber->>Main: Fiber::suspend($value)
Note over Main: Context switch back<br/>$fiber->start() returns $value
Main->>Fiber: $fiber->resume($sent)
Note over Fiber: Fiber::suspend() returns $sent
Fiber->>Main: return $result
Note over Main: $fiber->getReturn() → $result
Each Fiber has its own C stack, allocated by zend_fiber_stack_allocate using mmap with guard pages. Its size comes from the fiber.stack_size INI setting; the compiled-in default (ZEND_FIBER_DEFAULT_C_STACK_SIZE) works out to 2 MiB on 64-bit builds. The context switch saves and restores CPU registers and switches stack pointers using platform-specific assembly (boost.context-derived code for x86_64, ARM64, etc.).
Fibers integrate with the VM's execute_data chain. When a Fiber suspends, its execute_data chain is preserved — all the VM frames, CV slots, and temporaries remain on the Fiber's stack. When it resumes, the VM continues exactly where it left off.
The key design choice is that Fibers are cooperative, not preemptive. A Fiber runs until it explicitly calls Fiber::suspend(). This eliminates the need for locks or atomic operations — only one Fiber executes at a time. Libraries like ReactPHP and Revolt use Fibers under the hood to provide async/await patterns for I/O-bound operations.
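The suspend/resume handshake can be approximated in standalone C with the POSIX ucontext API. This is only a sketch under assumptions: it requires a platform that still ships `<ucontext.h>` (glibc on Linux does), it uses a plain malloc'd stack with no guard pages, and every `demo_` name is hypothetical. php-src itself uses boost.context-derived assembly, not ucontext.

```c
#include <assert.h>
#include <stdlib.h>
#include <ucontext.h>

#define DEMO_STACK_SIZE (64 * 1024)

typedef struct {
    ucontext_t caller;   /* where to go back to on suspend or return */
    ucontext_t self;     /* the fiber's own saved context */
    void      *stack;    /* the fiber's private C stack */
    int        transfer; /* value handed across each switch */
    int        finished;
} demo_fiber;

static demo_fiber *demo_current = NULL;

/* Called from inside the fiber: hand `value` to the caller and pause.
 * Returns the value passed to the next demo_resume(). */
static int demo_suspend(int value)
{
    demo_fiber *f = demo_current;
    assert(f != NULL);
    f->transfer = value;
    swapcontext(&f->self, &f->caller);
    return f->transfer;
}

/* The fiber body: suspends twice, then returns sent * 2. */
static void demo_body(void)
{
    demo_fiber *f = demo_current;
    int sent = demo_suspend(1);
    sent = demo_suspend(sent + 10);
    f->transfer = sent * 2;
    f->finished = 1;
    /* Returning here activates uc_link, i.e. the caller's context. */
}

/* Switch into the fiber, passing `value`; returns what it hands back. */
static int demo_resume(demo_fiber *f, int value)
{
    assert(!f->finished);
    f->transfer = value;
    demo_current = f;
    swapcontext(&f->caller, &f->self);
    demo_current = NULL;
    return f->transfer;
}

/* Allocate the private stack, prepare the context, and run the fiber
 * until its first suspend. */
static int demo_start(demo_fiber *f)
{
    f->stack = malloc(DEMO_STACK_SIZE);
    assert(f->stack != NULL);
    f->finished = 0;
    getcontext(&f->self);
    f->self.uc_stack.ss_sp = f->stack;
    f->self.uc_stack.ss_size = DEMO_STACK_SIZE;
    f->self.uc_link = &f->caller;
    makecontext(&f->self, demo_body, 0);
    return demo_resume(f, 0);
}

static void demo_free(demo_fiber *f)
{
    free(f->stack);
    f->stack = NULL;
}
```

`demo_start` returns the first suspended value, each `demo_resume` delivers a value into the paused `demo_suspend` call, and the final resume returns the body's result. Local variables such as `sent` survive every switch because they live on the fiber's own stack, which mirrors how a PHP Fiber's frames are preserved.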
Streams: Unified I/O Abstraction
PHP's streams layer, rooted in main/streams/streams.c, provides a unified interface for all I/O operations. When you call fopen(), file_get_contents(), or fread(), you're going through the streams layer.
classDiagram
class php_stream {
+ops: php_stream_ops*
+readbuf: char*
+readbuflen: size_t
+wrapper: php_stream_wrapper*
+context: php_stream_context*
}
class php_stream_ops {
+write(stream, buf, count) ssize_t
+read(stream, buf, count) ssize_t
+close(stream) int
+flush(stream) int
+seek(stream, offset, whence) int
+cast(stream, castas, ret) int
+stat(stream, ssb) int
+label: char*
}
class php_stream_wrapper {
+wops: php_stream_wrapper_ops*
+abstract: void*
+is_url: int
}
class php_stream_wrapper_ops {
+stream_opener()
+stream_closer()
+url_stat()
+dir_opener()
+unlink()
+rename()
+stream_mkdir()
+label: char*
}
php_stream --> php_stream_ops : ops
php_stream --> php_stream_wrapper : wrapper
php_stream_wrapper --> php_stream_wrapper_ops : wops
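The ops table is a classic vtable of function pointers. The standalone sketch below shows the idea with a hypothetical in-memory backend; the `demo_` types are heavily reduced versions of php_stream and php_stream_ops, and `ptrdiff_t` is used in place of `ssize_t` so the snippet needs only standard C headers.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

typedef struct demo_stream demo_stream;

/* Reduced ops table: each backend supplies its own implementations. */
typedef struct {
    ptrdiff_t (*write)(demo_stream *s, const char *buf, size_t count);
    ptrdiff_t (*read)(demo_stream *s, char *buf, size_t count);
    const char *label;
} demo_stream_ops;

struct demo_stream {
    const demo_stream_ops *ops;
    void *abstract;            /* backend-specific state */
};

/* Hypothetical memory backend with a fixed 64-byte buffer. */
typedef struct {
    char   data[64];
    size_t size;  /* bytes written so far */
    size_t pos;   /* read cursor */
} demo_memory;

static ptrdiff_t demo_mem_write(demo_stream *s, const char *buf, size_t count)
{
    demo_memory *m = s->abstract;
    size_t room = sizeof(m->data) - m->size;
    if (count > room) {
        count = room;          /* short write when the buffer is full */
    }
    memcpy(m->data + m->size, buf, count);
    m->size += count;
    return (ptrdiff_t)count;
}

static ptrdiff_t demo_mem_read(demo_stream *s, char *buf, size_t count)
{
    demo_memory *m = s->abstract;
    size_t avail = m->size - m->pos;
    if (count > avail) {
        count = avail;         /* short read at end of data */
    }
    memcpy(buf, m->data + m->pos, count);
    m->pos += count;
    return (ptrdiff_t)count;
}

static const demo_stream_ops demo_memory_ops = {
    demo_mem_write, demo_mem_read, "MEMORY"
};

/* Backend-agnostic entry points: callers never touch ops directly. */
static ptrdiff_t demo_stream_write(demo_stream *s, const char *buf, size_t n)
{
    assert(s != NULL && s->ops != NULL && s->ops->write != NULL);
    return s->ops->write(s, buf, n);
}

static ptrdiff_t demo_stream_read(demo_stream *s, char *buf, size_t n)
{
    assert(s != NULL && s->ops != NULL && s->ops->read != NULL);
    return s->ops->read(s, buf, n);
}
```

Code that calls `demo_stream_write` and `demo_stream_read` works unchanged if the stream is later backed by a file or socket, provided that backend fills in the same table. That indirection is why functions like fread() do not need to know what kind of resource they are reading from.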
The wrapper system is what makes fopen("http://example.com/file.txt") work. Each URL scheme is handled by a registered wrapper:
| Wrapper | Scheme | Implementation |
|---|---|---|
| Plain file | file:// | main/streams/plain_wrapper.c |
| HTTP | http://, https:// | ext/standard/http_fopen_wrapper.c |
| FTP | ftp:// | ext/standard/ftp_fopen_wrapper.c |
| PHP | php://stdin, php://memory, etc. | ext/standard/php_fopen_wrapper.c |
| Compression | compress.zlib:// | ext/zlib/zlib_fopen_wrapper.c |
| User-space | Custom schemes | Registered via stream_wrapper_register() |
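Scheme-based wrapper selection boils down to parsing the prefix before `://` and looking it up in a registry. The sketch below is a simplified standalone model: the registry is a fixed array (not the engine's hash table), the `demo_` names and label strings are hypothetical, and paths without a scheme fall back to the plain-file entry.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

typedef struct {
    const char *scheme;
    const char *label;  /* illustrative label, not the engine's */
} demo_wrapper;

/* Hypothetical registry; index 0 is the plain-file fallback. */
static const demo_wrapper demo_wrappers[] = {
    { "file",          "plainfile" },
    { "http",          "http" },
    { "https",         "http" },
    { "ftp",           "ftp" },
    { "php",           "php" },
    { "compress.zlib", "zlib" },
};

#define DEMO_WRAPPER_COUNT (sizeof(demo_wrappers) / sizeof(demo_wrappers[0]))

/* Return the wrapper responsible for `path`, the plain-file wrapper
 * when the path has no scheme, or NULL for an unregistered scheme. */
static const demo_wrapper *demo_locate_wrapper(const char *path)
{
    assert(path != NULL);
    const char *sep = strstr(path, "://");
    if (sep == NULL) {
        return &demo_wrappers[0];
    }
    size_t len = (size_t)(sep - path);
    for (size_t i = 0; i < DEMO_WRAPPER_COUNT; i++) {
        const char *scheme = demo_wrappers[i].scheme;
        if (strlen(scheme) == len && strncmp(scheme, path, len) == 0) {
            return &demo_wrappers[i];
        }
    }
    return NULL;
}
```

In this model, registering a user-space scheme would amount to appending an entry to the registry, after which any path using that scheme would resolve to the new wrapper.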
The transport layer (main/streams/xp_socket.c) handles socket I/O: TCP, UDP, and Unix domain sockets. SSL/TLS transports are layered on top of it by the openssl extension (ext/openssl/xp_ssl.c). The stream_socket_client() and stream_socket_server() functions go through this layer.
TSRM: Thread Safety
The Thread Safe Resource Manager in TSRM/TSRM.h and TSRM/TSRM.c solves a fundamental problem: PHP's engine uses extensive global state (compiler globals, executor globals, SAPI globals, per-extension globals), but in ZTS (Zend Thread Safety) builds, multiple threads may execute PHP simultaneously.
TSRM provides thread-local storage for all global state. Each module allocates a "resource ID" via ts_allocate_id(), which returns an integer index. At runtime, the global accessor macros resolve through TSRM:
| Build | EG(current_execute_data) expands to |
|---|---|
| Non-ZTS | executor_globals.current_execute_data (direct struct access) |
| ZTS | ((zend_executor_globals *)(((char *)tsrm_get_ls_cache()) + executor_globals_offset))->current_execute_data (thread-local lookup plus offset) |
In non-ZTS builds (the common case for FPM), TSRM is compiled out entirely. The macros become direct struct access with zero overhead. This is why PHP distributions ship separate ZTS and non-ZTS builds — the non-ZTS build is measurably faster.
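The two expansions can be mimicked with a compile-time switch. This standalone sketch uses a hypothetical `DEMO_ZTS` flag, a one-field globals struct, and C11 `_Thread_local` as a stand-in for TSRM's per-thread storage; the real machinery also handles resource IDs and offsets, which are omitted here.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, one-field stand-in for zend_executor_globals. */
typedef struct {
    int current_depth;
} demo_executor_globals;

#ifdef DEMO_ZTS
/* Thread-safe flavour: every thread gets its own copy, reached
 * through a cached thread-local pointer. */
static _Thread_local demo_executor_globals demo_tls_globals;
static _Thread_local void *demo_ls_cache = NULL;

static void *demo_get_ls_cache(void)
{
    if (demo_ls_cache == NULL) {
        demo_ls_cache = &demo_tls_globals;
    }
    return demo_ls_cache;
}
# define DEMO_EG(v) (((demo_executor_globals *)demo_get_ls_cache())->v)
#else
/* Non-thread-safe flavour: one process-wide struct, accessed directly. */
static demo_executor_globals demo_globals;
# define DEMO_EG(v) (demo_globals.v)
#endif

/* Engine-style code is written once against the macro and compiles
 * correctly in both flavours. */
static int demo_enter_call(void)
{
    assert(DEMO_EG(current_depth) >= 0);
    return ++DEMO_EG(current_depth);
}
```

Compiled without `-DDEMO_ZTS`, `DEMO_EG(current_depth)` is a plain global access; with it, each access goes through an extra pointer load. Code such as `demo_enter_call` never changes, which is why the accessor macros are used so consistently throughout the engine.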
ZTS builds are required for:
- Threaded web server modules, such as mod_php under Apache's threaded MPMs (including mpm_winnt on Windows)
- Event-driven SAPIs that use threading
- The parallel extension for true multithreading
Most production PHP deployments use non-ZTS FPM, where process-level isolation provides "thread safety" by giving each worker its own address space.
Series Conclusion
Across these five articles, we've traced PHP from its top-level directory structure through every major subsystem:
- Architecture and Lifecycle — the four layers, the SAPI contract, the request lifecycle
- The zval and Memory Model — 16-byte value representation, reference counting, COW, the allocator, and GC
- The Compilation Pipeline — lexer, parser, AST, compiler, opcodes, and the flag system
- The Virtual Machine — code generation, dispatch modes, register pinning, the optimizer
- Extensions, OPcache, and JIT — the extension API, shared memory caching, native code generation, Fibers, streams, and TSRM
The php-src codebase is large, but it's well-structured. The architectural boundaries are clear, the naming conventions are consistent, and the design patterns (hookable function pointers, lifecycle hooks, vtables of function pointers) repeat throughout. Once you understand the patterns, navigating even unfamiliar corners of the codebase becomes straightforward.
Tip: The best way to deepen your understanding is to pick a PHP function you use daily, like array_map() or json_decode(), and read its implementation end-to-end. You now have the context to understand every line.