The Zend Virtual Machine: Execution, Code Generation, and Optimization

Advanced

Prerequisites

  • Articles 1-3: Full understanding of architecture, data structures, and compilation
  • Understanding of CPU dispatch mechanisms (function pointers, computed goto)
  • Familiarity with SSA form and compiler optimization passes

Articles 1–3 gave us the full picture of how PHP source becomes an op_array of zend_op instructions. Now we arrive at the component that runs them: the Zend Virtual Machine. The VM is where PHP spends nearly all its execution time, so its design is hyper-optimized for dispatch throughput.

What makes the Zend VM unique among language runtimes is its template-based code generation system. Rather than writing type-specialized handlers by hand, a PHP script reads handler templates and expands them into thousands of variants — producing a 123,000-line generated file that eliminates runtime type dispatch from the hot path. This article takes you through that system, the five dispatch modes, the call frame layout, and the SSA-based optimizer that transforms opcodes before execution.

VM Code Generation System

The VM is built from three files, an arrangement that is unusual among interpreters:

  1. Zend/zend_vm_def.h — ~204 handler templates with type placeholders
  2. Zend/zend_vm_gen.php — a PHP script that reads the templates and generates specialized variants
  3. Zend/zend_vm_execute.h — the ~123,000-line generated output, included by the executor
flowchart LR
    DEF["zend_vm_def.h<br/>204 handler templates<br/>with OP1_TYPE, OP2_TYPE placeholders"]
    GEN["zend_vm_gen.php<br/>Template expander<br/>Type specialization engine"]
    EXEC["zend_vm_execute.h<br/>~123,000 lines<br/>Thousands of specialized handlers"]
    OPCODES["zend_vm_opcodes.h<br/>Opcode → handler mapping<br/>Dispatch tables"]
    
    DEF --> GEN
    GEN --> EXEC
    GEN --> OPCODES

The key insight is type specialization. Consider an addition operation: $a + $b. Each operand of the resulting opcode has a kind that the compiler fixes when it emits the instruction: IS_CONST, IS_CV, IS_TMP_VAR, or IS_VAR. That's up to 4×4 = 16 combinations. The ZEND_ADD template in zend_vm_def.h uses placeholders like OP1_TYPE and OP2_TYPE, and the generator expands it into separate functions named after the operand kinds they handle, such as ZEND_ADD_SPEC_CONST_CONST. Kinds that can share code are merged into a single variant, so the number of generated functions is usually smaller than 16.

Each specialized handler knows its operand types at compile time. Instead of a runtime switch on op1_type, the CONST variant can directly index into the literals table, and the CV variant can directly access the compiled variable slot. This eliminates one or two branches per operand from the hot path.
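
The effect of specialization can be sketched in plain C. This is only an illustration, not the generator's output: a preprocessor macro stands in for zend_vm_gen.php, and every name here (toy_op, toy_frame, DEFINE_TOY_ADD, the toy_add_* functions) is hypothetical.

```c
#include <assert.h>

/* Toy operand kinds; the real engine uses IS_CONST, IS_CV, and so on. */
enum { TOY_CONST, TOY_CV };

typedef struct {
    int op1_type, op2_type;   /* operand kinds, fixed when the op is emitted */
    int op1, op2;             /* index into literals[] or cv[] */
} toy_op;

typedef struct {
    const long *literals;     /* constant table */
    long       *cv;           /* compiled-variable slots */
} toy_frame;

/* Generic handler: branches on each operand kind every time it runs. */
long toy_add_generic(const toy_frame *f, const toy_op *op)
{
    assert(f != 0 && op != 0);
    long a = (op->op1_type == TOY_CONST) ? f->literals[op->op1] : f->cv[op->op1];
    long b = (op->op2_type == TOY_CONST) ? f->literals[op->op2] : f->cv[op->op2];
    return a + b;
}

/* Fetch macros chosen by operand kind, standing in for the placeholders. */
#define FETCH_CONST(f, idx) ((f)->literals[(idx)])
#define FETCH_CV(f, idx)    ((f)->cv[(idx)])

/* "Template": expands to one branch-free handler per operand-kind pair. */
#define DEFINE_TOY_ADD(T1, T2)                                      \
    long toy_add_##T1##_##T2(const toy_frame *f, const toy_op *op)  \
    {                                                               \
        return FETCH_##T1(f, op->op1) + FETCH_##T2(f, op->op2);     \
    }

DEFINE_TOY_ADD(CONST, CONST)
DEFINE_TOY_ADD(CONST, CV)
DEFINE_TOY_ADD(CV, CONST)
DEFINE_TOY_ADD(CV, CV)
```

Calling toy_add_CV_CONST on an op whose first operand is a CV and whose second is a constant gives the same result as toy_add_generic, but the operand-kind checks have disappeared from the function body.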

The generator script also produces the opcode-to-handler mapping tables in zend_vm_opcodes.h. During compilation, when the compiler emits a zend_op, it looks up the correct specialized handler based on the operand types and stores the function pointer directly in the op's handler field.

Five Dispatch Modes

The generated VM supports five dispatch strategies, selected at C compile time. The mode is determined by constants in Zend/zend_vm_opcodes.h:

| Mode | Mechanism | When used |
|---|---|---|
| CALL | handler(execute_data), an indirect function call | Fallback / portable |
| SWITCH | switch(opcode) { case ... } | Debug builds |
| GOTO | GCC computed goto (goto *handler) | GCC/Clang with labels-as-values |
| HYBRID | Mix of computed goto + function calls | Default on GCC (requires global register variables) |
| TAILCALL | Clang musttail + preserve_none | Newest, Clang 19+ only |

flowchart TD
    START["Execute next opcode"] --> MODE{"Dispatch mode?"}
    MODE -->|"CALL"| CALL["opline->handler(execute_data)<br/>Indirect function call<br/>CPU: predict call target"]
    MODE -->|"SWITCH"| SW["switch(opline->opcode)<br/>Jump table<br/>CPU: predict branch"]
    MODE -->|"GOTO"| GOTO["goto *opline->handler<br/>Computed goto<br/>CPU: one indirect jump per handler"]
    MODE -->|"HYBRID"| HY["Hot path: computed goto<br/>Cold path: function call<br/>Best of both worlds"]
    MODE -->|"TAILCALL"| TC["musttail return handler()<br/>preserve_none convention<br/>Near-zero call overhead"]
    
    CALL --> NEXT["Advance opline, repeat"]
    SW --> NEXT
    GOTO --> NEXT
    HY --> NEXT
    TC --> NEXT

HYBRID mode (the default on GCC builds) is the most interesting. Hot handlers, those that execute frequently, use computed goto dispatch and so avoid function call/return overhead. Cold handlers are regular functions called from the dispatch loop. This keeps the hot path's instruction cache footprint small while allowing cold handlers to be large without polluting the cache.

TAILCALL mode is the newest addition, requiring Clang 19+. It uses the musttail attribute to guarantee tail call optimization and the preserve_none calling convention to minimize register save/restore overhead. Each handler tail-calls the next handler, effectively eliminating the dispatch loop entirely.
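
The two portable strategies, CALL and SWITCH, can be contrasted on a toy stack machine. This is a sketch under simplifying assumptions, not the engine's loop: the vm and vm_op types, the three opcodes, and the h_* handlers are all invented for illustration.

```c
#include <assert.h>
#include <stddef.h>

enum { VM_PUSH, VM_ADD, VM_HALT };

typedef struct vm vm;
typedef int (*vm_handler)(vm *m);   /* returns 1 to continue, 0 to stop */

typedef struct { vm_handler handler; int opcode; long arg; } vm_op;
struct vm { const vm_op *ip; long stack[16]; size_t sp; };

int h_push(vm *m)
{
    assert(m->sp < 16);
    m->stack[m->sp++] = m->ip->arg;
    m->ip++;                         /* advance to the next instruction */
    return 1;
}

int h_add(vm *m)
{
    assert(m->sp >= 2);
    m->sp--;
    m->stack[m->sp - 1] += m->stack[m->sp];
    m->ip++;
    return 1;
}

int h_halt(vm *m) { (void)m; return 0; }

/* CALL style: one indirect call per instruction through the stored pointer. */
long run_call(vm *m, const vm_op *code)
{
    m->ip = code;
    m->sp = 0;
    while (m->ip->handler(m)) { /* each handler advances m->ip itself */ }
    return m->sp ? m->stack[m->sp - 1] : 0;
}

/* SWITCH style: a single shared jump keyed on the opcode number. */
long run_switch(vm *m, const vm_op *code)
{
    m->ip = code;
    m->sp = 0;
    for (;;) {
        switch (m->ip->opcode) {
        case VM_PUSH: h_push(m); break;
        case VM_ADD:  h_add(m);  break;
        default:      return m->sp ? m->stack[m->sp - 1] : 0;
        }
    }
}
```

Both loops compute the same result for the same program. What differs is where the indirect branch lives: in the function pointer call for run_call, in the compiler's jump table for run_switch.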

Tip: php -i | grep "Virtual Machine" may show which VM mode your PHP build uses, but this isn't always exposed. A more reliable check is the ZEND_VM_KIND macro in the generated Zend/zend_vm_opcodes.h of the source tree you built from.

Handler Template Anatomy

The ZEND_ADD handler template in Zend/zend_vm_def.h shows how the template system works in practice. The handler follows a consistent pattern:

  1. Fetch operands: GET_OP1_ZVAL_PTR / GET_OP2_ZVAL_PTR — macros that expand differently based on the specialized type. For IS_CV, this is a direct pointer into the CV table. For IS_CONST, it indexes the literals array.

  2. Fast path: Check if both operands are IS_LONG. If so, perform integer addition directly. If the addition overflows, the operands are added as doubles instead and the result is stored as IS_DOUBLE. This fast path is the most common case and avoids all function call overhead.

  3. Medium path: Check if one or both are IS_DOUBLE. Perform floating-point addition.

  4. Slow path: Call a general-purpose function that handles type coercion, object operator overloading, and error cases.

  5. Store result: Write to the result zval slot and advance opline to the next instruction.
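
The five steps above can be sketched as a small C function. It is a simplified model rather than the engine's code: val stands in for a zval, to_number is a hypothetical placeholder for the general-purpose slow path, and the dispatch to the next opcode is left out.

```c
#include <assert.h>
#include <limits.h>

typedef enum { T_LONG, T_DOUBLE, T_OTHER } val_type;

typedef struct {
    val_type type;
    union { long lval; double dval; } u;
} val;

/* Placeholder for the slow path (coercion, overloading, errors).
 * Assumption of this sketch: any non-numeric value counts as 0. */
static double to_number(const val *v)
{
    switch (v->type) {
    case T_LONG:   return (double)v->u.lval;
    case T_DOUBLE: return v->u.dval;
    default:       return 0.0;
    }
}

static int is_numeric(const val *v)
{
    return v->type == T_LONG || v->type == T_DOUBLE;
}

void add_vals(val *result, const val *a, const val *b)
{
    assert(result != 0 && a != 0 && b != 0);

    /* Fast path: both integers. */
    if (a->type == T_LONG && b->type == T_LONG) {
        long x = a->u.lval, y = b->u.lval;
        /* Detect overflow before adding, so no undefined behavior occurs. */
        if ((y > 0 && x > LONG_MAX - y) || (y < 0 && x < LONG_MIN - y)) {
            result->type = T_DOUBLE;
            result->u.dval = (double)x + (double)y;
        } else {
            result->type = T_LONG;
            result->u.lval = x + y;
        }
        return;
    }

    /* Medium path: at least one double, both numeric. */
    if (is_numeric(a) && is_numeric(b)) {
        result->type = T_DOUBLE;
        result->u.dval = to_number(a) + to_number(b);
        return;
    }

    /* Slow path: delegate to the general-purpose routine. */
    result->type = T_DOUBLE;
    result->u.dval = to_number(a) + to_number(b);
}
```

Adding 2 and 3 stays on the integer path, while LONG_MAX + 1 is detected before the addition happens and produces a double instead.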

flowchart TD
    FETCH["Fetch op1, op2<br/>(type-specialized)"] --> FAST{"Both IS_LONG?"}
    FAST -->|"Yes"| IADD["Integer add<br/>Check overflow"]
    IADD --> OVF{"Overflow?"}
    OVF -->|"No"| STORE["Store IS_LONG result"]
    OVF -->|"Yes"| DFLOAT["Convert to IS_DOUBLE"]
    
    FAST -->|"No"| DBL{"Either IS_DOUBLE?"}
    DBL -->|"Yes"| DADD["Double add"]
    DADD --> DSTORE["Store IS_DOUBLE result"]
    
    DBL -->|"No"| SLOW["Slow path:<br/>type coercion,<br/>operator overloading"]
    
    STORE --> NEXT["ZEND_VM_NEXT_OPCODE()"]
    DSTORE --> NEXT
    DFLOAT --> DSTORE
    SLOW --> NEXT

The macro ZEND_VM_NEXT_OPCODE() advances opline to the next instruction and dispatches to its handler. In GOTO mode, this is goto *(++opline)->handler. In CALL mode, the handler returns and the dispatch loop calls the next handler. In HYBRID mode, handlers compiled as labels inside the main loop jump directly to the next handler's label.

Global Register Pinning

On x86_64 builds compiled with GCC, the VM pins two critical values to CPU registers, as defined in Zend/zend_execute.c:

  • execute_data → pinned to %r14
  • opline → pinned to %r15

These are the two values accessed on every single opcode dispatch. Pinning them to registers eliminates memory loads from the hot loop — the CPU always has the current frame pointer and instruction pointer ready.

The EXECUTE_DATA_D and OPLINE_D macros expand to register variable declarations when pinning is available, and to regular local variables otherwise. This is a significant performance win: benchmarks show 5–15% improvement from register pinning alone.

This technique relies on GCC's global register variable extension (register ... asm("r14")). Where global register variables aren't supported (Clang, for example, does not accept them for general-purpose registers), or when the compiler can't guarantee the registers are preserved across function calls, the macros fall back to stack variables.
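
The fallback pattern can be illustrated with a guarded declaration. This is a hedged sketch, not the engine's macros: DEMO_PIN_REGS is a hypothetical opt-in switch, demo_ip and demo_sum are invented names, and the pinned branch is only attempted with GCC on x86_64.

```c
#include <assert.h>
#include <stddef.h>

/* Pinning is opt-in through the hypothetical DEMO_PIN_REGS macro. Without it,
 * or on other compilers and architectures, an ordinary variable is used. */
#if defined(DEMO_PIN_REGS) && defined(__GNUC__) && !defined(__clang__) && defined(__x86_64__)
register const long *demo_ip __asm__("r15");
# define DEMO_PINNED 1
#else
static const long *demo_ip;
# define DEMO_PINNED 0
#endif

/* Sums a zero-terminated "program", advancing the (possibly pinned) pointer. */
long demo_sum(const long *code)
{
    assert(code != NULL);
    const long *saved = demo_ip;   /* r15 is callee-saved: restore it on exit */
    long total = 0;
    for (demo_ip = code; *demo_ip != 0; demo_ip++) {
        total += *demo_ip;
    }
    demo_ip = saved;
    return total;
}
```

The code that uses demo_ip is identical in both branches. Only the declaration changes, which is the property that lets one set of handlers serve pinned and unpinned builds.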

Call Frame Layout

When PHP calls a userland function, the VM doesn't push a frame on the C call stack. Instead, it allocates a zend_execute_data frame on a custom VM stack. This frame layout is defined in Zend/zend_compile.h:

flowchart TB
    subgraph frame["zend_execute_data frame on VM stack"]
        direction TB
        HEADER["zend_execute_data header<br/>opline, func, This, prev_execute_data<br/>return_value, run_time_cache"]
        CV["CV slots (Compiled Variables)<br/>[0]: $param1<br/>[1]: $param2<br/>[2]: $localVar<br/>..."]
        TMP["TMP_VAR / VAR slots<br/>(expression temporaries)"]
        EXTRA["Extra args<br/>(variadic overflow)"]
    end
    
    CALLER["Caller's frame<br/>(prev_execute_data)"] --> HEADER
    HEADER --> CV
    CV --> TMP
    TMP --> EXTRA

The zend_execute_data struct contains:

  • opline: current instruction pointer (pinned to register in the fast path)
  • func: pointer to the zend_function being executed
  • This: the $this object for method calls (or a special internal value for functions)
  • prev_execute_data: link to the caller's frame
  • return_value: pointer to where the return value should be stored (the caller's result slot)

Immediately after the header come the CV slots — one zval per compiled variable, in declaration order. The compiler assigns each $variable a numeric index, and the VM accesses them as EX_VAR(offset) — a simple pointer offset from execute_data.

After the CVs come TMP_VAR and VAR slots for expression temporaries. These are allocated by the compiler during the compilation pass and sized to the maximum simultaneous temporaries needed.

Function arguments are passed by pre-initializing the callee's CV slots before switching frames. The calling convention is: allocate the callee frame, copy arguments into its CV[0], CV[1], ..., then switch execute_data to the new frame.
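
A minimal model of this layout follows, assuming a heap allocation per frame instead of the engine's VM stack. The frame and slot types, FRAME_VAR, frame_push, and frame_pop are hypothetical names, and slots are addressed by index here, whereas the engine's EX_VAR takes a byte offset.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

typedef struct { long value; } slot;    /* stand-in for a zval */

typedef struct frame {
    const void   *opline;               /* current instruction */
    struct frame *prev;                 /* caller's frame */
    slot         *return_value;         /* where the caller wants the result */
    unsigned      num_cvs, num_tmps;
} frame;

/* Header size rounded up to whole slots, so the slots can follow it directly. */
#define FRAME_HEADER_SLOTS ((sizeof(frame) + sizeof(slot) - 1) / sizeof(slot))

/* Slot access is plain pointer arithmetic from the frame base. */
#define FRAME_VAR(f, n) (((slot *)(f)) + FRAME_HEADER_SLOTS + (n))

/* Allocate the callee frame and pre-fill its first CV slots with the args. */
frame *frame_push(frame *caller, unsigned num_cvs, unsigned num_tmps,
                  const long *args, unsigned num_args, slot *return_value)
{
    assert(num_args <= num_cvs);
    size_t slots = FRAME_HEADER_SLOTS + num_cvs + num_tmps;
    frame *f = calloc(slots, sizeof(slot));   /* header + CVs + temporaries */
    if (f == NULL) {
        return NULL;
    }
    f->prev = caller;
    f->return_value = return_value;
    f->num_cvs = num_cvs;
    f->num_tmps = num_tmps;
    for (unsigned i = 0; i < num_args; i++) {
        FRAME_VAR(f, i)->value = args[i];
    }
    return f;
}

/* Release the frame and hand control back to the caller's frame. */
frame *frame_pop(frame *f)
{
    frame *caller = f->prev;
    free(f);
    return caller;
}
```

Because header and slots share one contiguous block, reaching a variable costs a single addition on the frame pointer, with no second pointer to chase.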

Hookable Function Pointers

One of php-src's most important extensibility patterns is the use of global function pointers that can be replaced at runtime. These are set during engine startup in Zend/zend.c:

sequenceDiagram
    participant Engine as Zend Engine
    participant OPcache as OPcache Extension
    participant Profiler as Xdebug/APM

    Note over Engine: zend_startup() sets defaults
    Engine->>Engine: zend_compile_file = compile_file
    Engine->>Engine: zend_execute_ex = execute_ex
    Engine->>Engine: zend_execute_internal = NULL

    Note over Engine,OPcache: During MINIT
    OPcache->>Engine: Save original zend_compile_file
    OPcache->>Engine: zend_compile_file = persistent_compile_file
    
    Profiler->>Engine: Save original zend_execute_ex
    Profiler->>Engine: zend_execute_ex = profiler_execute_ex

    Note over Engine: Runtime compilation
    Engine->>OPcache: zend_compile_file("script.php")
    OPcache->>OPcache: Check shared memory cache
    alt Cache hit
        OPcache-->>Engine: Return cached op_array
    else Cache miss
        OPcache->>Engine: Call original compile_file()
        OPcache->>OPcache: Store in shared memory
        OPcache-->>Engine: Return op_array
    end

The three key hookable pointers are:

  • zend_compile_file: Called to compile a PHP file. OPcache replaces this to intercept compilation and return cached op_arrays.
  • zend_execute_ex: Called to execute a user function's opcodes. Debuggers (Xdebug) and profilers replace this to instrument function entry/exit.
  • zend_execute_internal: Called to execute an internal (C) function. APM tools can hook this to monitor built-in function calls.

Extensions save the original pointer and chain their replacement to call it when needed. This creates a middleware-like chain: OPcache's compile hook → check cache → on miss, call original compiler → store result.
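
The save-and-chain pattern looks roughly like this in C. The sketch uses a toy compiler that returns an integer instead of an op_array, and all names (compile_hook, default_compile, caching_compile, cache_install) are hypothetical stand-ins rather than engine symbols.

```c
#include <assert.h>
#include <string.h>

typedef int (*compile_fn)(const char *path);

/* Default "compiler": returns a fake result derived from the path length. */
int default_compile(const char *path) { return (int)strlen(path); }

/* The global hook, set to the default at "startup". */
compile_fn compile_hook = default_compile;

/* --- a caching "extension" with a one-entry cache --- */
static compile_fn original_compile;    /* saved pointer, called on a miss */
static char cached_path[64];           /* zero-initialized: empty cache */
static int  cached_result;
int cache_hits;

int caching_compile(const char *path)
{
    if (cached_path[0] != '\0' && strcmp(cached_path, path) == 0) {
        cache_hits++;
        return cached_result;          /* hit: the original is never called */
    }
    int result = original_compile(path);   /* miss: chain to the saved hook */
    strncpy(cached_path, path, sizeof cached_path - 1);
    cached_path[sizeof cached_path - 1] = '\0';
    cached_result = result;
    return result;
}

/* Install the extension: save whatever is there, then replace it. */
void cache_install(void)
{
    assert(compile_hook != caching_compile);   /* refuse to chain to itself */
    original_compile = compile_hook;
    compile_hook = caching_compile;
}
```

Callers keep invoking compile_hook and never learn whether one, two, or zero extensions sit in front of the default. That is also the source of the conflicts mentioned below: each extension must trust the others to chain correctly.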

Tip: If you're writing a PHP extension that needs to intercept execution, prefer the Observer API (described next) over replacing zend_execute_ex. The Observer API is designed for safe coexistence with other extensions, while global function pointer replacement can conflict.

The Observer API

The Observer API, defined in Zend/zend_observer.h and implemented in Zend/zend_observer.c, provides a structured way to instrument function calls without replacing global function pointers.

Extensions register observer handlers that are called on function begin and end:

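  • zend_observer_fcall_register: Registers an initializer callback. The engine invokes it the first time a given function is called, and it returns the begin and end handlers to use for that function (either may be NULL if the observer is not interested).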
  • The begin handler receives execute_data at function entry.
  • The end handler receives execute_data and the return value at function exit.

Multiple observers can coexist — the engine maintains an array of registered handlers and calls them all. The handlers are stored in the per-function runtime cache, so the lookup cost is paid only once per function per request.
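
The caching behaviour can be modelled in a few lines of C. This is a toy registry, not the Observer API itself: the types and functions below are invented for illustration, a plain array plays the role of the runtime cache, and running end handlers in reverse order is a choice of this sketch.

```c
#include <assert.h>
#include <stddef.h>

#define MAX_OBSERVERS 4

typedef struct { const char *name; } call_info;     /* stand-in for execute_data */
typedef void (*begin_fn)(const call_info *call);
typedef void (*end_fn)(const call_info *call, long retval);
typedef struct { begin_fn begin; end_fn end; } obs_handlers;
typedef obs_handlers (*obs_init_fn)(const call_info *call);

typedef struct {
    const char  *name;
    long       (*body)(void);
    int          resolved;                 /* handlers already looked up? */
    size_t       count;                    /* observers seen at resolve time */
    obs_handlers cache[MAX_OBSERVERS];     /* per-function handler cache */
} obs_function;

static obs_init_fn observers[MAX_OBSERVERS];
static size_t observer_count;

void observer_register(obs_init_fn init)
{
    assert(init != NULL && observer_count < MAX_OBSERVERS);
    observers[observer_count++] = init;
}

long observed_call(obs_function *fn)
{
    call_info call = { fn->name };
    if (!fn->resolved) {                   /* first call: ask each observer once */
        for (size_t i = 0; i < observer_count; i++) {
            fn->cache[i] = observers[i](&call);
        }
        fn->count = observer_count;
        fn->resolved = 1;
    }
    for (size_t i = 0; i < fn->count; i++) {
        if (fn->cache[i].begin) fn->cache[i].begin(&call);
    }
    long retval = fn->body();
    for (size_t i = fn->count; i-- > 0; ) {
        if (fn->cache[i].end) fn->cache[i].end(&call, retval);
    }
    return retval;
}

/* Example observer: counts calls and remembers the last return value. */
int  init_calls, begin_calls;
long last_retval;

static void count_begin(const call_info *call) { (void)call; begin_calls++; }
static void count_end(const call_info *call, long retval) { (void)call; last_retval = retval; }

obs_handlers counting_init(const call_info *call)
{
    (void)call;
    init_calls++;
    obs_handlers h = { count_begin, count_end };
    return h;
}

long answer_body(void) { return 42; }
```

Calling the same function twice runs the begin and end handlers twice but the initializer only once, which is the cost profile the paragraph above describes.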

The Observer API also supports fiber switch notifications (zend_observer_fiber_switch_register) and error notifications, making it the preferred hook point for APM tools, profilers, and code coverage tools.

The SSA-Based Optimizer

When OPcache is enabled, compiled op_arrays go through a multi-pass optimization pipeline before execution. The optimizer lives in Zend/Optimizer/ and is orchestrated by Zend/Optimizer/zend_optimizer.c:

flowchart TD
    INPUT["zend_op_array<br/>(unoptimized)"] --> P1["Pass 1: Constant Folding<br/>Evaluate constant expressions"]
    P1 --> CFG["CFG Construction<br/>(zend_cfg.c)<br/>Build control flow graph"]
    CFG --> SSA["SSA Construction<br/>(zend_ssa.c)<br/>Insert phi nodes, rename vars"]
    SSA --> TI["Type Inference<br/>(zend_inference.c)<br/>Propagate types through SSA"]
    TI --> SCCP["SCCP Pass<br/>(sccp.c)<br/>Sparse Conditional Constant Propagation"]
    SCCP --> DCE["DCE Pass<br/>(dce.c)<br/>Dead Code Elimination"]
    DCE --> DFA["DFA Pass<br/>(dfa_pass.c)<br/>Data-flow optimizations"]
    DFA --> BLOCK["Block Pass<br/>(block_pass.c)<br/>Peephole, jump threading"]
    BLOCK --> OUTPUT["Optimized zend_op_array"]

The SSA data structures are defined in Zend/Optimizer/zend_ssa.h. Each SSA variable has a definition point, use chain, and inferred type information. Phi nodes are inserted at control flow merge points.

Type inference (zend_inference.c) is particularly important because its results feed the JIT compiler. By knowing that a variable is always IS_LONG at a particular point, the JIT can emit integer-only machine code without type checks.

SCCP (Sparse Conditional Constant Propagation) in Zend/Optimizer/sccp.c combines constant propagation with unreachable code detection. If a branch condition is a known constant, the branch that can never be taken is eliminated.

DCE (Dead Code Elimination) in Zend/Optimizer/dce.c removes instructions whose results are never used. This is surprisingly effective after SCCP has propagated constants and simplified expressions.
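
The core idea of dead code elimination fits in one backward sweep when the code is straight-line SSA. The sketch below assumes exactly that (no branches, no phi nodes, definitions before uses) and uses an invented ir_insn type; it does not reflect how dce.c is structured.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define NO_VAR (-1)

typedef struct {
    int  result;        /* SSA variable defined here, or NO_VAR */
    int  op1, op2;      /* SSA variables used here, or NO_VAR */
    bool side_effect;   /* output, call, return: must always be kept */
    bool dead;          /* set by the pass */
} ir_insn;

/* Walk backwards: an instruction is needed if it has a side effect or if a
 * needed instruction later in the block uses its result. Operands of dead
 * instructions are never marked, so whole chains die together.
 * Returns the number of instructions marked dead. */
size_t eliminate_dead(ir_insn *code, size_t n, bool *used, size_t num_vars)
{
    size_t removed = 0;
    for (size_t v = 0; v < num_vars; v++) {
        used[v] = false;
    }
    for (size_t i = n; i-- > 0; ) {
        ir_insn *in = &code[i];
        assert(in->result < (int)num_vars);
        bool needed = in->side_effect ||
                      (in->result != NO_VAR && used[in->result]);
        in->dead = !needed;
        if (!needed) {
            removed++;
            continue;
        }
        if (in->op1 != NO_VAR) used[in->op1] = true;
        if (in->op2 != NO_VAR) used[in->op2] = true;
    }
    return removed;
}
```

In a block such as v0 = 1; v1 = 2; v2 = v0 + v1; v3 = v0 * v0; echo v3, the sweep marks both v2 and the definition of v1 as dead, because v1's only use was in an instruction that was itself removed.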

The optimizer's pass level is controlled by the opcache.optimization_level INI setting, a bitmask where each bit enables a specific pass. The default value, 0x7FFEBFFF, enables nearly all passes.
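
Testing and clearing bits in such a mask is straightforward. The helpers below are a sketch with hypothetical names, and they assume that pass N is controlled by bit N-1.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Assumption of this sketch: pass N corresponds to bit N-1 of the mask. */
#define PASS_BIT(n) (UINT32_C(1) << ((n) - 1))

/* True if the given pass number is switched on in the mask. */
bool pass_enabled(uint32_t level, unsigned pass)
{
    assert(pass >= 1 && pass <= 32);
    return (level & PASS_BIT(pass)) != 0;
}

/* Returns a copy of the mask with one pass switched off. */
uint32_t disable_pass(uint32_t level, unsigned pass)
{
    assert(pass >= 1 && pass <= 32);
    return level & ~PASS_BIT(pass);
}
```

Starting from an example mask of 0x7FFFFFFF, disable_pass(level, 5) yields 0x7FFFFFEF. Turning off one pass this way is a common approach when bisecting a suspected optimizer bug.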

What's Next

We've now covered the complete execution pipeline from VM dispatch through optimization. In Article 5 — the final installment — we'll explore the extension ecosystem that makes PHP practical: the extension API with its lifecycle hooks, OPcache's shared memory architecture, the JIT compiler that translates hot opcodes to native machine code, Fibers for cooperative concurrency, the streams I/O abstraction, and TSRM for thread safety. These are the systems that turn the Zend Engine into the PHP runtime that powers the web.