Read OSS

Boot Sequence Deep Dive: From docker compose up to a Syncing Node

Intermediate

Prerequisites

  • Article 1: Architecture and Codebase Navigation
  • What the Engine API is (consensus ↔ execution communication protocol)
  • Familiarity with JWT authentication basics

In Part 1 we mapped the codebase and understood that base/node orchestrates two services connected via the Engine API. But the architecture diagram is static — what actually happens when you type docker compose up? The answer involves a carefully choreographed dance across multiple shell scripts, where ordering constraints are enforced not by a framework but by curl loops and process lifecycle management.

This article traces the entire boot sequence. We'll follow the execution path from Docker Compose's service startup, through the consensus entrypoint dispatcher, into each client's initialization logic — culminating in Reth's remarkable multi-stage startup that can pause for up to six hours waiting for the static file manager to initialize.

Docker Compose Startup Order

When you run docker compose up, Docker reads docker-compose.yml and determines the startup order from the depends_on graph. The node service declares a dependency on execution:

node:
    depends_on:
      - execution

Docker starts the execution container first, then the node container. But depends_on without a health check condition only waits for the container to be created — not for the application inside to be ready. The execution client might take seconds or minutes to bind its Engine API port. The actual readiness synchronization is handled entirely by the consensus entrypoint scripts.
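If you did want Compose itself to gate startup on readiness, depends_on accepts a condition tied to a healthcheck. A hypothetical fragment showing what that would look like (this repo does not do this; readiness is handled entirely by the entrypoint scripts' curl loops):

```yaml
node:
  depends_on:
    execution:
      condition: service_healthy

execution:
  healthcheck:
    # Hypothetical probe: succeed once the Engine API answers with 401,
    # i.e. the port is bound and the auth layer is active.
    test: ["CMD-SHELL", "curl -s -o /dev/null -w '%{http_code}' http://localhost:8551 | grep -q 401"]
    interval: 5s
    retries: 60
```

The repo's approach keeps the readiness logic in the scripts instead, which works identically outside Docker Compose.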

sequenceDiagram
    participant User
    participant DC as Docker Compose
    participant EX as Execution Container
    participant ND as Node Container

    User->>DC: docker compose up
    DC->>EX: Start container
    DC->>EX: Run: bash ./execution-entrypoint
    Note over EX: Execution client initializing...
    DC->>ND: Start container (depends_on satisfied)
    DC->>ND: Run: bash ./consensus-entrypoint
    Note over ND: Polls Engine API with curl
    ND-->>EX: curl http://execution:8551 → connection refused
    Note over ND: sleep 5, retry...
    Note over EX: Engine API ready on :8551
    ND-->>EX: curl http://execution:8551 → HTTP 401
    Note over ND: 401 = authenticated endpoint exists!
    ND->>ND: Write JWT secret, exec consensus binary
    ND-->>EX: Engine API with JWT auth
    Note over EX,ND: Both services syncing

The command overrides in docker-compose.yml are what distinguish the two containers despite sharing the same image. The execution service runs bash ./execution-entrypoint (line 13), and the node service runs bash ./consensus-entrypoint (line 33). Both files exist in the Docker image because every Dockerfile COPYs all entrypoint scripts into /app.

The Consensus Entrypoint Dispatcher

The consensus-entrypoint is a simple but important dispatcher. It routes to one of two consensus clients based on a single environment variable:

if [ "${USE_BASE_CONSENSUS:-false}" = "true" ]; then
    if [ -f ./base-consensus-entrypoint ]; then
        echo "Using Base Client"
        exec ./base-consensus-entrypoint
    else
        echo "Base client is not supported for this node type"
        exit 1
    fi
else
    echo "Using OP Node"
    exec ./op-node-entrypoint
fi

flowchart TD
    CE[consensus-entrypoint] -->|"USE_BASE_CONSENSUS=true"| CHECK{"base-consensus-entrypoint<br/>exists?"}
    CE -->|"USE_BASE_CONSENSUS=false"| OP[op-node-entrypoint]
    CHECK -->|Yes| BC[base-consensus-entrypoint]
    CHECK -->|No| FAIL["Exit 1:<br/>not supported for this node type"]

The file-existence check on line 5 is not just defensive coding — it's architecturally significant. As we'll explore in Part 3, only the Reth Dockerfile bundles the base-consensus binary and its entrypoint. If you set CLIENT=geth and USE_BASE_CONSENSUS=true, the Geth image won't have base-consensus-entrypoint, and the dispatcher will fail gracefully with an error message rather than a cryptic "file not found."

Note the use of exec: the shell replaces itself with the target script instead of forking a child, so the target keeps the entrypoint's PID (PID 1 inside the container). That matters for Docker's stop behavior, since docker stop sends SIGTERM to PID 1, and with exec the signal reaches the real process rather than a wrapper shell.
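You can see this behavior in any bash shell (illustrative, not from the repo): print a subshell's PID, then exec a replacement from inside it and print again. The two PIDs match because exec does not fork:

```shell
# Print a subshell's PID, then exec a replacement process from inside it.
# Both lines show the same PID: exec replaces the process image in place.
pids=$(bash -c 'echo "$$"; exec bash -c "echo \$\$"')
first=$(echo "$pids" | head -n1)
second=$(echo "$pids" | tail -n1)
echo "before exec: $first, after exec: $second"
```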

Consensus Client Startup: Waiting for the Engine API

Both consensus entrypoints follow the same three-phase pattern: validate environment, wait for Engine API, then exec the binary. Let's compare them side by side.

Phase 1: Environment Validation

The base-consensus-entrypoint checks for BASE_NODE_NETWORK:

if [[ -z "${BASE_NODE_NETWORK:-}" ]]; then
  echo "expected BASE_NODE_NETWORK to be set" 1>&2
  exit 1
fi

The op-node-entrypoint checks for OP_NODE_NETWORK or OP_NODE_ROLLUP_CONFIG — allowing either a named network or a custom rollup config file. This reflects op-node's more flexible configuration model.
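A hedged sketch of that either/or check, written as a function for illustration (the actual script's wording and structure may differ):

```shell
# Illustrative either/or validation: accept a named network OR a custom
# rollup config path; fail only when both are missing.
validate_op_node_env() {
  if [[ -z "${OP_NODE_NETWORK:-}" && -z "${OP_NODE_ROLLUP_CONFIG:-}" ]]; then
    echo "expected OP_NODE_NETWORK or OP_NODE_ROLLUP_CONFIG to be set" 1>&2
    return 1
  fi
}
```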

Phase 2: Engine API Polling

Both scripts use the same clever technique to detect when the execution client is ready. Here's the polling loop from base-consensus-entrypoint lines 31-34:

until [ "$(curl -s --max-time 10 --connect-timeout 5 \
    -w '%{http_code}' -o /dev/null \
    "${BASE_NODE_L2_ENGINE_RPC/ws/http}")" -eq 401 ]; do
  echo "waiting for execution client to be ready"
  sleep 5
done

This is elegant: the Engine API is an authenticated endpoint. Without a valid JWT, any request returns HTTP 401 Unauthorized. The script doesn't care about the response body — it only checks the status code. A 401 proves the Engine API is listening and the authentication layer is active. Connection refused or timeout means the execution client isn't ready yet.

Notice the ${BASE_NODE_L2_ENGINE_RPC/ws/http} substitution — the Engine API URL is stored as a WebSocket URL (ws://execution:8551), but curl needs HTTP. This bash parameter expansion swaps ws for http inline.
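You can verify the substitution in any bash shell. ${var/pattern/replacement} rewrites only the first match, which here is exactly the leading scheme:

```shell
# ${var/pattern/replacement} replaces the first occurrence of the pattern,
# so only the leading "ws" scheme is rewritten.
url="ws://execution:8551"
echo "${url/ws/http}"   # → http://execution:8551
```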

Phase 3: IP Discovery, JWT, and Exec

After the Engine API is confirmed ready, both scripts discover the node's public IP for P2P advertisement using a shared get_public_ip() function that cycles through four providers (ifconfig.me, api.ipify.org, ipecho.net, v4.ident.me). Then they write the JWT secret and exec the binary.
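A hedged sketch of what such a provider-cycling helper might look like (the provider list comes from the article; the flags, URL scheme, and lack of output validation are illustrative assumptions about the real get_public_ip):

```shell
# Try each public-IP provider in order; return the first non-empty answer.
get_public_ip() {
  local provider ip
  for provider in ifconfig.me api.ipify.org ipecho.net v4.ident.me; do
    ip=$(curl -s --max-time 5 "https://$provider" 2>/dev/null || true)
    if [[ -n "$ip" ]]; then
      echo "$ip"
      return 0
    fi
  done
  return 1
}
```

The fallback chain matters because any single provider can be down or rate-limited; the node only fails IP discovery if all four are unreachable.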

The base-consensus-entrypoint has one unique feature — follow mode:

if [[ -n "${BASE_NODE_SOURCE_L2_RPC:-}" ]]; then
  echo "Running base-consensus in follow mode because BASE_NODE_SOURCE_L2_RPC is set"
  exec ./base-consensus follow
else
  exec ./base-consensus node
fi

Follow mode allows the consensus client to sync from another L2 node's RPC endpoint rather than deriving blocks from L1. This is useful for quickly bootstrapping a new node.

sequenceDiagram
    participant EP as base-consensus-entrypoint
    participant IP as IP Providers
    participant EX as Execution Client
    participant FS as Filesystem
    participant BC as base-consensus

    EP->>EP: Validate BASE_NODE_NETWORK
    loop Until HTTP 401
        EP->>EX: curl http://execution:8551
        EX-->>EP: Connection refused / 401
    end
    EP->>IP: curl ifconfig.me (+ fallbacks)
    IP-->>EP: Public IP
    EP->>FS: Write JWT to /tmp/engine-auth-jwt
    alt SOURCE_L2_RPC set
        EP->>BC: exec ./base-consensus follow
    else
        EP->>BC: exec ./base-consensus node
    end

Execution Client Startup Patterns

The execution client entrypoints run before the consensus client connects. They're responsible for writing the JWT secret, setting up data directories, and launching the binary with the correct flags. Each of the three clients handles this differently.

Geth: Parametric Precision

The geth/geth-entrypoint is moderately complex, with five dedicated cache tuning variables (lines 18-22):

GETH_CACHE="${GETH_CACHE:-20480}"
GETH_CACHE_DATABASE="${GETH_CACHE_DATABASE:-20}"
GETH_CACHE_GC="${GETH_CACHE_GC:-12}"
GETH_CACHE_SNAPSHOT="${GETH_CACHE_SNAPSHOT:-24}"
GETH_CACHE_TRIE="${GETH_CACHE_TRIE:-44}"

These percentages control how Geth distributes its 20GB cache pool. The defaults allocate 44% to trie caching, 24% to snapshots, 20% to the database, and 12% to garbage collection — a profile optimized for L2 workloads where trie operations dominate.
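Since Geth's cache figures are in MB, the default split works out as follows; a quick check of the arithmetic:

```shell
# Compute each slice of the default 20480 MB cache pool from its percentage.
GETH_CACHE=20480   # total cache budget in MB
for slice in DATABASE:20 GC:12 SNAPSHOT:24 TRIE:44; do
  name=${slice%%:*}
  pct=${slice##*:}
  echo "$name: $(( GETH_CACHE * pct / 100 )) MB"
done
```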

Geth also conditionally adds ethstats, unprotected transactions, state scheme, and bootnode flags using a pattern of checking whether the variable is set (not just non-empty) with ${VAR+x} syntax on lines 33-51.
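The distinction matters: ${VAR+x} expands to x whenever VAR is set, even to the empty string, whereas ${VAR:-default} treats empty and unset the same. A quick illustration (using an arbitrary variable name):

```shell
# ${VAR+x} distinguishes "unset" from "set but empty".
unset BOOTNODES
[ -n "${BOOTNODES+x}" ] && echo "set" || echo "unset"   # → unset
BOOTNODES=""
[ -n "${BOOTNODES+x}" ] && echo "set" || echo "unset"   # → set (empty still counts)
```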

Nethermind: Elegant Simplicity

The nethermind/nethermind-entrypoint is the most straightforward of the three. Its key differentiator is the --config flag:

exec ./nethermind \
    --config="$OP_NODE_NETWORK"

Nethermind ships with built-in network configurations. Passing --config=base-mainnet activates all the chain-specific parameters — genesis, fork heights, gas limits — without any additional flags. This is why Nethermind doesn't need a RETH_CHAIN or custom initialization step.

Reth: The Complex Path

Reth's entrypoint is by far the most sophisticated, and it deserves its own section.

Reth's Historical Proofs Initialization

The reth/reth-entrypoint contains the most complex shell logic in the entire repository. Beyond the standard startup, it handles three unique features: log level translation, Flashblocks support, and historical proofs initialization.

Log Level Translation

Reth uses a verbosity-flag convention (-v, -vv, -vvv, etc.) rather than named log levels. The entrypoint translates between the two on lines 31-51:

case "$LOG_LEVEL" in
    "error") LOG_LEVEL="v" ;;
    "warn")  LOG_LEVEL="vv" ;;
    "info")  LOG_LEVEL="vvv" ;;
    "debug") LOG_LEVEL="vvvv" ;;
    "trace") LOG_LEVEL="vvvvv" ;;
esac

This allows operators to set LOG_LEVEL=debug uniformly across all clients.
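How the translated value presumably becomes a flag (prefixing a dash to the run of v's is an assumption about the script's exact form):

```shell
# After the case statement, LOG_LEVEL holds a run of v's; a leading dash
# turns it into the verbosity flag Reth expects.
LOG_LEVEL="debug"
case "$LOG_LEVEL" in
    "error") LOG_LEVEL="v" ;;
    "warn")  LOG_LEVEL="vv" ;;
    "info")  LOG_LEVEL="vvv" ;;
    "debug") LOG_LEVEL="vvvv" ;;
    "trace") LOG_LEVEL="vvvvv" ;;
esac
echo "-${LOG_LEVEL}"   # → -vvvv
```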

The Historical Proofs Multi-Stage Startup

The most remarkable code starts at line 75. When RETH_HISTORICAL_PROOFS=true, Reth cannot simply start and serve — it needs to initialize a historical proofs database, but that requires the node to have synced past the genesis block first. The problem: Reth doesn't support starting an old database in read-only mode.

The solution is a three-phase startup:

sequenceDiagram
    participant EP as reth-entrypoint
    participant R1 as Reth (phase 1)
    participant RPC as JSON-RPC localhost
    participant R2 as Reth proofs init
    participant R3 as Reth (final)

    EP->>R1: Start reth node in background
    Note over R1: Syncing from genesis...
    loop Up to 6 hours
        EP->>RPC: eth_getBlockByNumber("latest")
        RPC-->>EP: block number = 0x0
        Note over EP: Still at genesis, wait...
    end
    EP->>RPC: eth_getBlockByNumber("latest")
    RPC-->>EP: block number > 0x0
    Note over EP: Synced past genesis!
    EP->>R1: kill (SIGTERM)
    EP->>EP: wait_for_pid (poll /proc/PID)
    Note over R1: Graceful shutdown
    EP->>R2: reth proofs init
    Note over R2: Initialize historical proofs DB
    EP->>R3: exec reth node (with --proofs-history)
    Note over R3: Normal operation

Phase 1: Sync past genesis. Reth starts as a background process with &, binding only to localhost's HTTP port. The script then enters a polling loop, making JSON-RPC calls to eth_getBlockByNumber and checking if the result is beyond block 0x0. The timeout is set to 6 hours (60 * 60 * 6 seconds on line 89) because Reth's static file manager initialization can be extremely slow on large databases.
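A hedged sketch of what the phase-1 poll might look like (the method name and the six-hour budget come from the article; the sed-based extraction and function shape are illustrative assumptions):

```shell
# Poll the local JSON-RPC until the latest block number moves past 0x0,
# giving up after six hours.
wait_past_genesis() {
  local rpc_url="$1"
  local deadline=$(( $(date +%s) + 60 * 60 * 6 ))
  local latest
  while [ "$(date +%s)" -lt "$deadline" ]; do
    latest=$(curl -s -X POST -H 'Content-Type: application/json' \
      --data '{"jsonrpc":"2.0","method":"eth_getBlockByNumber","params":["latest",false],"id":1}' \
      "$rpc_url" | sed -n 's/.*"number":"\(0x[0-9a-f]*\)".*/\1/p')
    if [ -n "$latest" ] && [ "$latest" != "0x0" ]; then
      echo "$latest"
      return 0
    fi
    sleep 5
  done
  return 1
}
```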

Phase 2: Graceful shutdown and proofs init. Once synced past genesis, the script sends SIGTERM to the background Reth process and waits for it to exit using the wait_for_pid helper (lines 53-67):

wait_for_pid() {
    local pid="$1"
    if [[ ! -e "/proc/$pid" ]]; then
        echo "Process $pid does not exist." >&2
        return 1
    fi
    while [[ -e "/proc/$pid" ]]; do
        sleep 1
    done
}

This polls /proc/$pid rather than using the shell builtin wait because wait only tracks direct children of the calling shell; checking for the /proc entry works regardless of how the process was spawned, and it confirms the process is truly gone from the process table. After Reth exits, the script runs reth proofs init to build the historical proofs database.
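You can watch the same /proc-based wait on any Linux shell (illustrative; /proc is Linux-specific):

```shell
# Start a short-lived background process, then poll its /proc entry
# until the kernel removes it.
sleep 1 &
pid=$!
while [ -e "/proc/$pid" ]; do
  sleep 0.2
done
echo "process $pid has fully exited"
```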

Phase 3: Normal startup. Finally, Reth restarts with --proofs-history and --proofs-history.storage-path flags via exec, replacing the script process.

Tip: The 6-hour timeout might seem extreme, but it's calibrated for real-world conditions. If you're running on spinning disks or with limited memory, the static file manager's initial scan of a ~2TB Base mainnet database genuinely takes hours. If you're on NVMe with 64GB+ RAM, expect it to complete in minutes.

The Complete Boot Timeline

Putting it all together, here's the full sequence from docker compose up to a syncing node:

flowchart TD
    A["docker compose up"] --> B["Start execution container"]
    B --> C["Run execution-entrypoint<br/>(reth/geth/nethermind)"]
    C --> D["Write JWT secret"]
    D --> E["Start execution client binary"]
    A --> F["Start node container<br/>(after depends_on)"]
    F --> G["Run consensus-entrypoint"]
    G --> H{"USE_BASE_CONSENSUS?"}
    H -->|true| I["base-consensus-entrypoint"]
    H -->|false| J["op-node-entrypoint"]
    I --> K["Poll Engine API for 401"]
    J --> K
    K --> L["Detect public IP"]
    L --> M["Write JWT secret"]
    M --> N["exec consensus binary"]
    N --> O["Connect to Engine API"]
    O --> P["Node syncing ✓"]

The execution client starts first and begins syncing independently. The consensus client waits until the Engine API is proven ready (HTTP 401), discovers its public IP, writes the shared JWT secret, and launches. Once connected, the consensus client drives block derivation from L1 while the execution client handles state execution.

What's Next

We've seen how the startup scripts share patterns while differing in complexity. But we've only scratched the surface of why each execution client is different. In Part 3, we'll compare the three clients head-to-head — their Dockerfiles, build pipelines, runtime configurations, and a critical asymmetry where base-consensus only ships with the Reth image.