Boot Sequence Deep Dive: From docker compose up to a Syncing Node
Prerequisites
- Article 1: Architecture and Codebase Navigation
- What the Engine API is (consensus ↔ execution communication protocol)
- Familiarity with JWT authentication basics
In Part 1 we mapped the codebase and understood that base/node orchestrates two services connected via the Engine API. But the architecture diagram is static — what actually happens when you type docker compose up? The answer involves a carefully choreographed dance across multiple shell scripts, where ordering constraints are enforced not by a framework but by curl loops and process lifecycle management.
This article traces the entire boot sequence. We'll follow the execution path from Docker Compose's service startup, through the consensus entrypoint dispatcher, into each client's initialization logic — culminating in Reth's remarkable multi-stage startup that can pause for up to six hours waiting for the static file manager to initialize.
Docker Compose Startup Order
When you run docker compose up, Docker reads docker-compose.yml and determines the startup order from the depends_on graph. The node service declares a dependency on execution:
node:
depends_on:
- execution
Docker starts the execution container first, then the node container. But depends_on without a health check condition only waits for the container to be created — not for the application inside to be ready. The execution client might take seconds or minutes to bind its Engine API port. The actual readiness synchronization is handled entirely by the consensus entrypoint scripts.
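For contrast, Compose can express application-level readiness with a healthcheck plus `condition: service_healthy` — this repo deliberately doesn't use that mechanism, but a hypothetical sketch (service names from above; the healthcheck command and intervals are assumptions) would look like:

```yaml
services:
  execution:
    healthcheck:
      # Treat the Engine API's 401 response as "healthy" — same trick
      # the consensus entrypoint uses with its curl loop
      test: ["CMD-SHELL", "curl -s -o /dev/null -w '%{http_code}' http://localhost:8551 | grep -q 401"]
      interval: 5s
      retries: 60
  node:
    depends_on:
      execution:
        condition: service_healthy
```

The curl-loop approach used here achieves the same effect without coupling readiness logic to the compose file, and works even when the image lacks a healthcheck.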
sequenceDiagram
participant User
participant DC as Docker Compose
participant EX as Execution Container
participant ND as Node Container
User->>DC: docker compose up
DC->>EX: Start container
DC->>EX: Run: bash ./execution-entrypoint
Note over EX: Execution client initializing...
DC->>ND: Start container (depends_on satisfied)
DC->>ND: Run: bash ./consensus-entrypoint
Note over ND: Polls Engine API with curl
ND-->>EX: curl http://execution:8551 → connection refused
Note over ND: sleep 5, retry...
Note over EX: Engine API ready on :8551
ND-->>EX: curl http://execution:8551 → HTTP 401
Note over ND: 401 = authenticated endpoint exists!
ND->>ND: Write JWT secret, exec consensus binary
ND-->>EX: Engine API with JWT auth
Note over EX,ND: Both services syncing
The command overrides in docker-compose.yml are what distinguish the two containers, which otherwise share the same image. The execution service runs bash ./execution-entrypoint (line 13), and the node service runs bash ./consensus-entrypoint (line 33). Both files exist in the Docker image because every Dockerfile COPYs all entrypoint scripts into /app.
The Consensus Entrypoint Dispatcher
The consensus-entrypoint is a simple but important dispatcher. It routes to one of two consensus clients based on a single environment variable:
if [ "${USE_BASE_CONSENSUS:-false}" = "true" ]; then
if [ -f ./base-consensus-entrypoint ]; then
echo "Using Base Client"
exec ./base-consensus-entrypoint
else
echo "Base client is not supported for this node type"
exit 1
fi
else
echo "Using OP Node"
exec ./op-node-entrypoint
fi
flowchart TD
CE[consensus-entrypoint] -->|"USE_BASE_CONSENSUS=true"| CHECK{"base-consensus-entrypoint<br/>exists?"}
CE -->|"USE_BASE_CONSENSUS=false"| OP[op-node-entrypoint]
CHECK -->|Yes| BC[base-consensus-entrypoint]
CHECK -->|No| FAIL["Exit 1:<br/>not supported for this node type"]
The file-existence check on line 5 is not just defensive coding — it's architecturally significant. As we'll explore in Part 3, only the Reth Dockerfile bundles the base-consensus binary and its entrypoint. If you set CLIENT=geth and USE_BASE_CONSENSUS=true, the Geth image won't have base-consensus-entrypoint, and the dispatcher will fail gracefully with an error message rather than a cryptic "file not found."
Note the use of exec — the shell does not fork a child; it replaces itself with the target script, which therefore keeps the entrypoint's PID (PID 1 inside the container). This matters for signal handling: docker stop delivers SIGTERM to PID 1, so the consensus client receives it directly rather than an intermediate shell that might swallow it.
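The PID-preserving behavior of exec is easy to verify with a self-contained snippet (illustrative only, not from the repo):

```shell
#!/usr/bin/env bash
# Print the shell's PID, then exec a new bash that prints its own PID.
# Because exec replaces the process image without forking, both match.
out=$(bash -c 'echo $$; exec bash -c "echo \$\$"')
before=$(echo "$out" | head -n1)
after=$(echo "$out" | tail -n1)
echo "before=$before after=$after"
if [ "$before" = "$after" ]; then
  echo "exec preserved the PID"
fi
```

Without exec, the shell would fork a child with a new PID and linger as PID 1, intercepting Docker's signals.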
Consensus Client Startup: Waiting for the Engine API
Both consensus entrypoints follow the same three-phase pattern: validate environment, wait for Engine API, then exec the binary. Let's compare them side by side.
Phase 1: Environment Validation
The base-consensus-entrypoint checks for BASE_NODE_NETWORK:
if [[ -z "${BASE_NODE_NETWORK:-}" ]]; then
echo "expected BASE_NODE_NETWORK to be set" 1>&2
exit 1
fi
The op-node-entrypoint checks for OP_NODE_NETWORK or OP_NODE_ROLLUP_CONFIG — allowing either a named network or a custom rollup config file. This reflects op-node's more flexible configuration model.
Phase 2: Engine API Polling
Both scripts use the same clever technique to detect when the execution client is ready. Here's the polling loop from base-consensus-entrypoint lines 31-34:
until [ "$(curl -s --max-time 10 --connect-timeout 5 \
-w '%{http_code}' -o /dev/null \
"${BASE_NODE_L2_ENGINE_RPC/ws/http}")" -eq 401 ]; do
echo "waiting for execution client to be ready"
sleep 5
done
This is elegant: the Engine API is an authenticated endpoint. Without a valid JWT, any request returns HTTP 401 Unauthorized. The script doesn't care about the response body — it only checks the status code. A 401 proves the Engine API is listening and the authentication layer is active. Connection refused or timeout means the execution client isn't ready yet.
Notice the ${BASE_NODE_L2_ENGINE_RPC/ws/http} substitution — the Engine API URL is stored as a WebSocket URL (ws://execution:8551), but curl needs HTTP. This bash parameter expansion swaps ws for http inline.
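The expansion is easy to try in isolation — `${var/pattern/replacement}` substitutes the first match only, which is exactly what a scheme swap needs:

```shell
#!/usr/bin/env bash
url="ws://execution:8551"
echo "${url/ws/http}"        # → http://execution:8551
# The first match is at the front even for secure URLs,
# so "wss" conveniently becomes "https" as well:
secure="wss://execution:8551"
echo "${secure/ws/http}"     # → https://execution:8551
```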
Phase 3: IP Discovery, JWT, and Exec
After the Engine API is confirmed ready, both scripts discover the node's public IP for P2P advertisement using a shared get_public_ip() function that cycles through four providers (ifconfig.me, api.ipify.org, ipecho.net, v4.ident.me). Then they write the JWT secret and exec the binary.
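A sketch of what a fallback chain like get_public_ip() might look like — provider hostnames are the ones listed above; the actual function's curl flags and error handling may differ:

```shell
#!/usr/bin/env bash
get_public_ip() {
  # Try each provider in turn; return the first non-empty answer.
  local provider ip
  for provider in ifconfig.me api.ipify.org ipecho.net v4.ident.me; do
    ip=$(curl -s --max-time 5 "$provider") || continue
    if [[ -n "$ip" ]]; then
      echo "$ip"
      return 0
    fi
  done
  echo "failed to determine public IP" >&2
  return 1
}
```

The consensus binary then advertises this address to P2P peers, which is why a single flaky provider must not abort the boot.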
The base-consensus-entrypoint has one unique feature — follow mode:
if [[ -n "${BASE_NODE_SOURCE_L2_RPC:-}" ]]; then
echo "Running base-consensus in follow mode because BASE_NODE_SOURCE_L2_RPC is set"
exec ./base-consensus follow
else
exec ./base-consensus node
fi
Follow mode lets the consensus client sync from another L2 node's RPC endpoint rather than deriving blocks from L1 itself. This is useful for quickly bootstrapping a new node.
sequenceDiagram
participant EP as base-consensus-entrypoint
participant IP as IP Providers
participant EX as Execution Client
participant FS as Filesystem
participant BC as base-consensus
EP->>EP: Validate BASE_NODE_NETWORK
loop Until HTTP 401
EP->>EX: curl http://execution:8551
EX-->>EP: Connection refused / 401
end
EP->>IP: curl ifconfig.me (+ fallbacks)
IP-->>EP: Public IP
EP->>FS: Write JWT to /tmp/engine-auth-jwt
alt SOURCE_L2_RPC set
EP->>BC: exec ./base-consensus follow
else
EP->>BC: exec ./base-consensus node
end
Execution Client Startup Patterns
The execution client entrypoints run before the consensus client connects. They're responsible for writing the JWT secret, setting up data directories, and launching the binary with the correct flags. Each of the three clients handles this differently.
Geth: Parametric Precision
The geth/geth-entrypoint is moderately complex, with five dedicated cache tuning variables (lines 18-22):
GETH_CACHE="${GETH_CACHE:-20480}"
GETH_CACHE_DATABASE="${GETH_CACHE_DATABASE:-20}"
GETH_CACHE_GC="${GETH_CACHE_GC:-12}"
GETH_CACHE_SNAPSHOT="${GETH_CACHE_SNAPSHOT:-24}"
GETH_CACHE_TRIE="${GETH_CACHE_TRIE:-44}"
These percentages control how Geth distributes its 20GB cache pool. The defaults allocate 44% to trie caching, 24% to snapshots, 20% to the database, and 12% to garbage collection — a profile optimized for L2 workloads where trie operations dominate.
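In absolute terms, the default split works out as follows — a back-of-envelope computation from the defaults above (Geth performs this allocation internally):

```shell
#!/usr/bin/env bash
GETH_CACHE=20480   # total cache pool in MB
for alloc in TRIE=44 SNAPSHOT=24 DATABASE=20 GC=12; do
  name=${alloc%%=*}
  pct=${alloc##*=}
  echo "$name: $(( GETH_CACHE * pct / 100 )) MB"
done
# → TRIE: 9011 MB, SNAPSHOT: 4915 MB, DATABASE: 4096 MB, GC: 2457 MB
```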
Geth also conditionally adds ethstats, unprotected transactions, state scheme, and bootnode flags using a pattern of checking whether the variable is set (not just non-empty) with ${VAR+x} syntax on lines 33-51.
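The ${VAR+x} form is worth demonstrating, because it distinguishes "unset" from "set but empty" — something a plain -n/-z test cannot do:

```shell
#!/usr/bin/env bash
check() {
  if [ -n "${FOO+x}" ]; then echo "FOO is set"; else echo "FOO is unset"; fi
}
unset FOO
check        # → FOO is unset
FOO=""
check        # → FOO is set   (empty still counts as set)
FOO="bar"
check        # → FOO is set
```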
Nethermind: Elegant Simplicity
The nethermind/nethermind-entrypoint is the most straightforward of the three. Its key differentiator is the --config flag:
exec ./nethermind \
--config="$OP_NODE_NETWORK" \
Nethermind ships with built-in network configurations. Passing --config=base-mainnet activates all the chain-specific parameters — genesis, fork heights, gas limits — without any additional flags. This is why Nethermind doesn't need a RETH_CHAIN or custom initialization step.
Reth: The Complex Path
Reth's entrypoint is by far the most sophisticated, and it deserves its own section.
Reth's Historical Proofs Initialization
The reth/reth-entrypoint contains the most complex shell logic in the entire repository. Beyond the standard startup, it handles three unique features: log level translation, Flashblocks support, and historical proofs initialization.
Log Level Translation
Reth uses a verbosity-flag convention (-v, -vv, -vvv, etc.) rather than named log levels. The entrypoint translates between the two on lines 31-51:
case "$LOG_LEVEL" in
"error") LOG_LEVEL="v" ;;
"warn") LOG_LEVEL="vv" ;;
"info") LOG_LEVEL="vvv" ;;
"debug") LOG_LEVEL="vvvv" ;;
"trace") LOG_LEVEL="vvvvv" ;;
esac
This allows operators to set LOG_LEVEL=debug uniformly across all clients.
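After translation, the value can be spliced into reth's command line as a single verbosity flag. How the entrypoint assembles the final command is an assumption here; the sketch below only shows the mechanism:

```shell
#!/usr/bin/env bash
LOG_LEVEL="${LOG_LEVEL:-info}"
case "$LOG_LEVEL" in
  "error") LOG_LEVEL="v" ;;
  "warn")  LOG_LEVEL="vv" ;;
  "info")  LOG_LEVEL="vvv" ;;
  "debug") LOG_LEVEL="vvvv" ;;
  "trace") LOG_LEVEL="vvvvv" ;;
esac
echo "reth node -$LOG_LEVEL"   # with LOG_LEVEL=info → reth node -vvv
```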
The Historical Proofs Multi-Stage Startup
The most remarkable code starts at line 75. When RETH_HISTORICAL_PROOFS=true, Reth cannot simply start and serve — it needs to initialize a historical proofs database, but that requires the node to have synced past the genesis block first. The problem: Reth doesn't support starting an old database in read-only mode.
The solution is a three-phase startup:
sequenceDiagram
participant EP as reth-entrypoint
participant R1 as Reth (phase 1)
participant RPC as JSON-RPC localhost
participant R2 as Reth proofs init
participant R3 as Reth (final)
EP->>R1: Start reth node in background
Note over R1: Syncing from genesis...
loop Up to 6 hours
EP->>RPC: eth_getBlockByNumber("latest")
RPC-->>EP: block number = 0x0
Note over EP: Still at genesis, wait...
end
EP->>RPC: eth_getBlockByNumber("latest")
RPC-->>EP: block number > 0x0
Note over EP: Synced past genesis!
EP->>R1: kill (SIGTERM)
EP->>EP: wait_for_pid (poll /proc/PID)
Note over R1: Graceful shutdown
EP->>R2: reth proofs init
Note over R2: Initialize historical proofs DB
EP->>R3: exec reth node (with --proofs-history)
Note over R3: Normal operation
Phase 1: Sync past genesis. Reth starts as a background process with &, binding only to localhost's HTTP port. The script then enters a polling loop, making JSON-RPC calls to eth_getBlockByNumber and checking if the result is beyond block 0x0. The timeout is set to 6 hours (60 * 60 * 6 seconds on line 89) because Reth's static file manager initialization can be extremely slow on large databases.
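The past-genesis check might be sketched like this — the endpoint, port, JSON parsing, and poll interval are all assumptions; the real script's details may differ:

```shell
#!/usr/bin/env bash
# Succeeds once the latest block number is beyond 0x0.
past_genesis() {
  local resp num
  resp=$(curl -s -X POST -H 'Content-Type: application/json' \
    --data '{"jsonrpc":"2.0","method":"eth_getBlockByNumber","params":["latest",false],"id":1}' \
    http://localhost:8545) || return 1
  num=${resp#*\"number\":\"}    # strip everything before the block number
  num=${num%%\"*}               # strip everything after it
  [[ "$num" != "$resp" && -n "$num" && "$num" != "0x0" ]]
}

deadline=$(( $(date +%s) + 60 * 60 * 6 ))   # the 6-hour budget
until past_genesis; do
  if (( $(date +%s) >= deadline )); then
    echo "timed out waiting to sync past genesis" >&2
    exit 1
  fi
  sleep 10
done
```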
Phase 2: Graceful shutdown and proofs init. Once synced past genesis, the script sends SIGTERM to the background Reth process and waits for it to exit using the wait_for_pid helper (lines 53-67):
wait_for_pid() {
local pid="$1"
if [[ ! -e "/proc/$pid" ]]; then
echo "Process $pid does not exist." >&2
return 1
fi
while [[ -e "/proc/$pid" ]]; do
sleep 1
done
}
This polls /proc/$pid rather than using wait because wait only works for child processes in the same shell session, and the process was started in the background. After Reth exits, the script runs reth proofs init to build the historical proofs database.
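Here's a self-contained usage sketch of the helper, reproducing it verbatim — it assumes a Linux /proc filesystem and relies on bash reaping the background child once it dies, so /proc/$pid disappears:

```shell
#!/usr/bin/env bash
wait_for_pid() {
  local pid="$1"
  if [[ ! -e "/proc/$pid" ]]; then
    echo "Process $pid does not exist." >&2
    return 1
  fi
  while [[ -e "/proc/$pid" ]]; do
    sleep 1
  done
}

sleep 2 &          # stand-in for the background reth process
bg=$!
wait_for_pid "$bg" && echo "process $bg has exited"
```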
Phase 3: Normal startup. Finally, Reth restarts with --proofs-history and --proofs-history.storage-path flags via exec, replacing the script process.
Tip: The 6-hour timeout might seem extreme, but it's calibrated for real-world conditions. If you're running on spinning disks or with limited memory, the static file manager's initial scan of a ~2TB Base mainnet database genuinely takes hours. If you're on NVMe with 64GB+ RAM, expect it to complete in minutes.
The Complete Boot Timeline
Putting it all together, here's the full sequence from docker compose up to a syncing node:
flowchart TD
A["docker compose up"] --> B["Start execution container"]
B --> C["Run execution-entrypoint<br/>(reth/geth/nethermind)"]
C --> D["Write JWT secret"]
D --> E["Start execution client binary"]
A --> F["Start node container<br/>(after depends_on)"]
F --> G["Run consensus-entrypoint"]
G --> H{"USE_BASE_CONSENSUS?"}
H -->|true| I["base-consensus-entrypoint"]
H -->|false| J["op-node-entrypoint"]
I --> K["Poll Engine API for 401"]
J --> K
K --> L["Detect public IP"]
L --> M["Write JWT secret"]
M --> N["exec consensus binary"]
N --> O["Connect to Engine API"]
O --> P["Node syncing ✓"]
The execution client starts first and begins syncing independently. The consensus client waits until the Engine API is proven ready (HTTP 401), discovers its public IP, writes the shared JWT secret, and launches. Once connected, the consensus client drives block derivation from L1 while the execution client handles state execution.
What's Next
We've seen how the startup scripts share patterns while differing in complexity. But we've only scratched the surface of why each execution client is different. In Part 3, we'll compare the three clients head-to-head — their Dockerfiles, build pipelines, runtime configurations, and a critical asymmetry where base-consensus only ships with the Reth image.