
Block Execution: Blockchain, StateDB, and the EVM

Advanced

Prerequisites

  • Articles 1-2: Architecture and Boot Process
  • Ethereum state model (accounts, storage, state trie)
  • Understanding of Merkle Patricia Tries


This is where Ethereum actually happens. Everything we've covered so far — the CLI framework, the Node container, the service wiring — exists to support one operation: executing blocks and advancing the world state. In this article we trace the complete path a block takes from arrival to state commitment, covering the three major subsystems: the BlockChain manager that tracks the canonical chain, the StateDB that holds world state in memory, and the EVM that executes contract bytecode.

BlockChain: The Canonical Chain Manager

The BlockChain struct is the central coordinator for chain state. It manages the canonical chain, handles block insertion and reorganizations, and maintains LRU caches for recently accessed data:

type BlockChain struct {
    chainConfig *params.ChainConfig
    cfg         *BlockChainConfig
    db          ethdb.Database
    // ... caches, trie database, snapshot tree, etc.
}

The BlockChainConfig struct (not to be confused with params.ChainConfig) controls runtime behavior — trie cache limits, snapshot settings, archive mode, state pruning policy, and VM configuration. This is the configuration that eth.New() assembles from CLI flags and protocol defaults.

The cache constants reveal performance priorities:

const (
    bodyCacheLimit     = 256
    blockCacheLimit    = 256
    receiptsCacheLimit = 32
    txLookupCacheLimit = 1024
)

Transaction lookups get the largest cache (1024 entries), which suggests that lookups by hash, such as eth_getTransactionByHash, are expected to be among the most frequent RPC requests.
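
To make the role of these limits concrete, here is a minimal sketch of a fixed-capacity LRU cache. It is not the cache implementation Geth uses; the type lruCache and its methods are hypothetical names chosen for illustration, and keys and values are plain strings to keep the example short.

```go
package main

import (
	"container/list"
	"fmt"
)

// entry is one key/value pair stored in the recency list.
type entry struct {
	key   string
	value string
}

// lruCache is a fixed-capacity cache that evicts the least recently used
// entry. It is an illustrative stand-in, not go-ethereum's cache type.
type lruCache struct {
	limit int
	order *list.List               // front = most recently used
	items map[string]*list.Element // key -> node in order
}

func newLRU(limit int) *lruCache {
	return &lruCache{limit: limit, order: list.New(), items: make(map[string]*list.Element)}
}

// Add inserts or updates a key and marks it as most recently used.
func (c *lruCache) Add(key, value string) {
	if el, ok := c.items[key]; ok {
		el.Value.(*entry).value = value
		c.order.MoveToFront(el)
		return
	}
	c.items[key] = c.order.PushFront(&entry{key, value})
	if c.order.Len() > c.limit {
		oldest := c.order.Back()
		c.order.Remove(oldest)
		delete(c.items, oldest.Value.(*entry).key)
	}
}

// Get returns the cached value and refreshes the key's recency.
func (c *lruCache) Get(key string) (string, bool) {
	el, ok := c.items[key]
	if !ok {
		return "", false
	}
	c.order.MoveToFront(el)
	return el.Value.(*entry).value, true
}

func main() {
	cache := newLRU(2) // think txLookupCacheLimit, but tiny
	cache.Add("tx1", "block 10")
	cache.Add("tx2", "block 11")
	cache.Get("tx1")             // tx1 becomes the most recently used
	cache.Add("tx3", "block 12") // evicts tx2, the least recently used
	_, ok := cache.Get("tx2")
	fmt.Println("tx2 still cached:", ok)
}
```

A larger limit simply means more entries survive before eviction starts, trading memory for fewer database reads.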

classDiagram
    class BlockChain {
        -chainConfig *ChainConfig
        -cfg *BlockChainConfig
        -db ethdb.Database
        -triedb *triedb.Database
        -snaps *snapshot.Tree
        -bodyCache LRU
        -blockCache LRU
        -txLookupCache LRU
        +InsertChain(blocks) error
        +CurrentBlock() *Header
        +StateAt(root) (*StateDB, error)
        +GetBlock(hash, number) *Block
    }
    class BlockChainConfig {
        +TrieCleanLimit int
        +TrieDirtyLimit int
        +StateScheme string
        +ArchiveMode bool
        +SnapshotLimit int
        +VmConfig vm.Config
        +triedbConfig() *triedb.Config
    }
    BlockChain --> BlockChainConfig

The triedbConfig() method is a key bridge between the blockchain and storage layers — it translates BlockChainConfig parameters into the appropriate triedb.Config, selecting between hash-based and path-based storage schemes. We'll explore this in depth in Part 4.
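
The translation step can be pictured roughly as follows. This is a simplified sketch, not Geth's code: the types hashSchemeConfig, pathSchemeConfig, and trieDBConfig are hypothetical stand-ins, the megabyte unit for the limits is an assumption, and only the three BlockChainConfig fields shown in the diagram are modeled.

```go
package main

import "fmt"

// Simplified stand-ins for per-scheme settings; all names are hypothetical.
type hashSchemeConfig struct {
	CleanCacheBytes int
}

type pathSchemeConfig struct {
	CleanCacheBytes int
	DirtyCacheBytes int
}

// trieDBConfig carries exactly one non-nil scheme config.
type trieDBConfig struct {
	Hash *hashSchemeConfig
	Path *pathSchemeConfig
}

type blockChainConfig struct {
	TrieCleanLimit int    // assumed to be in megabytes
	TrieDirtyLimit int    // assumed to be in megabytes
	StateScheme    string // "hash" or "path"
}

// triedbConfig maps chain-level settings onto the storage-level config,
// picking the sub-config that matches the selected state scheme.
func (cfg *blockChainConfig) triedbConfig() (*trieDBConfig, error) {
	const mb = 1024 * 1024
	switch cfg.StateScheme {
	case "hash":
		return &trieDBConfig{Hash: &hashSchemeConfig{
			CleanCacheBytes: cfg.TrieCleanLimit * mb,
		}}, nil
	case "path":
		return &trieDBConfig{Path: &pathSchemeConfig{
			CleanCacheBytes: cfg.TrieCleanLimit * mb,
			DirtyCacheBytes: cfg.TrieDirtyLimit * mb,
		}}, nil
	default:
		return nil, fmt.Errorf("unknown state scheme %q", cfg.StateScheme)
	}
}

func main() {
	cfg := &blockChainConfig{TrieCleanLimit: 154, TrieDirtyLimit: 256, StateScheme: "path"}
	tc, err := cfg.triedbConfig()
	if err != nil {
		panic(err)
	}
	fmt.Println("path scheme selected:", tc.Path != nil)
}
```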

Consensus Engine: Validation vs. Execution

The consensus.Engine interface cleanly separates block validation from block execution. This separation is fundamental — validation checks that a block could be valid (correct difficulty, valid seal, proper uncle references), while execution applies it to produce the next state.

classDiagram
    class Engine {
        <<interface>>
        +Author(header) address, error
        +VerifyHeader(chain, header) error
        +VerifyHeaders(chain, headers) chan, chan
        +VerifyUncles(chain, block) error
        +Prepare(chain, header) error
        +Finalize(chain, header, state, body)
        +FinalizeAndAssemble(chain, header, state, body, receipts) block, error
        +Seal(chain, block, results, stop) error
    }
    class Beacon {
        -ethone Engine
    }
    class Clique {
        -config *CliqueConfig
    }
    class EthashFaker {
    }
    Beacon ..|> Engine
    Clique ..|> Engine
    EthashFaker ..|> Engine
    Beacon --> Clique : wraps
    Beacon --> EthashFaker : wraps

Post-Merge, every consensus engine is wrapped by beacon.New(). The Beacon engine handles proof-of-stake specifics while delegating pre-Merge logic to the inner engine. Since Geth now only supports post-Merge networks (as enforced in CreateConsensusEngine), TerminalTotalDifficulty must always be set.
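
The wrapping relationship is easy to see in miniature. The sketch below is an assumption-laden toy: the engine interface has a single method, header carries only three fields, and "zero difficulty means proof-of-stake" stands in for the real transition logic. None of these types are Geth's.

```go
package main

import (
	"errors"
	"fmt"
)

// header is a toy block header; zero Difficulty marks a proof-of-stake
// block in this sketch.
type header struct {
	Number     uint64
	Difficulty uint64
	Nonce      uint64
}

// engine is a one-method stand-in for consensus.Engine.
type engine interface {
	VerifyHeader(h *header) error
}

// legacyEngine plays the role of the wrapped pre-Merge engine.
type legacyEngine struct{}

func (legacyEngine) VerifyHeader(h *header) error {
	if h.Difficulty == 0 {
		return errors.New("legacy: zero difficulty")
	}
	return nil
}

// beacon wraps an inner engine and only handles proof-of-stake headers itself.
type beacon struct {
	inner engine
}

func newBeacon(inner engine) *beacon { return &beacon{inner: inner} }

func (b *beacon) VerifyHeader(h *header) error {
	if h.Difficulty != 0 {
		// Pre-Merge header: delegate to the wrapped engine.
		return b.inner.VerifyHeader(h)
	}
	// Post-Merge header: apply proof-of-stake rules (only one shown here).
	if h.Nonce != 0 {
		return errors.New("beacon: non-zero nonce")
	}
	return nil
}

func main() {
	var e engine = newBeacon(legacyEngine{})
	fmt.Println(e.VerifyHeader(&header{Number: 1, Difficulty: 131072}))
	fmt.Println(e.VerifyHeader(&header{Number: 2, Difficulty: 0, Nonce: 7}))
}
```

Callers hold only the outer engine value, so they never need to know which era a given header belongs to.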

The State Execution Pipeline

The execution pipeline is where blocks become state transitions. It starts with StateProcessor.Process(), which iterates through every transaction in a block:

flowchart TD
    A["StateProcessor.Process(block, statedb)"] --> B["Apply pre-execution system calls<br/>(beacon root, parent hash)"]
    B --> C["For each transaction:"]
    C --> D["TransactionToMessage()"]
    D --> E["ApplyTransactionWithEVM()"]
    E --> F["State Transition:<br/>intrinsic gas → EVM execution → refunds"]
    F --> G["Generate receipt"]
    G --> C
    C --> H["postExecution()<br/>(withdrawals, requests)"]
    H --> I["Finalize via consensus engine"]
    I --> J["Return receipts, logs, gas used"]

The function first applies system calls — processing the beacon block root (EIP-4788) and the parent block hash (EIP-2935, for Prague and Verkle). Then it iterates through transactions, applying each one via ApplyTransactionWithEVM. A single EVM instance is created for the entire block and reused across transactions, with only the TxContext being swapped between calls.
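
The loop structure can be sketched in a few lines. This is a toy model, not StateProcessor.Process(): the types are hypothetical, state is a map of balances, every transaction is a plain transfer costing a flat 21000 gas, and system calls and fee handling are omitted. What it does show is one EVM value created per block with only the transaction context replaced on each iteration.

```go
package main

import (
	"errors"
	"fmt"
)

type blockContext struct {
	Number   uint64
	Coinbase string
}

type txContext struct {
	Origin   string
	GasPrice uint64
}

// evm holds block-level context for its lifetime and a swappable tx context.
type evm struct {
	block blockContext
	tx    txContext
}

func newEVM(b blockContext) *evm         { return &evm{block: b} }
func (e *evm) SetTxContext(t txContext) { e.tx = t }

type transaction struct {
	From, To string
	Value    uint64
	GasPrice uint64
}

type receipt struct {
	CumulativeGasUsed uint64
}

const transferGas = 21000 // flat cost of a plain transfer in this toy

// process applies every transaction in order and returns receipts plus
// total gas used. An invalid transaction invalidates the whole block.
func process(bc blockContext, txs []transaction, balances map[string]uint64) ([]receipt, uint64, error) {
	vm := newEVM(bc) // created once for the whole block
	var used uint64
	receipts := make([]receipt, 0, len(txs))
	for i, t := range txs {
		vm.SetTxContext(txContext{Origin: t.From, GasPrice: t.GasPrice})
		if balances[vm.tx.Origin] < t.Value {
			return nil, 0, fmt.Errorf("tx %d: %w", i, errors.New("insufficient funds"))
		}
		balances[vm.tx.Origin] -= t.Value
		balances[t.To] += t.Value
		used += transferGas
		receipts = append(receipts, receipt{CumulativeGasUsed: used})
	}
	return receipts, used, nil
}

func main() {
	balances := map[string]uint64{"alice": 100}
	txs := []transaction{
		{From: "alice", To: "bob", Value: 40},
		{From: "bob", To: "carol", Value: 15},
	}
	receipts, used, err := process(blockContext{Number: 1}, txs, balances)
	fmt.Println(len(receipts), used, err, balances)
}
```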

The ExecutionResult captures the outcome of each transaction:

type ExecutionResult struct {
    UsedGas    uint64
    MaxUsedGas uint64
    Err        error
    ReturnData []byte
}

Tip: When debugging EVM execution issues, ExecutionResult.Err is not a Go error that means "something went wrong" — it's a protocol-level result. An ErrExecutionReverted error with non-nil ReturnData means the contract explicitly reverted, and the return data contains the revert reason.

The IntrinsicGas function computes the baseline gas cost before EVM execution even begins, factoring in creation vs. call, calldata zero/non-zero byte costs, access list entries, and authorization list entries. This is where EIP-2028's reduced calldata costs and EIP-3860's per-word initcode charge are applied; the EIP-3860 initcode size limit itself is checked separately during the state transition.
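
The arithmetic is simple enough to sketch. The function below is a simplified approximation rather than Geth's IntrinsicGas: its signature and the accessTuple type are hypothetical, it assumes EIP-2028 and EIP-3860 are both active, and it skips the overflow checks a real implementation needs. The constants are the protocol's published gas values.

```go
package main

import "fmt"

const (
	txGas                   = 21000 // base cost of a call transaction
	txGasContractCreation   = 53000 // base cost of a creation transaction
	txDataZeroGas           = 4     // per zero calldata byte
	txDataNonZeroGas        = 16    // per non-zero calldata byte (EIP-2028)
	initCodeWordGas         = 2     // per 32-byte initcode word (EIP-3860)
	accessListAddressGas    = 2400  // per access list address
	accessListStorageKeyGas = 1900  // per access list storage key
	authorizationGas        = 25000 // per authorization list entry
)

// accessTuple is a simplified access list entry.
type accessTuple struct {
	Address     string
	StorageKeys []string
}

// intrinsicGas sums the charges owed before any bytecode runs.
// Overflow checks are omitted for brevity.
func intrinsicGas(data []byte, accessList []accessTuple, authCount int, isCreation bool) uint64 {
	gas := uint64(txGas)
	if isCreation {
		gas = txGasContractCreation
	}
	var nonZero uint64
	for _, b := range data {
		if b != 0 {
			nonZero++
		}
	}
	zero := uint64(len(data)) - nonZero
	gas += nonZero*txDataNonZeroGas + zero*txDataZeroGas
	if isCreation {
		words := (uint64(len(data)) + 31) / 32
		gas += words * initCodeWordGas
	}
	for _, t := range accessList {
		gas += accessListAddressGas + uint64(len(t.StorageKeys))*accessListStorageKeyGas
	}
	gas += uint64(authCount) * authorizationGas
	return gas
}

func main() {
	fmt.Println(intrinsicGas(nil, nil, 0, false))
	fmt.Println(intrinsicGas([]byte{0, 1, 0, 2}, nil, 0, false))
}
```

A transaction whose gas limit is below this figure can be rejected without running a single opcode.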

StateDB: In-Memory World State

The StateDB is the in-memory cache for Ethereum's world state. It sits between the EVM and the persistent trie, providing fast reads and writes while tracking all modifications for potential rollback:

type StateDB struct {
    db         Database
    prefetcher *triePrefetcher
    reader     Reader
    trie       Trie

    originalRoot         common.Hash
    stateObjects         map[common.Address]*stateObject
    stateObjectsDestruct map[common.Address]*stateObject
    mutations            map[common.Address]*mutation

    refund uint64
    // journal for revert snapshots, logs, etc.
}

Key design decisions:

  1. Lazy trie resolution: The trie field is "resolved on first access," meaning the actual Merkle trie root isn't loaded until state is read.

  2. Object tracking: stateObjects holds live account objects being modified. stateObjectsDestruct tracks destroyed accounts. mutations records account-level changes at transaction boundaries.

  3. Journal-based snapshots: StateDB uses a journal pattern for creating revert points. When a call frame fails (a REVERT opcode, running out of gas, or any other execution error), the journal is replayed backward to undo all state changes made within that call frame.

  4. Prefetching: The triePrefetcher loads trie nodes in background goroutines before they're needed, reducing the number of blocking disk reads during execution.
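
The journal pattern from the list above can be sketched compactly. This is a minimal illustration, not StateDB's journal: state is a single string-keyed map, each journal entry is an undo closure, and the snapshot id is simply the journal length at the time of the call (an assumption made to keep the example short).

```go
package main

import "fmt"

// undo reverses exactly one state change.
type undo func()

// stateDB is a toy state store with a journal of undo entries.
type stateDB struct {
	storage map[string]uint64
	journal []undo
}

func newStateDB() *stateDB {
	return &stateDB{storage: make(map[string]uint64)}
}

// SetState writes a value and records how to restore the previous one.
func (s *stateDB) SetState(key string, value uint64) {
	prev, existed := s.storage[key]
	s.journal = append(s.journal, func() {
		if existed {
			s.storage[key] = prev
		} else {
			delete(s.storage, key)
		}
	})
	s.storage[key] = value
}

// Snapshot returns a revert point: the current journal length.
func (s *stateDB) Snapshot() int { return len(s.journal) }

// RevertToSnapshot replays the journal backward down to the revert point.
func (s *stateDB) RevertToSnapshot(id int) {
	for i := len(s.journal) - 1; i >= id; i-- {
		s.journal[i]()
	}
	s.journal = s.journal[:id]
}

func main() {
	s := newStateDB()
	s.SetState("balance", 100)

	id := s.Snapshot() // entering a nested call frame
	s.SetState("balance", 40)
	s.SetState("allowance", 60)
	s.RevertToSnapshot(id) // the call frame failed

	fmt.Println(s.storage)
}
```

Because entries are undone newest first, overlapping writes to the same key unwind to the value that was present when the snapshot was taken.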

classDiagram
    class StateDB {
        -db Database
        -trie Trie
        -stateObjects map~Address → stateObject~
        -mutations map~Address → mutation~
        -refund uint64
        +GetBalance(addr) uint256
        +SetState(addr, key, value)
        +CreateAccount(addr)
        +Snapshot() int
        +RevertToSnapshot(id)
        +Commit(block, collectLeaf) (Hash, error)
    }
    class stateObject {
        -address Address
        -data types.StateAccount
        -code []byte
        -dirtyStorage Storage
        -originStorage Storage
    }
    StateDB o-- stateObject

EVM Architecture: The Table-Driven Interpreter

The EVM is the heart of Ethereum's computation model. Geth implements it as a table-driven interpreter — a tight loop that fetches an opcode, looks up its handler in a 256-entry table, and executes it.

The EVM struct carries all execution context:

type EVM struct {
    Context BlockContext  // Block-level: coinbase, gas limit, block number, time
    TxContext             // Tx-level: origin, gas price, blob hashes
    StateDB StateDB       // World state access
    table   *JumpTable    // Fork-specific opcode handlers
    depth   int           // Call stack depth
    chainConfig *params.ChainConfig
    chainRules  params.Rules
    Config  Config        // VM config: tracer, extra EIPs
}

The BlockContext / TxContext split is deliberate. Block-level context (coinbase, timestamp, block number) stays constant for all transactions in a block. Transaction-level context (sender address, gas price) changes with each transaction. This lets NewEVM be called once per block, with SetTxContext installing each transaction's context in turn.

The Run() method is the interpreter loop:

flowchart TD
    START["Run(contract, input, readOnly)"] --> CHECK["Code empty?"]
    CHECK -->|yes| RET_NIL["Return nil, nil"]
    CHECK -->|no| INIT["Create Memory, Stack, ScopeContext"]
    INIT --> LOOP["Main Loop"]
    LOOP --> FETCH["op = contract.GetOp(pc)"]
    FETCH --> LOOKUP["operation = jumpTable[op]"]
    LOOKUP --> VALIDATE["Check stack bounds"]
    VALIDATE --> GAS["Deduct constantGas"]
    GAS --> DYNGAS["Calculate & deduct dynamicGas"]
    DYNGAS --> EXEC["Execute operation.execute()"]
    EXEC --> NEXT["Advance pc"]
    NEXT --> LOOP
    EXEC -->|STOP/RETURN/REVERT| EXIT["Return result"]

Each entry in the jump table is an operation struct:

type operation struct {
    execute     executionFunc
    constantGas uint64
    dynamicGas  gasFunc
    minStack    int
    maxStack    int
    memorySize  memorySizeFunc
    undefined   bool
}

This design means every opcode behavior — execution function, gas cost, stack requirements, memory sizing — is captured in a single struct. Adding a new opcode is just adding an entry to the table.
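
A toy interpreter shows the fetch, lookup, validate, charge, execute cycle end to end. It is a sketch under heavy simplification: stack items are uint64 instead of 256-bit words, only four opcodes exist, there is no dynamic gas or memory, and the vm type is hypothetical. The opcode bytes and constant gas costs match the real ones for STOP, ADD, MUL, and PUSH1.

```go
package main

import (
	"errors"
	"fmt"
)

const stackLimit = 1024

const (
	opSTOP  = 0x00
	opADD   = 0x01
	opMUL   = 0x02
	opPUSH1 = 0x60
)

// vm is the mutable execution state of one call frame.
type vm struct {
	code    []byte
	pc      int
	stack   []uint64
	gas     uint64
	stopped bool
}

// operation bundles everything the loop needs to know about one opcode.
type operation struct {
	execute     func(m *vm)
	constantGas uint64
	minStack    int // items required on the stack
	maxStack    int // largest stack size that still leaves room for pushes
}

// binaryOp builds a handler that pops two items and pushes f's result.
func binaryOp(f func(a, b uint64) uint64) func(m *vm) {
	return func(m *vm) {
		n := len(m.stack)
		a, b := m.stack[n-1], m.stack[n-2]
		m.stack = append(m.stack[:n-2], f(a, b))
	}
}

func newJumpTable() [256]*operation {
	var t [256]*operation
	t[opSTOP] = &operation{execute: func(m *vm) { m.stopped = true }, maxStack: stackLimit}
	t[opADD] = &operation{execute: binaryOp(func(a, b uint64) uint64 { return a + b }),
		constantGas: 3, minStack: 2, maxStack: stackLimit}
	t[opMUL] = &operation{execute: binaryOp(func(a, b uint64) uint64 { return a * b }),
		constantGas: 5, minStack: 2, maxStack: stackLimit}
	t[opPUSH1] = &operation{execute: func(m *vm) {
		m.pc++ // the immediate byte follows the opcode
		var v uint64
		if m.pc < len(m.code) {
			v = uint64(m.code[m.pc])
		}
		m.stack = append(m.stack, v)
	}, constantGas: 3, maxStack: stackLimit - 1}
	return t
}

// run executes code and returns the final stack and the remaining gas.
func run(code []byte, gas uint64) ([]uint64, uint64, error) {
	table := newJumpTable()
	m := &vm{code: code, gas: gas}
	for !m.stopped && m.pc < len(m.code) {
		op := table[m.code[m.pc]] // fetch and look up
		switch {
		case op == nil:
			return nil, 0, errors.New("invalid opcode")
		case len(m.stack) < op.minStack:
			return nil, 0, errors.New("stack underflow")
		case len(m.stack) > op.maxStack:
			return nil, 0, errors.New("stack overflow")
		case m.gas < op.constantGas:
			return nil, 0, errors.New("out of gas")
		}
		m.gas -= op.constantGas
		op.execute(m)
		m.pc++
	}
	return m.stack, m.gas, nil
}

func main() {
	// PUSH1 2, PUSH1 3, ADD, PUSH1 4, MUL, STOP computes (2+3)*4.
	code := []byte{opPUSH1, 2, opPUSH1, 3, opADD, opPUSH1, 4, opMUL, opSTOP}
	fmt.Println(run(code, 100))
}
```

Note that the loop itself contains no opcode-specific logic; supporting another instruction only requires one more table entry.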

Fork Layering: How New EIPs Add Opcodes

The fork-specific instruction sets are built by layering. Each new fork copies the previous fork's table and overrides specific entries. The chain in jump_table.go starts at Frontier and extends through Amsterdam:

var (
    frontierInstructionSet         = newFrontierInstructionSet()
    homesteadInstructionSet        = newHomesteadInstructionSet()
    // ... each one builds on the previous
    pragueInstructionSet           = newPragueInstructionSet()
    osakaInstructionSet            = newOsakaInstructionSet()
    amsterdamInstructionSet        = newAmsterdamInstructionSet()
)

For example, newAmsterdamInstructionSet() copies Osaka and enables EIP-7843 (SLOTNUM opcode) and EIP-8024:

func newAmsterdamInstructionSet() JumpTable {
    instructionSet := newOsakaInstructionSet()
    enable7843(&instructionSet)
    enable8024(&instructionSet)
    return validate(instructionSet)
}

When NewEVM() is called, it selects the correct instruction set based on the chainRules — a set of boolean flags derived from ChainConfig at the current block number and timestamp. The selection uses a reverse-chronological switch statement: the most recent fork is checked first.
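
Layering and selection can be modeled together in a small sketch. Here the jump table is reduced to an array of opcode names, only three forks are represented, and the rules type and selectTable function are hypothetical. The opcode assignments used (PUSH0 in Shanghai; TLOAD, TSTORE, and MCOPY in Cancun) are the real ones.

```go
package main

import "fmt"

// jumpTable maps an opcode byte to a handler name; "" means undefined.
// Go arrays are values, so assigning or returning one makes a copy.
type jumpTable [256]string

func newFrontierTable() jumpTable {
	var t jumpTable
	t[0x01] = "ADD"
	t[0x60] = "PUSH1"
	return t
}

// Each later fork starts from a copy of its predecessor and adds entries.
// Intermediate forks are skipped in this sketch.
func newShanghaiTable() jumpTable {
	t := newFrontierTable()
	t[0x5f] = "PUSH0" // EIP-3855
	return t
}

func newCancunTable() jumpTable {
	t := newShanghaiTable()
	t[0x5c] = "TLOAD"  // EIP-1153
	t[0x5d] = "TSTORE" // EIP-1153
	t[0x5e] = "MCOPY"  // EIP-5656
	return t
}

// rules holds fork flags; a later fork being active implies earlier ones are.
type rules struct {
	IsShanghai bool
	IsCancun   bool
}

// selectTable checks the newest fork first, so the first match wins.
func selectTable(r rules) jumpTable {
	switch {
	case r.IsCancun:
		return newCancunTable()
	case r.IsShanghai:
		return newShanghaiTable()
	default:
		return newFrontierTable()
	}
}

func main() {
	shanghai := selectTable(rules{IsShanghai: true})
	cancun := selectTable(rules{IsShanghai: true, IsCancun: true})
	fmt.Printf("shanghai 0x5f=%q 0x5c=%q\n", shanghai[0x5f], shanghai[0x5c])
	fmt.Printf("cancun   0x5f=%q 0x5c=%q\n", cancun[0x5f], cancun[0x5c])
}
```

Checking the newest fork first matters because the flags are cumulative: a Cancun block also has IsShanghai set, so an oldest-first switch would pick the wrong table.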

Tip: To trace how a specific EIP was implemented, search for its enable function (e.g., enable4844). These functions modify the jump table in-place, setting the execute, constantGas, dynamicGas, and stack parameters for each affected opcode.

This execution pipeline — BlockChain → StateProcessor → EVM → StateDB — is the core of what Geth does. But where does all this state actually go when it's committed? That's the topic of Part 4, where we follow data from the in-memory StateDB through the Merkle Patricia Trie and down to the on-disk key-value stores.