Block Execution: Blockchain, StateDB, and the EVM
Prerequisites
- Articles 1-2: Architecture and Boot Process
- Ethereum state model (accounts, storage, state trie)
- Understanding of Merkle Patricia Tries
This is where Ethereum actually happens. Everything we've covered so far — the CLI framework, the Node container, the service wiring — exists to support one operation: executing blocks and advancing the world state. In this article we trace the complete path a block takes from arrival to state commitment, covering the three major subsystems: the BlockChain manager that tracks the canonical chain, the StateDB that holds world state in memory, and the EVM that executes contract bytecode.
BlockChain: The Canonical Chain Manager
The BlockChain struct is the central coordinator for chain state. It manages the canonical chain, handles block insertion and reorganizations, and maintains LRU caches for recently accessed data:
type BlockChain struct {
    chainConfig *params.ChainConfig
    cfg         *BlockChainConfig
    db          ethdb.Database
    // ... caches, trie database, snapshot tree, etc.
}
The BlockChainConfig struct (not to be confused with params.ChainConfig) controls runtime behavior — trie cache limits, snapshot settings, archive mode, state pruning policy, and VM configuration. This is the configuration that eth.New() assembles from CLI flags and protocol defaults.
The cache constants reveal performance priorities:
const (
    bodyCacheLimit     = 256
    blockCacheLimit    = 256
    receiptsCacheLimit = 32
    txLookupCacheLimit = 1024
)
Transaction lookups get the largest cache (1024 entries) because eth_getTransactionByHash is one of the most frequently called RPC methods.
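To make the eviction behavior behind these limits concrete, here is a minimal sketch of a fixed-capacity LRU cache. It is not Geth's cache implementation: the lruCache type, its string keys, and its int values are all hypothetical stand-ins chosen to keep the example short.

```go
package main

import (
	"container/list"
	"fmt"
)

// entry is one key/value pair stored in the recency list.
type entry struct {
	key   string
	value int
}

// lruCache is a toy fixed-capacity LRU cache (hypothetical, not Geth's type).
type lruCache struct {
	limit int
	order *list.List               // front = most recently used
	items map[string]*list.Element // key -> node in order
}

func newLRU(limit int) *lruCache {
	return &lruCache{limit: limit, order: list.New(), items: make(map[string]*list.Element)}
}

// Add inserts or updates a key, evicting the least recently used entry when full.
func (c *lruCache) Add(key string, value int) {
	if el, ok := c.items[key]; ok {
		el.Value.(*entry).value = value
		c.order.MoveToFront(el)
		return
	}
	c.items[key] = c.order.PushFront(&entry{key: key, value: value})
	if c.order.Len() > c.limit {
		oldest := c.order.Back()
		c.order.Remove(oldest)
		delete(c.items, oldest.Value.(*entry).key)
	}
}

// Get returns the cached value and marks the key as recently used.
func (c *lruCache) Get(key string) (int, bool) {
	el, ok := c.items[key]
	if !ok {
		return 0, false
	}
	c.order.MoveToFront(el)
	return el.Value.(*entry).value, true
}

func main() {
	cache := newLRU(2)
	cache.Add("tx-a", 1)
	cache.Add("tx-b", 2)
	cache.Get("tx-a")    // touch tx-a so tx-b becomes the eviction candidate
	cache.Add("tx-c", 3) // cache is full, so tx-b is evicted
	_, ok := cache.Get("tx-b")
	fmt.Println("tx-b still cached:", ok)
}
```

A larger limit simply postpones eviction, which is why the lookup cache with 1024 entries keeps more recent results around than the receipts cache with 32.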
classDiagram
class BlockChain {
-chainConfig *ChainConfig
-cfg *BlockChainConfig
-db ethdb.Database
-triedb *triedb.Database
-snaps *snapshot.Tree
-bodyCache LRU
-blockCache LRU
-txLookupCache LRU
+InsertChain(blocks) error
+CurrentBlock() *Header
+StateAt(root) (*StateDB, error)
+GetBlock(hash, number) *Block
}
class BlockChainConfig {
+TrieCleanLimit int
+TrieDirtyLimit int
+StateScheme string
+ArchiveMode bool
+SnapshotLimit int
+VmConfig vm.Config
+triedbConfig() *triedb.Config
}
BlockChain --> BlockChainConfig
The triedbConfig() method is a key bridge between the blockchain and storage layers — it translates BlockChainConfig parameters into the appropriate triedb.Config, selecting between hash-based and path-based storage schemes. We'll explore this in depth in Part 4.
Consensus Engine: Validation vs. Execution
The consensus.Engine interface cleanly separates block validation from block execution. This separation is fundamental — validation checks that a block could be valid (correct difficulty, valid seal, proper uncle references), while execution applies it to produce the next state.
classDiagram
class Engine {
<<interface>>
+Author(header) address, error
+VerifyHeader(chain, header) error
+VerifyHeaders(chain, headers) chan, chan
+VerifyUncles(chain, block) error
+Prepare(chain, header) error
+Finalize(chain, header, state, body)
+FinalizeAndAssemble(chain, header, state, body, receipts) block, error
+Seal(chain, block, results, stop) error
}
class Beacon {
-ethone Engine
}
class Clique {
-config *CliqueConfig
}
class EthashFaker {
}
Beacon ..|> Engine
Clique ..|> Engine
EthashFaker ..|> Engine
Beacon --> Clique : wraps
Beacon --> EthashFaker : wraps
Post-Merge, every consensus engine is wrapped by beacon.New(). The Beacon engine handles proof-of-stake specifics while delegating pre-Merge logic to the inner engine. Since Geth now only supports post-Merge networks (as enforced in CreateConsensusEngine), TerminalTotalDifficulty must always be set.
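The wrapping relationship can be sketched as a plain delegation pattern. The types below are simplified stand-ins, not the real consensus.Engine interface: the one-method engine interface, the header struct, and the "zero difficulty means proof-of-stake" test are assumptions made for illustration.

```go
package main

import (
	"errors"
	"fmt"
)

// header is a stripped-down stand-in for a block header.
type header struct {
	Number     uint64
	Difficulty uint64 // assumed to be zero for proof-of-stake blocks
}

// engine is a one-method slice of a consensus engine interface.
type engine interface {
	VerifyHeader(h header) error
}

// legacyEngine stands in for the wrapped pre-Merge engine.
type legacyEngine struct {
	calls int // counts how often the wrapper delegated to us
}

func (l *legacyEngine) VerifyHeader(h header) error {
	l.calls++
	if h.Difficulty == 0 {
		return errors.New("legacy engine: zero difficulty")
	}
	return nil
}

// beaconEngine handles proof-of-stake headers itself and forwards
// everything else to the inner engine.
type beaconEngine struct {
	inner engine
}

func (b beaconEngine) VerifyHeader(h header) error {
	if h.Difficulty != 0 {
		return b.inner.VerifyHeader(h) // pre-Merge header: delegate
	}
	// Proof-of-stake specific checks would run here (omitted in this sketch).
	return nil
}

func main() {
	inner := &legacyEngine{}
	e := beaconEngine{inner: inner}
	fmt.Println(e.VerifyHeader(header{Number: 1, Difficulty: 2}), inner.calls)
	fmt.Println(e.VerifyHeader(header{Number: 100, Difficulty: 0}), inner.calls)
}
```

Because the wrapper satisfies the same interface as the engine it wraps, callers never need to know which era a header belongs to.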
The State Execution Pipeline
The execution pipeline is where blocks become state transitions. It starts with StateProcessor.Process(), which iterates through every transaction in a block:
flowchart TD
A["StateProcessor.Process(block, statedb)"] --> B["Apply pre-execution system calls<br/>(beacon root, parent hash)"]
B --> C["For each transaction:"]
C --> D["TransactionToMessage()"]
D --> E["ApplyTransactionWithEVM()"]
E --> F["State Transition:<br/>intrinsic gas → EVM execution → refunds"]
F --> G["Generate receipt"]
G --> C
C --> H["postExecution()<br/>(withdrawals, requests)"]
H --> I["Finalize via consensus engine"]
I --> J["Return receipts, logs, gas used"]
The function first applies system calls — processing the beacon block root (EIP-4788) and the parent block hash (EIP-2935, for Prague and Verkle). Then it iterates through transactions, applying each one via ApplyTransactionWithEVM. A single EVM instance is created for the entire block and reused across transactions, with only the TxContext being swapped between calls.
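The shape of that loop, with one EVM per block and a per-transaction context swap, might look like the following sketch. Everything here is a toy model: the tx, evm, and receipt types are hypothetical, and "execution" is reduced to adding a fixed gas amount.

```go
package main

import "fmt"

// tx is a toy transaction that consumes a fixed amount of gas.
type tx struct {
	From string
	Gas  uint64
}

// txContext holds the per-transaction fields that change on every iteration.
type txContext struct {
	Origin string
}

// evm keeps block-level context fixed and lets the tx context be swapped.
type evm struct {
	BlockNumber uint64
	tx          txContext
}

func (e *evm) SetTxContext(c txContext) { e.tx = c }

type receipt struct {
	Origin            string
	GasUsed           uint64
	CumulativeGasUsed uint64
}

// process mirrors the pipeline order: system calls, per-tx loop, post-execution.
func process(blockNumber uint64, txs []tx) ([]receipt, uint64) {
	vm := &evm{BlockNumber: blockNumber} // created once for the whole block
	// Pre-execution system calls would run here (omitted).
	var (
		receipts []receipt
		usedGas  uint64
	)
	for _, t := range txs {
		vm.SetTxContext(txContext{Origin: t.From}) // only the tx context changes
		usedGas += t.Gas                           // stand-in for real EVM execution
		receipts = append(receipts, receipt{
			Origin:            vm.tx.Origin,
			GasUsed:           t.Gas,
			CumulativeGasUsed: usedGas,
		})
	}
	// Post-execution work (withdrawals, requests) and finalization would follow.
	return receipts, usedGas
}

func main() {
	receipts, used := process(1, []tx{{From: "alice", Gas: 21000}, {From: "bob", Gas: 50000}})
	fmt.Println(len(receipts), used)
}
```

Note that each receipt records cumulative gas for the block so far, which is why the loop carries usedGas across iterations rather than resetting it.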
The ExecutionResult captures the outcome of each transaction:
type ExecutionResult struct {
    UsedGas    uint64
    MaxUsedGas uint64
    Err        error
    ReturnData []byte
}
Tip: When debugging EVM execution issues, remember that ExecutionResult.Err is not a Go error that means "something went wrong"; it's a protocol-level result. An ErrExecutionReverted error with non-nil ReturnData means the contract explicitly reverted, and the return data contains the revert reason.
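As a rough illustration of how a caller might tell these outcomes apart, the sketch below uses a simplified local copy of the result type. The lowercase executionResult, the two sentinel errors, and the describe helper are assumptions for this example, not Geth's actual definitions.

```go
package main

import (
	"errors"
	"fmt"
)

// Sentinel errors standing in for the VM's protocol-level errors.
var (
	errExecutionReverted = errors.New("execution reverted")
	errOutOfGas          = errors.New("out of gas")
)

// executionResult is a simplified stand-in for the struct shown above.
type executionResult struct {
	UsedGas    uint64
	Err        error
	ReturnData []byte
}

// Failed reports whether execution ended with a protocol-level error.
func (r executionResult) Failed() bool { return r.Err != nil }

// Revert returns the revert payload, but only for explicit reverts.
func (r executionResult) Revert() []byte {
	if !errors.Is(r.Err, errExecutionReverted) {
		return nil
	}
	return r.ReturnData
}

// describe classifies a result the way a debugging tool might.
func describe(r executionResult) string {
	switch {
	case !r.Failed():
		return "success"
	case r.Revert() != nil:
		return "reverted with data"
	default:
		return "failed: " + r.Err.Error()
	}
}

func main() {
	fmt.Println(describe(executionResult{UsedGas: 21000}))
	fmt.Println(describe(executionResult{Err: errExecutionReverted, ReturnData: []byte{0x08, 0xc3}}))
	fmt.Println(describe(executionResult{Err: errOutOfGas}))
}
```

In all three cases the transaction is still included in the block and still consumes gas; only the state changes differ.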
The IntrinsicGas function computes the baseline gas cost before EVM execution even begins, factoring in creation vs. call, calldata zero/non-zero byte costs, access list entries, and authorization list entries. This is where EIP-2028's reduced calldata costs and EIP-3860's per-word initcode charge are applied; the EIP-3860 initcode size limit itself is checked separately during the state transition.
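A worked version of that calculation helps show how the pieces add up. This is a simplified sketch, not Geth's function: the signature is hypothetical, overflow checks are omitted, and the constants are the commonly cited protocol values (21000 base, 53000 for creation, 4 and 16 gas per zero and non-zero calldata byte, 2400 and 1900 per access list address and storage key, 2 per initcode word, 25000 per authorization).

```go
package main

import "fmt"

// Protocol gas constants as assumed for this sketch.
const (
	txGas                   = 21000
	txGasContractCreation   = 53000
	txDataZeroGas           = 4
	txDataNonZeroGas        = 16 // EIP-2028
	accessListAddressGas    = 2400
	accessListStorageKeyGas = 1900
	initCodeWordGas         = 2 // EIP-3860
	authorizationGas        = 25000
)

// intrinsicGas computes the up-front cost of a transaction.
// The parameter list is a simplification: counts replace the real lists.
func intrinsicGas(data []byte, accessAddrs, accessKeys, auths int, isCreate bool) uint64 {
	gas := uint64(txGas)
	if isCreate {
		gas = txGasContractCreation
	}
	// Calldata is priced per byte, with zero bytes cheaper than non-zero ones.
	for _, b := range data {
		if b == 0 {
			gas += txDataZeroGas
		} else {
			gas += txDataNonZeroGas
		}
	}
	// Creation transactions also pay per 32-byte word of initcode.
	if isCreate {
		words := (uint64(len(data)) + 31) / 32
		gas += words * initCodeWordGas
	}
	gas += uint64(accessAddrs) * accessListAddressGas
	gas += uint64(accessKeys) * accessListStorageKeyGas
	gas += uint64(auths) * authorizationGas
	return gas
}

func main() {
	// Two zero bytes and two non-zero bytes: 21000 + 2*4 + 2*16 = 21040.
	fmt.Println(intrinsicGas([]byte{0x00, 0x01, 0x02, 0x00}, 0, 0, 0, false))
}
```

A transaction whose gas limit is below this value is rejected before any bytecode runs.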
StateDB: In-Memory World State
The StateDB is the in-memory cache for Ethereum's world state. It sits between the EVM and the persistent trie, providing fast reads and writes while tracking all modifications for potential rollback:
type StateDB struct {
    db                   Database
    prefetcher           *triePrefetcher
    reader               Reader
    trie                 Trie
    originalRoot         common.Hash
    stateObjects         map[common.Address]*stateObject
    stateObjectsDestruct map[common.Address]*stateObject
    mutations            map[common.Address]*mutation
    refund               uint64
    // journal for revert snapshots, logs, etc.
}
Key design decisions:
- Lazy trie resolution: The trie field is "resolved on first access," meaning the actual Merkle trie root isn't loaded until state is read.
- Object tracking: stateObjects holds live account objects being modified. stateObjectsDestruct tracks destroyed accounts. mutations records account-level changes at transaction boundaries.
- Journal-based snapshots: StateDB uses a journal pattern for creating revert points. When the EVM hits a REVERT opcode, the journal is replayed backward to undo all state changes within that call frame.
- Prefetching: The triePrefetcher loads trie nodes in background goroutines before they're needed, reducing the number of blocking disk reads during execution.
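The journal pattern is easiest to see in miniature. The sketch below is a toy model under stated assumptions: state is a map of balances keyed by string, each journal entry is a closure that undoes one write, and a snapshot id is simply the journal length at the time it was taken. Geth's real journal is more elaborate.

```go
package main

import "fmt"

// journalEntry undoes exactly one state change.
type journalEntry func()

// state is a toy world state with an undo journal.
type state struct {
	balances map[string]uint64
	journal  []journalEntry
}

func newState() *state {
	return &state{balances: make(map[string]uint64)}
}

// SetBalance records how to restore the previous value, then overwrites it.
func (s *state) SetBalance(addr string, v uint64) {
	prev, existed := s.balances[addr]
	s.journal = append(s.journal, func() {
		if existed {
			s.balances[addr] = prev
		} else {
			delete(s.balances, addr)
		}
	})
	s.balances[addr] = v
}

// Snapshot returns a revert point: the current journal length.
func (s *state) Snapshot() int { return len(s.journal) }

// RevertToSnapshot replays journal entries backward down to the revert point.
func (s *state) RevertToSnapshot(id int) {
	for i := len(s.journal) - 1; i >= id; i-- {
		s.journal[i]()
	}
	s.journal = s.journal[:id]
}

func main() {
	s := newState()
	s.SetBalance("alice", 10)
	id := s.Snapshot() // entering a nested call frame
	s.SetBalance("alice", 5)
	s.SetBalance("bob", 7)
	s.RevertToSnapshot(id) // the nested call reverted
	fmt.Println(s.balances)
}
```

Undoing in reverse order matters: if the same key is written twice inside a frame, the oldest entry must run last so the original value wins.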
classDiagram
class StateDB {
-db Database
-trie Trie
-stateObjects map~Address → stateObject~
-mutations map~Address → mutation~
-refund uint64
+GetBalance(addr) uint256
+SetState(addr, key, value)
+CreateAccount(addr)
+Snapshot() int
+RevertToSnapshot(id)
+Commit(block, collectLeaf) (Hash, error)
}
class stateObject {
-address Address
-data types.StateAccount
-code []byte
-dirtyStorage Storage
-originStorage Storage
}
StateDB o-- stateObject
EVM Architecture: The Table-Driven Interpreter
The EVM is the heart of Ethereum's computation model. Geth implements it as a table-driven interpreter — a tight loop that fetches an opcode, looks up its handler in a 256-entry table, and executes it.
The EVM struct carries all execution context:
type EVM struct {
    Context     BlockContext // Block-level: coinbase, gas limit, block number, time
    TxContext                // Tx-level: origin, gas price, blob hashes
    StateDB     StateDB      // World state access
    table       *JumpTable   // Fork-specific opcode handlers
    depth       int          // Call stack depth
    chainConfig *params.ChainConfig
    chainRules  params.Rules
    Config      Config       // VM config: tracer, extra EIPs
}
The BlockContext / TxContext split is deliberate. Block-level context (coinbase, timestamp, block number) stays constant for all transactions in a block. Transaction-level context (sender address, gas price) changes with each transaction. This lets NewEVM be called once per block, with SetTxContext swapping in the per-transaction context between transactions.
The Run() method is the interpreter loop:
flowchart TD
START["Run(contract, input, readOnly)"] --> CHECK["Code empty?"]
CHECK -->|yes| RET_NIL["Return nil, nil"]
CHECK -->|no| INIT["Create Memory, Stack, ScopeContext"]
INIT --> LOOP["Main Loop"]
LOOP --> FETCH["op = contract.GetOp(pc)"]
FETCH --> LOOKUP["operation = jumpTable[op]"]
LOOKUP --> VALIDATE["Check stack bounds"]
VALIDATE --> GAS["Deduct constantGas"]
GAS --> DYNGAS["Calculate & deduct dynamicGas"]
DYNGAS --> EXEC["Execute operation.execute()"]
EXEC --> NEXT["Advance pc"]
NEXT --> LOOP
EXEC -->|STOP/RETURN/REVERT| EXIT["Return result"]
Each entry in the jump table is an operation struct:
type operation struct {
    execute     executionFunc
    constantGas uint64
    dynamicGas  gasFunc
    minStack    int
    maxStack    int
    memorySize  memorySizeFunc
    undefined   bool
}
This design means every opcode behavior — execution function, gas cost, stack requirements, memory sizing — is captured in a single struct. Adding a new opcode is just adding an entry to the table.
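To see the fetch, lookup, validate, charge, execute cycle end to end, here is a three-opcode toy interpreter. It is a sketch under heavy simplification: the vmState type, the trimmed operation struct (no dynamic gas, no memory, no maxStack), and the uint64 stack are assumptions. The opcode bytes and the 3-gas cost for PUSH1 and ADD match the EVM.

```go
package main

import (
	"errors"
	"fmt"
)

const (
	opStop  byte = 0x00
	opAdd   byte = 0x01
	opPush1 byte = 0x60
)

// vmState is the mutable per-call state the handlers operate on.
type vmState struct {
	pc    uint64
	stack []uint64
	code  []byte
}

// operation is a trimmed version of a jump table entry.
type operation struct {
	execute     func(s *vmState) (halt bool)
	constantGas uint64
	minStack    int
	undefined   bool
}

type jumpTable [256]*operation

func newTable() jumpTable {
	var t jumpTable
	for i := range t {
		t[i] = &operation{undefined: true} // every slot is filled, valid or not
	}
	t[opStop] = &operation{execute: func(s *vmState) bool { return true }}
	t[opAdd] = &operation{constantGas: 3, minStack: 2, execute: func(s *vmState) bool {
		n := len(s.stack)
		a, b := s.stack[n-2], s.stack[n-1]
		s.stack = append(s.stack[:n-2], a+b)
		return false
	}}
	t[opPush1] = &operation{constantGas: 3, execute: func(s *vmState) bool {
		s.pc++ // step onto the immediate byte
		var v byte
		if s.pc < uint64(len(s.code)) {
			v = s.code[s.pc]
		}
		s.stack = append(s.stack, uint64(v))
		return false
	}}
	return t
}

// run executes code until STOP, an error, or gas exhaustion.
func run(code []byte, gas uint64) ([]uint64, uint64, error) {
	table := newTable()
	s := &vmState{code: code}
	for {
		op := opStop // running past the end of code behaves like STOP
		if s.pc < uint64(len(code)) {
			op = code[s.pc]
		}
		entry := table[op]
		if entry.undefined {
			return nil, gas, errors.New("invalid opcode")
		}
		if len(s.stack) < entry.minStack {
			return nil, gas, errors.New("stack underflow")
		}
		if gas < entry.constantGas {
			return nil, 0, errors.New("out of gas")
		}
		gas -= entry.constantGas
		if entry.execute(s) {
			return s.stack, gas, nil
		}
		s.pc++
	}
}

func main() {
	// PUSH1 2, PUSH1 3, ADD, STOP
	stack, gasLeft, err := run([]byte{opPush1, 2, opPush1, 3, opAdd, opStop}, 100)
	fmt.Println(stack, gasLeft, err) // [5] 91 <nil>
}
```

The loop itself never mentions a specific opcode. All opcode-specific behavior lives in the table, which is what makes swapping tables per fork possible.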
Fork Layering: How New EIPs Add Opcodes
The fork-specific instruction sets are built by layering. Each new fork copies the previous fork's table and overrides specific entries. The chain in jump_table.go starts at Frontier and extends through Amsterdam:
var (
    frontierInstructionSet  = newFrontierInstructionSet()
    homesteadInstructionSet = newHomesteadInstructionSet()
    // ... each one builds on the previous
    pragueInstructionSet    = newPragueInstructionSet()
    osakaInstructionSet     = newOsakaInstructionSet()
    amsterdamInstructionSet = newAmsterdamInstructionSet()
)
For example, newAmsterdamInstructionSet() copies Osaka and enables EIP-7843 (SLOTNUM opcode) and EIP-8024:
func newAmsterdamInstructionSet() JumpTable {
    instructionSet := newOsakaInstructionSet()
    enable7843(&instructionSet)
    enable8024(&instructionSet)
    return validate(instructionSet)
}
When NewEVM() is called, it selects the correct instruction set based on the chainRules — a set of boolean flags derived from ChainConfig at the current block number and timestamp. The selection uses a reverse-chronological switch statement: the most recent fork is checked first.
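Both ideas, layering by copy-and-override and newest-first selection, fit in a short sketch. The fork names (base, forkA, forkB), the rules struct, and the enable helpers below are hypothetical. The opcode bytes are real: 0x1b is SHL and 0x5f is PUSH0.

```go
package main

import "fmt"

// handler is a stand-in for an opcode implementation.
type handler func() string

// jumpTable is an array, so assigning or returning it copies all 256 slots.
type jumpTable [256]handler

func newBaseSet() jumpTable {
	var t jumpTable
	t[0x01] = func() string { return "ADD" }
	return t
}

// Hypothetical enable functions: each modifies the table in place.
func enableShl(t *jumpTable)   { t[0x1b] = func() string { return "SHL" } }
func enablePush0(t *jumpTable) { t[0x5f] = func() string { return "PUSH0" } }

// newForkASet copies the base table and layers one change on top.
func newForkASet() jumpTable {
	t := newBaseSet()
	enableShl(&t)
	return t
}

// newForkBSet copies fork A and layers another change on top.
func newForkBSet() jumpTable {
	t := newForkASet()
	enablePush0(&t)
	return t
}

// rules holds one boolean per fork. Later forks imply earlier ones.
type rules struct {
	IsForkA, IsForkB bool
}

// selectTable checks the most recent fork first.
func selectTable(r rules) jumpTable {
	switch {
	case r.IsForkB:
		return newForkBSet()
	case r.IsForkA:
		return newForkASet()
	default:
		return newBaseSet()
	}
}

func main() {
	for _, r := range []rules{{}, {IsForkA: true}, {IsForkA: true, IsForkB: true}} {
		t := selectTable(r)
		fmt.Println(t[0x1b] != nil, t[0x5f] != nil)
	}
}
```

Checking newest-first is what makes the switch correct: once fork B is active, the fork A flag is also true, so testing fork A first would select the older table.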
Tip: To trace how a specific EIP was implemented, search for its enable function (e.g., enable4844). These functions modify the jump table in-place, setting the execute, constantGas, dynamicGas, and stack parameters for each affected opcode.
This execution pipeline — BlockChain → StateProcessor → EVM → StateDB — is the core of what Geth does. But where does all this state actually go when it's committed? That's the topic of Part 4, where we follow data from the in-memory StateDB through the Merkle Patricia Trie and down to the on-disk key-value stores.