
State Storage: From StateDB Down to Disk

Advanced

Prerequisites

  • Articles 1-3
  • Understanding of Merkle Patricia Tries
  • Basic key-value database concepts

In Part 3 we followed blocks through the execution pipeline and saw how the StateDB holds world state in memory during processing. But memory is volatile — Ethereum's state needs to survive node restarts, and it needs to be verifiable via Merkle proofs. This article traces the full path from the in-memory StateDB down to bytes on disk, revealing a carefully layered storage architecture that balances performance, verifiability, and disk space.

The Four-Layer Storage Stack

Geth's state storage is a four-layer stack. Each layer adds a specific concern:

flowchart TD
    subgraph "Layer 1 — Execution Cache"
        SDB["StateDB<br/>In-memory account objects<br/>Journal-based snapshots"]
    end
    subgraph "Layer 2 — Cryptographic Structure"
        MPT["Merkle Patricia Trie<br/>Authenticated data structure<br/>Account trie + storage tries"]
    end
    subgraph "Layer 3 — Trie Node Management"
        TDB["TrieDB<br/>Hash-based scheme (hashdb)<br/>OR Path-based scheme (pathdb)"]
    end
    subgraph "Layer 4 — On-Disk Storage"
        KV["Key-Value Store<br/>(LevelDB or Pebble)"]
        FREEZER["Ancient Freezer<br/>Append-only flat files"]
    end
    SDB --> MPT
    MPT --> TDB
    TDB --> KV
    KV --- FREEZER

Layer 1 (StateDB) is the write-through cache we covered in Part 3 — it holds modified account objects and provides snapshot/revert for transaction execution.

Layer 2 (Trie) is the Merkle Patricia Trie — the cryptographic data structure that Ethereum uses to produce verifiable state roots. Every block header contains a state root hash that commits to the entire world state.

Layer 3 (TrieDB) manages how trie nodes are stored and retrieved. This is where the critical choice between hash-based and path-based storage schemes happens.

Layer 4 (ethdb) is the actual on-disk database — LevelDB or Pebble for active data, plus the "ancient" freezer for historical blocks and receipts.

ethdb: The Composed Database Interface

As we saw in Part 1, the ethdb package defines the storage interfaces through composition. The pattern starts with minimal single-concern interfaces and builds up:

classDiagram
    class KeyValueReader {
        <<interface>>
        +Has(key) bool, error
        +Get(key) bytes, error
    }
    class KeyValueWriter {
        <<interface>>
        +Put(key, value) error
        +Delete(key) error
    }
    class KeyValueStore {
        <<interface>>
        KeyValueReader
        KeyValueWriter
        KeyValueStater
        KeyValueSyncer
        KeyValueRangeDeleter
        Batcher
        Iteratee
        Compacter
        io.Closer
    }
    class AncientStore {
        <<interface>>
        AncientReader
        AncientWriter
        AncientStater
        io.Closer
    }
    class Database {
        <<interface>>
        KeyValueStore
        AncientStore
    }
    KeyValueReader <|-- KeyValueStore
    KeyValueWriter <|-- KeyValueStore
    KeyValueStore <|-- Database
    AncientStore <|-- Database

The Database interface is simply the composition of KeyValueStore and AncientStore. This design means you can pass a KeyValueReader when you only need read access, or a full Database when you need everything. LevelDB and Pebble both implement KeyValueStore, and the ancient freezer implements AncientStore.

Tip: The memorydb package provides a full in-memory implementation of KeyValueStore — invaluable for testing. When Node runs with no data directory, it automatically uses memorydb, enabling fully ephemeral nodes for development.

The Merkle Patricia Trie Implementation

The trie.Trie struct implements Ethereum's Modified Merkle Patricia Trie — the authenticated data structure that makes state roots possible:

type Trie struct {
    root        node
    owner       common.Hash
    committed   bool
    unhashed    int
    uncommitted int
    reader      *Reader
}

The trie uses four node types internally: fullNode (branch nodes with 16 children + value), shortNode (extension/leaf nodes with a key fragment), hashNode (a reference to a node stored in the database), and valueNode (raw leaf data). Operations on the trie — Get, Update, Delete — work by traversing this node tree, with hashNode references resolved lazily from the database.

Ethereum maintains two kinds of tries: the account trie (mapping addresses to account data) and storage tries (one per contract, mapping storage slots to values). The StateDB coordinates both: each stateObject holds a reference to its contract's storage trie.

flowchart TD
    STATE_ROOT["State Root Hash"] --> ACCOUNT_TRIE["Account Trie"]
    ACCOUNT_TRIE --> ACCT_A["Account A<br/>nonce, balance,<br/>storageRoot, codeHash"]
    ACCOUNT_TRIE --> ACCT_B["Account B<br/>nonce, balance,<br/>storageRoot, codeHash"]
    ACCT_A --> STORAGE_A["Storage Trie A<br/>slot → value"]
    ACCT_B --> STORAGE_B["Storage Trie B<br/>slot → value"]

When StateDB.Commit() is called at the end of block processing, all modified state objects flush their changes to their respective tries, the tries compute new root hashes, and the resulting trie nodes are collected into a node set for the TrieDB to persist.

Hash-Based vs. Path-Based Trie Storage

The triedb.Database wraps a backend interface that abstracts over two fundamentally different storage strategies:

type Database struct {
    disk      ethdb.Database
    config    *Config
    preimages *preimageStore
    backend   backend
}

The backend interface requires NodeReader, StateReader, Size, Commit, and Close. The two implementations are:

Hash-based scheme (hashdb): Trie nodes are keyed by their hash. Simple and battle-tested, but pruning is expensive — you need reference counting to know when a node is no longer reachable, and garbage collection requires traversing the entire state.

Path-based scheme (pathdb): Trie nodes are keyed by their path in the trie. This enables efficient state history — you can keep diffs between versions and prune old states by simply deleting layers. This scheme also makes state healing during snap sync more efficient.

flowchart LR
    subgraph "Hash-Based (hashdb)"
        direction TB
        H1["Node hash → node data"]
        H2["Reference counting for GC"]
        H3["Full state required for pruning"]
    end
    subgraph "Path-Based (pathdb)"
        direction TB
        P1["Trie path → node data"]
        P2["Layered diffs (disk + memory)"]
        P3["Efficient pruning by layer removal"]
    end

The selection happens in NewDatabase():

if config.PathDB != nil {
    db.backend = pathdb.New(diskdb, config.PathDB, config.IsVerkle)
} else {
    db.backend = hashdb.New(diskdb, config.HashDB)
}

The path-based scheme is the newer design, introduced to solve the growing cost of state pruning as Ethereum's state size increases. It stores state history as a stack of diff layers on top of a flattened disk layer, enabling both efficient lookups and bounded historical state access.

The Config struct makes the choice explicit:

type Config struct {
    Preimages bool
    IsVerkle  bool
    HashDB    *hashdb.Config
    PathDB    *pathdb.Config
}

Setting HashDB activates hash-based storage; setting PathDB activates path-based. Setting both is a fatal error.

State Snapshots and the Ancient Freezer

Beyond the trie itself, Geth maintains two additional storage mechanisms that dramatically improve performance.

State snapshots are a flat key-value layer that maps hashed account addresses and hashed storage slot keys directly to their values, bypassing the trie entirely for read operations. When you call eth_getBalance, the snapshot layer can answer in O(1) instead of traversing the trie. Snapshots are built incrementally as blocks are processed and are stored alongside the trie data in the key-value database.

The ancient freezer handles historical data differently. As blocks age beyond a configurable threshold, their headers, bodies, receipts, and hashes are moved from the key-value store into append-only flat files. This is critical because:

  1. Historical data is append-only — it never changes once finalized
  2. Flat files are more space-efficient than key-value stores for sequential data
  3. The key-value store stays smaller, improving performance for active state

flowchart TD
    subgraph "Active Data (LevelDB/Pebble)"
        RECENT["Recent blocks<br/>State trie nodes<br/>Transaction indices<br/>Snapshots"]
    end
    subgraph "Ancient Data (Freezer)"
        OLD_H["headers.0000"]
        OLD_B["bodies.0000"]
        OLD_R["receipts.0000"]
        OLD_HA["hashes.0000"]
    end
    subgraph "rawdb — Key Encoding Bridge"
        RAW["Prefixed key encoding<br/>Type-safe read/write<br/>Schema versioning"]
    end
    RECENT <--> RAW
    OLD_H <--> RAW
    OLD_B <--> RAW
    OLD_R <--> RAW

The core/rawdb/ package serves as the key-encoding bridge between higher-level code and the raw database. It defines key prefixes, provides type-safe accessor functions (e.g., ReadBody, WriteReceipts), and manages schema versioning. This layer ensures that the rest of the codebase never constructs database keys manually — all key layout knowledge is centralized in rawdb.

Tip: When diagnosing storage issues, the geth db inspect command (implemented in cmd/geth/dbcmd.go) walks through the entire database and reports space usage by category — trie nodes, headers, bodies, receipts, snapshots, and more. It's the quickest way to understand what's consuming disk space.

With the storage architecture mapped, we've covered everything from block arrival through state commitment to disk. The next article follows the other direction — how transactions enter the system from the P2P network, pass through the transaction pool, and eventually reach the block builder for inclusion.