Read OSS

Getting Your Hands Dirty: Building, Testing, and Contributing to Geth

Intermediate

Prerequisites

  • All previous articles in the series
  • Go development environment setup
  • Git basics


Over the past six articles, we've traced every major subsystem of go-ethereum — from the CLI boot sequence through block execution, state storage, transaction pools, P2P networking, and the RPC layer. Now it's time for the practical chapter: how to build the project, run its multi-layered test suite, understand the code generation patterns, and — most usefully — how new hard forks get implemented. Whether you're fixing a bug, adding a feature, or just trying to understand a specific behavior, this article gives you the tools to work with the codebase effectively.

Build Pipeline: Makefile and build/ci.go

As we briefly covered in Part 1, Geth uses a two-tier build pipeline. The Makefile is the developer-facing interface:

make geth        # Build just the geth binary
make evm         # Build the standalone EVM tool
make all         # Build all packages and executables
make test        # Run tests (builds first)
make lint        # Run linters
make fmt         # Format all Go code
make devtools    # Install code generation tools

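Most make targets are thin wrappers around go run build/ci.go <command>; a few helpers, such as devtools, call go install directly. The build/ci.go file is a Go program carrying the //go:build none constraint, which keeps ordinary builds such as go build ./... from compiling it as part of the module. Because go run ignores build constraints on files named explicitly on the command line, it can still be executed like a script. It handles: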

  • install — Cross-platform compilation with architecture and compiler selection
  • test — Test execution with coverage support
  • lint — Pre-selected linter configuration
  • check_generate — Verifies generated code is up-to-date
  • check_baddeps — Ensures forbidden dependencies aren't introduced
  • archive / debsrc / nsis — Packaging for distribution
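To make the "executed as a script" idea concrete, here is a minimal sketch of the subcommand dispatch such a script performs. It is not the real build/ci.go: doInstall and doTest are hypothetical stand-ins that only return the command they would run, and the //go:build none line is left out so the sketch compiles as an ordinary program.

```go
package main

import (
	"errors"
	"fmt"
	"os"
	"strings"
)

// The real build/ci.go starts with a "//go:build none" constraint.
// It is omitted here so this sketch builds and runs as a normal program.

// doInstall is a hypothetical stand-in: it returns the build command it
// would run instead of running it. With no arguments it targets everything.
func doInstall(pkgs []string) string {
	if len(pkgs) == 0 {
		pkgs = []string{"./..."}
	}
	return "go build " + strings.Join(pkgs, " ")
}

// doTest is a hypothetical stand-in that forwards extra flags (e.g. -short)
// to go test over the whole module.
func doTest(flags []string) string {
	cmd := append([]string{"go", "test"}, flags...)
	cmd = append(cmd, "./...")
	return strings.Join(cmd, " ")
}

// dispatch picks a handler from the first argument, the way a
// ci.go-style script switches on os.Args[1].
func dispatch(args []string) (string, error) {
	if len(args) == 0 {
		return "", errors.New("need subcommand as first argument")
	}
	switch args[0] {
	case "install":
		return doInstall(args[1:]), nil
	case "test":
		return doTest(args[1:]), nil
	default:
		return "", fmt.Errorf("unknown command %q", args[0])
	}
}

func main() {
	if len(os.Args) < 2 {
		fmt.Println("usage: go run ci.go <install|test> [args...]")
		return
	}
	out, err := dispatch(os.Args[1:])
	if err != nil {
		fmt.Fprintln(os.Stderr, err)
		os.Exit(1)
	}
	fmt.Println(out)
}
```

Saved as ci.go, running go run ci.go test -short prints the line go test -short ./... without executing anything.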
flowchart TD
    DEV["Developer"] -->|make geth| MAKE["Makefile"]
    MAKE --> CI["go run build/ci.go install ./cmd/geth"]
    CI --> BUILD["go build -o build/bin/geth ./cmd/geth"]
    BUILD --> BIN["build/bin/geth"]

    DEV -->|make test| MAKE
    MAKE --> CI2["go run build/ci.go test"]
    CI2 --> TEST["go test ./..."]

    DEV -->|make devtools| MAKE
    MAKE --> TOOLS["Install stringer, gencodec,<br/>protoc-gen-go, abigen"]

Tip: For day-to-day development, make geth is all you need. The binary appears at ./build/bin/geth. For CI or release builds, the build/ci.go script handles cross-compilation, signing, and packaging.

Testing Strategy: Unit, Integration, and Reference Tests

Geth employs a multi-layered testing strategy. The AGENTS.md file codifies the expected workflow:

flowchart TD
    subgraph "During Development"
        SHORT["go run ./build/ci.go test -short<br/>Fast feedback, skips slow tests"]
    end
    subgraph "Before Commit"
        FULL["go run ./build/ci.go test<br/>Full suite including reference tests"]
        LINT["go run ./build/ci.go lint<br/>Style checks"]
        GEN["go run ./build/ci.go check_generate<br/>Generated code up-to-date"]
        DEPS["go run ./build/ci.go check_baddeps<br/>Dependency hygiene"]
    end
    SHORT -->|iterate| SHORT
    SHORT -->|ready to commit| FULL
    FULL --> LINT
    LINT --> GEN
    GEN --> DEPS

The testing layers are:

  1. Unit tests — Standard Go _test.go files alongside the code they test. Most packages have comprehensive unit tests. Use go test ./core/vm/... to test core/vm together with all of its subpackages, or go test ./core/vm for that package alone.

  2. Integration tests — Tests that exercise multiple packages together, often using the in-memory database backend. The eth/ package has tests that wire up a complete handler with simulated peers.

  3. Ethereum reference tests — The tests/ directory contains the harness for the official Ethereum test suites. The JSON fixtures themselves come from the ethereum/tests repository, checked out as a git submodule under tests/testdata, so run git submodule update --init before running them. These tests validate that Geth produces exactly the results the specification expects across every fork, covering state transitions, block processing, transaction validation, and RLP encoding.

  4. The cmd/evm tool — A standalone EVM that can execute state tests, trace transactions, and benchmark opcodes in isolation. It's invaluable for debugging EVM issues without running a full node.

The core/vm/runtime/ package provides a testing runtime for isolated EVM execution — you can create an EVM with a synthetic state, execute arbitrary bytecode, and inspect the results. Many internal tests use this pattern:

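// Pattern from core/vm tests (import "github.com/ethereum/go-ethereum/core/vm/runtime").
// With no State in the Config, Execute runs against a fresh in-memory state.
ret, _, err := runtime.Execute(code, input, &runtime.Config{
    GasLimit: 1_000_000, // exceeding this budget returns an out-of-gas error
    // ... configuration
})
if err != nil {
    t.Fatalf("execution failed: %v", err)
}
t.Logf("returned data: %x", ret)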

Code Generation Patterns

Geth uses go:generate directives for three main purposes:

  1. gencodec — Generates marshaling methods (MarshalJSON/UnmarshalJSON, or the TOML equivalents via -formats toml) into gen_*.go files. Many types in core/types/ use it so that fields are encoded in their wire form (for example, numbers as hex quantities) and required fields are enforced, without hand-written boilerplate. The directive in eth/ethconfig/config.go is typical:
//go:generate go run github.com/fjl/gencodec -type Config -formats toml -out gen_config.go
  2. stringer — Generates String() methods for enum types. The make devtools target installs this tool.

  3. protoc-gen-go — Protocol buffer code generation for any proto-defined types.

The build system's check_generate command ensures all generated files are up-to-date. If you modify a type that has a go:generate directive, you need to:

make devtools          # Install generators (first time only)
go generate ./...      # Regenerate all files
flowchart LR
    SOURCE["Source type<br/>(e.g., Config struct)"] -->|go:generate directive| GENCODEC["gencodec"]
    GENCODEC --> GENERATED["gen_config.go<br/>Type-safe marshaling"]
    SOURCE2["Enum type<br/>(e.g., SyncMode)"] -->|go:generate directive| STRINGER["stringer"]
    STRINGER --> GENERATED2["syncmode_string.go<br/>String() method"]
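To see what stringer produces, here is a hand-written equivalent of its typical output for a small enum. Mode and its constants are hypothetical and do not come from geth. Instead of a map or a switch, stringer packs all the names into one string and slices that string with an index table.

```go
package main

import (
	"fmt"
	"strconv"
)

// Mode is a hypothetical enum. In a real package, the directive
//
//	//go:generate stringer -type=Mode
//
// would generate the String method below into mode_string.go.
type Mode int

const (
	Full Mode = iota
	Snap
	Light
)

// All names concatenated, plus the offset where each one starts.
const _Mode_name = "FullSnapLight"

var _Mode_index = [...]uint8{0, 4, 8, 13}

func (i Mode) String() string {
	// Out-of-range values fall back to "Mode(n)" rather than panicking.
	if i < 0 || i >= Mode(len(_Mode_index)-1) {
		return "Mode(" + strconv.FormatInt(int64(i), 10) + ")"
	}
	return _Mode_name[_Mode_index[i]:_Mode_index[i+1]]
}

func main() {
	fmt.Println(Full, Snap, Light, Mode(7))
}
```

Because fmt picks up the String method, the program prints Full Snap Light Mode(7).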

Auxiliary Tools and Executables

Beyond geth itself, the cmd/ directory contains several useful tools:

  • cmd/evm: Standalone EVM (run state tests, trace transactions, benchmark)
  • cmd/devp2p: P2P protocol testing (ENR operations, discovery crawling, protocol tests)
  • cmd/clef: External signer that manages keys outside the Geth process
  • cmd/abigen: ABI binding generator that creates type-safe Go wrappers for contracts
  • cmd/rlpdump: RLP inspector that decodes and displays RLP-encoded data
  • cmd/era: Era1 archive tool for working with era1 archive files
  • cmd/blsync: Beacon light client sync for lightweight CL synchronization

The devp2p tool is particularly useful for network debugging. It can crawl the discovery network, test protocol handshakes, and validate ENR records. If you're working on P2P code, this is your testing companion.

How Forks Get Implemented

The established pattern for implementing a new Ethereum hard fork is one of the most instructive ways to understand Geth's architecture. It touches nearly every subsystem we've covered. Here's the recipe:

flowchart TD
    A["1. Add activation field to ChainConfig<br/>(params/config.go)"] --> B["2. Add Rules flag<br/>(params/config.go → Rules struct)"]
    B --> C["3. Create new instruction set<br/>(core/vm/jump_table.go)"]
    C --> D["4. Implement EIP enable functions<br/>(core/vm/eips.go)"]
    D --> E["5. Add fork-specific logic<br/>(core/state_processor.go,<br/>consensus/, etc.)"]
    E --> F["6. Update jump table selection<br/>(core/vm/evm.go → NewEVM)"]
    F --> G["7. Update reference tests<br/>(tests/)"]
    G --> H["8. Add override flag<br/>(cmd/geth/main.go)"]

Step 1: Add a time-based activation field to ChainConfig. Post-Merge forks use *uint64 timestamps (e.g., OsakaTime *uint64, AmsterdamTime *uint64). Pre-Merge forks used block numbers.

Step 2: Add a boolean flag to the Rules struct (e.g., IsOsaka bool, IsAmsterdam bool). The Rules are computed from ChainConfig at a specific block number and timestamp.
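The pattern behind steps 1 and 2 fits in a few lines. This is a simplified model, not geth's actual types: miniChainConfig and miniRules are hypothetical, and the real ChainConfig has many more fields. The real Rules method also takes the block number, because pre-Merge forks are keyed on it. In this sketch, a nil timestamp means the fork is not scheduled, and a fork is active once the block's timestamp reaches its activation time.

```go
package main

import "fmt"

// miniChainConfig is a hypothetical, heavily trimmed stand-in for
// params.ChainConfig: a nil pointer means "fork not scheduled".
type miniChainConfig struct {
	OsakaTime     *uint64
	AmsterdamTime *uint64
}

// isTimestampForked reports whether a fork scheduled at *fork is active
// at the given block timestamp.
func isTimestampForked(fork *uint64, time uint64) bool {
	return fork != nil && *fork <= time
}

// miniRules is the per-block snapshot the EVM consults: plain booleans
// that are cheap to check.
type miniRules struct {
	IsOsaka, IsAmsterdam bool
}

func (c *miniChainConfig) Rules(time uint64) miniRules {
	return miniRules{
		IsOsaka:     isTimestampForked(c.OsakaTime, time),
		IsAmsterdam: isTimestampForked(c.AmsterdamTime, time),
	}
}

func main() {
	osaka := uint64(1_000)
	cfg := &miniChainConfig{OsakaTime: &osaka} // Amsterdam left unscheduled
	fmt.Println(cfg.Rules(999), cfg.Rules(1_000))
}
```

The program prints {false false} {true false}: Osaka switches on exactly at its timestamp, while the unscheduled Amsterdam fork never activates.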

Step 3: Create a new instruction set constructor in jump_table.go. Start from the previous fork's instruction set and apply the new fork's EIPs on top:

func newAmsterdamInstructionSet() JumpTable {
    instructionSet := newOsakaInstructionSet()
    enable7843(&instructionSet) // SLOTNUM opcode
    enable8024(&instructionSet) // SWAPN, DUPN, EXCHANGE
    return validate(instructionSet)
}

Step 4: Implement the enable* functions that modify specific entries in the jump table — setting execution functions, gas costs, and stack parameters for new or modified opcodes.
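Steps 3 and 4 can be illustrated with a toy jump table. This is a self-contained model, not geth's code. The operation and JumpTable types here keep only an execute function, a constant gas cost, and a minimum stack depth, and opDOUBLE (0xef) is an invented opcode. What carries over is the shape: each fork's constructor starts from a freshly built copy of the previous fork's table, and each enable function edits individual entries.

```go
package main

import "fmt"

// operation is a toy version of a jump-table entry: how to execute the
// opcode plus the metadata the interpreter checks before running it.
type operation struct {
	execute     func(stack *[]uint64)
	constantGas uint64
	minStack    int // stack items required before execution
}

// JumpTable has one slot per opcode; in this toy a nil slot is undefined.
type JumpTable [256]*operation

const (
	opADD    = 0x01
	opDOUBLE = 0xef // hypothetical opcode, invented for this sketch
)

// newBaseInstructionSet plays the role of the previous fork's constructor.
func newBaseInstructionSet() JumpTable {
	var jt JumpTable
	jt[opADD] = &operation{
		execute: func(s *[]uint64) {
			st := *s
			n := len(st)
			st[n-2] += st[n-1] // pop two, push their sum
			*s = st[:n-1]
		},
		constantGas: 3,
		minStack:    2,
	}
	return jt
}

// enableDouble mimics an enable* function: it edits one table entry.
func enableDouble(jt *JumpTable) {
	jt[opDOUBLE] = &operation{
		execute:     func(s *[]uint64) { (*s)[len(*s)-1] *= 2 },
		constantGas: 5,
		minStack:    1,
	}
}

// newNextForkInstructionSet starts from a freshly built copy of the
// previous fork's table and layers the new EIP on top.
func newNextForkInstructionSet() JumpTable {
	jt := newBaseInstructionSet()
	enableDouble(&jt)
	return jt
}

// run interprets code against a starting stack and returns the final
// stack and the total constant gas charged.
func run(jt *JumpTable, code []byte, stack []uint64) ([]uint64, uint64, error) {
	var gas uint64
	for _, op := range code {
		o := jt[op]
		if o == nil {
			return nil, gas, fmt.Errorf("invalid opcode 0x%x", op)
		}
		if len(stack) < o.minStack {
			return nil, gas, fmt.Errorf("stack underflow at opcode 0x%x", op)
		}
		gas += o.constantGas
		o.execute(&stack)
	}
	return stack, gas, nil
}

func main() {
	base, next := newBaseInstructionSet(), newNextForkInstructionSet()
	code := []byte{opADD, opDOUBLE}
	_, _, err := run(&base, code, []uint64{2, 3})
	fmt.Println("previous fork:", err)
	out, gas, _ := run(&next, code, []uint64{2, 3})
	fmt.Println("new fork:", out, "gas", gas)
}
```

The same bytecode fails with an invalid-opcode error on the old table. On the new table it yields [10] for 8 gas: ADD (3 gas) turns 2, 3 into 5, and DOUBLE (5 gas) turns 5 into 10.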

Step 5: Add any non-EVM fork logic — new system contracts, modified state transition rules, consensus changes, new transaction types.

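Step 6: Add the new fork to the switch statement in NewEVM(). The newest fork is always checked first. In a valid chain configuration, forks activate in order, so once Amsterdam is active, IsOsaka and every earlier flag are true as well; checking an older fork first would select the wrong table.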
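Here is a minimal sketch of the newest-first selection. It uses a hypothetical, trimmed-down rules type and returns table names instead of tables; it is not geth's actual switch.

```go
package main

import "fmt"

// rules is a hypothetical, trimmed-down stand-in for params.Rules.
type rules struct {
	IsPrague, IsOsaka, IsAmsterdam bool
}

// selectInstructionSet returns the name of the table to use. Fork flags
// are cumulative (a chain on Amsterdam also has IsOsaka and IsPrague set),
// so the newest fork must be tested first or an older case would win.
func selectInstructionSet(r rules) string {
	switch {
	case r.IsAmsterdam:
		return "amsterdam"
	case r.IsOsaka:
		return "osaka"
	case r.IsPrague:
		return "prague"
	default:
		return "pre-prague"
	}
}

func main() {
	fmt.Println(selectInstructionSet(rules{IsPrague: true, IsOsaka: true, IsAmsterdam: true}))
}
```

Reversing the case order would send every chain that has reached Prague into the prague branch.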

Step 7: Register the new fork in the reference-test harness (the Forks map in tests/init.go) and update the test fixtures, so the suite exercises the new fork's expected behavior.

Step 8: Add an --override.<forkname> CLI flag that takes a Unix timestamp and replaces the fork's bundled activation time, so the fork can be tested before mainnet activation.
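The override mechanism works because the activation field is a pointer: nil means "use the bundled schedule". Below is a minimal sketch using the standard library flag package. The flag name follows the --override.<forkname> convention, but miniChainConfig, applyOverrides, and the parsing code are illustrative assumptions, not geth's CLI (which is built on urfave/cli).

```go
package main

import (
	"flag"
	"fmt"
	"os"
)

// miniChainConfig is a hypothetical stand-in for params.ChainConfig.
type miniChainConfig struct {
	OsakaTime *uint64 // nil: fork not scheduled
}

// applyOverrides parses CLI arguments and, only if --override.osaka was
// actually given, replaces the bundled activation timestamp.
func applyOverrides(cfg *miniChainConfig, args []string) error {
	fs := flag.NewFlagSet("node", flag.ContinueOnError)
	osaka := fs.Uint64("override.osaka", 0, "manually specify the Osaka fork timestamp")
	if err := fs.Parse(args); err != nil {
		return err
	}
	// Visit only walks flags that were set, so an absent flag leaves cfg
	// alone while an explicit --override.osaka=0 is still honored.
	fs.Visit(func(f *flag.Flag) {
		if f.Name == "override.osaka" {
			cfg.OsakaTime = osaka
		}
	})
	return nil
}

func main() {
	cfg := &miniChainConfig{}
	if err := applyOverrides(cfg, os.Args[1:]); err != nil {
		os.Exit(2) // the flag package has already printed the error
	}
	if cfg.OsakaTime == nil {
		fmt.Println("Osaka: bundled schedule")
	} else {
		fmt.Println("Osaka override:", *cfg.OsakaTime)
	}
}
```

Running it with --override.osaka=1700000000 prints Osaka override: 1700000000. Without the flag, the configuration is left untouched.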

Tip: The AGENTS.md file at the repo root contains contributor guidelines, including commit message format (<package>: description), the pre-commit checklist, and the pull request title convention. Read this before your first PR.
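Here is a rough sketch of a checker for the <package>: description convention. The regular expression is an approximation written for this article, not the project's tooling. It accepts a comma-separated list of lower-case package paths, as in a title such as "eth, rpc: make trace configs optional", followed by a colon, a space, and a non-empty description.

```go
package main

import (
	"fmt"
	"regexp"
)

// pkgPath matches one lower-case package path such as "eth" or "core/vm".
const pkgPath = `[a-z0-9_.-]+(?:/[a-z0-9_.-]+)*`

// titleRE approximates "<package>[, <package>...]: description":
// one or more package paths, a colon and a space, then non-blank text.
var titleRE = regexp.MustCompile(`^` + pkgPath + `(?:, ` + pkgPath + `)*: \S`)

func validTitle(title string) bool {
	return titleRE.MatchString(title)
}

func main() {
	for _, t := range []string{
		"core/vm: fix stack check",
		"eth, rpc: make trace configs optional",
		"Fixed a bug in the EVM",
	} {
		fmt.Printf("%-40q %v\n", t, validTitle(t))
	}
}
```

The pattern is deliberately loose. Titles with upper-case package names or other unusual shapes would need a broader expression.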

Orientation Tips for Exploring Further

After seven articles, you have a working mental model of Geth. Here are some parting navigation heuristics:

  1. Follow the interfaces. When you encounter a method call on an interface, find the concrete implementation by locating the types that satisfy it. Because Go interfaces are satisfied implicitly, there is no "implements" clause to search for, so use your editor's "Go to Implementations" (backed by gopls) or grep for the method's definition across the tree.

  2. Start from eth/backend.go. The Ethereum struct and New() constructor are the Rosetta Stone. Every major subsystem is created and wired here. When in doubt about how two components connect, check eth.New().

  3. Use the rawdb package as your database dictionary. If you need to know what's stored in the database and how keys are structured, core/rawdb/ has the answers.

  4. The params package is the source of truth for protocol constants. Gas costs, fork activation logic, chain IDs, and system contract addresses all live in params/ (precompile addresses, by contrast, are defined in core/vm/contracts.go).

  5. Tests are documentation. When the code comments don't explain a behavior, the tests often do. Look for _test.go files in the same package.

  6. Commit messages are changelogs. Geth's commit history follows a strict <package>: description format. Use git log --oneline -- core/vm/ to understand the evolution of any subsystem.

The go-ethereum codebase rewards careful reading. It's remarkably well-structured for its size and age, and the interface-driven design means you can understand any subsystem in isolation. Now that you have the map, go explore.