Read OSS

The Graph Engine: How Terraform Builds and Walks Dependency Graphs

Advanced

Prerequisites

  • Article 1: Architecture and Codebase Navigation
  • Understanding of DAGs, topological sort, and parallel execution concepts
  • Go concurrency primitives (goroutines, channels, sync.WaitGroup)

The Graph Engine: How Terraform Builds and Walks Dependency Graphs

Terraform's most distinctive architectural decision is that every operation — plan, apply, validate, import — is modeled as a walk over a directed acyclic graph. Resources that don't depend on each other execute in parallel. Resources that do depend on each other execute in topological order. This single abstraction enables Terraform to safely manage thousands of infrastructure resources in a single run.

The graph engine lives in two packages: internal/dag/ provides a generic, Terraform-agnostic graph library, while internal/terraform/ wraps it with domain-specific vertex types and a pipeline of graph transformers. Understanding how these two layers interact is essential for debugging plan behavior, understanding parallelism, and reading the core engine code.

The DAG Package: A Generic Graph Library

The foundation is AcyclicGraph in internal/dag/dag.go#L15-L18:

type AcyclicGraph struct {
    Graph
}

AcyclicGraph embeds the base Graph type and adds methods that assume (and verify) the absence of cycles. Vertices are interface{} — the DAG package has no knowledge of Terraform resources, providers, or any domain concepts. Edges are directional, representing dependencies.

classDiagram
    class Graph {
        +vertices Set
        +edges Set
        +Add(Vertex)
        +Remove(Vertex)
        +Connect(Edge)
        +DownEdges(Vertex) Set
        +UpEdges(Vertex) Set
    }
    class AcyclicGraph {
        +Ancestors(vs) Set
        +TransitiveReduction()
        +Validate() error
        +Walk(cb WalkFunc) Diagnostics
    }
    class Walker {
        +Callback WalkFunc
        +Reverse bool
        +Update(g)
        +Wait() Diagnostics
    }
    Graph <|-- AcyclicGraph
    AcyclicGraph ..> Walker : uses

The key operations that Terraform relies on:

  • Ancestors() — finds all transitive dependencies of a vertex, used for targeting
  • TransitiveReduction() — removes redundant edges to simplify the graph without changing execution order
  • Validate() — checks for cycles and self-referencing edges
  • Walk() — the high-level entry point that creates a Walker and runs it

This deliberate decoupling means the DAG package can be (and is) tested independently with simple string vertices, without pulling in any Terraform-specific types.

Parallel Walk: Goroutines, Channels, and Dependency Signaling

The Walker struct in internal/dag/walk.go#L39-L68 is the parallel execution engine:

type Walker struct {
    Callback   WalkFunc
    Reverse    bool
    changeLock sync.Mutex
    vertices   Set
    edges      Set
    vertexMap  map[Vertex]*walkerVertex
    wait       sync.WaitGroup
    diagsMap       map[Vertex]tfdiags.Diagnostics
    upstreamFailed map[Vertex]struct{}
    diagsLock      sync.Mutex
}

For each vertex, the Walker creates a walkerVertex (lines 88-119) with two critical channels:

  • DoneCh — closed when this vertex finishes execution (success or failure)
  • DepsCh — receives a boolean when all upstream dependencies complete; true means all succeeded, false means at least one failed
flowchart TD
    subgraph "Vertex A (no deps)"
        A_exec["Execute callback"] --> A_done["Close DoneCh"]
    end
    subgraph "Vertex B (depends on A)"
        B_wait["Wait on A.DoneCh"] --> B_deps["Receive on DepsCh"]
        B_deps -->|"true: deps OK"| B_exec["Execute callback"]
        B_deps -->|"false: upstream failed"| B_skip["Skip execution"]
        B_exec --> B_done["Close DoneCh"]
        B_skip --> B_done
    end
    A_done -.->|"signals"| B_wait

The Walker spawns two goroutines per vertex: one that waits on dependency channels and sends the aggregate result on DepsCh, and one that receives from DepsCh and runs the callback. As the comment notes, this creates V*2 goroutines total.

When an upstream vertex fails, all downstream vertices receive false on their DepsCh and are recorded in upstreamFailed. Their errors are excluded from the final diagnostic set because they're considered cascading failures — a thoughtful touch that prevents error-message avalanches.

Tip: Terraform defaults to a parallelism of 10 (set in NewContext at internal/terraform/context.go#L146-L148). This doesn't limit goroutines — all vertices still get goroutines — but limits how many vertices can execute their callback concurrently, via a semaphore. This protects against provider rate limits.

Terraform's Graph Wrapper and Walk Dispatch

The internal/terraform/ package wraps the generic DAG with its own Graph type at internal/terraform/graph.go#L19-L28:

type Graph struct {
    dag.AcyclicGraph
    Path addrs.ModuleInstance
}

This embedding gives Graph all the DAG methods while adding a module path and the crucial Walk() method (lines 37-39) that bridges to the GraphWalker interface.

The walk() function (lines 41-120+) is where domain-specific dispatch happens. For each vertex visited by the DAG walker, it:

  1. Checks if the vertex is GraphNodeOverridable (for testing framework overrides)
  2. Determines the evaluation context scope (global, module instance, or partial-expanded module)
  3. Dispatches to GraphNodeExecutable.Execute() if the vertex implements it

The vertex interfaces form a rich type system:

classDiagram
    class GraphNodeExecutable {
        <<interface>>
        +Execute(EvalContext, walkOperation) Diagnostics
    }
    class GraphNodeReferenceable {
        <<interface>>
        +ReferenceableAddrs() []Referenceable
    }
    class GraphNodeReferencer {
        <<interface>>
        +References() []*Reference
    }
    class GraphNodeModuleInstance {
        <<interface>>
        +Path() ModuleInstance
    }
    class GraphNodeConfigResource {
        <<interface>>
        +ResourceAddr() ConfigResource
    }

A single node type like NodePlannableResourceInstance implements many of these interfaces simultaneously. The graph walk code uses Go type assertions to discover what each vertex can do, providing a flexible dispatch mechanism without requiring a monolithic node interface.

BasicGraphBuilder: The Transform Pipeline

Graphs are constructed through a pipeline of mutations. The BasicGraphBuilder at internal/terraform/graph_builder.go#L26-L34 holds a slice of GraphTransformer steps:

type BasicGraphBuilder struct {
    Steps []GraphTransformer
    Name  string
    SkipGraphValidation bool
}

The Build() method (lines 36-90) iterates this slice, calling Transform(g) on each:

for _, step := range b.Steps {
    if step == nil { continue }
    err := step.Transform(g)
    // ...handle errors...
}
sequenceDiagram
    participant Builder as BasicGraphBuilder
    participant G as Graph
    participant T1 as ConfigTransformer
    participant T2 as ReferenceTransformer
    participant T3 as TransitiveReductionTransformer

    Builder->>T1: Transform(g)
    Note over T1,G: Adds resource vertices from config
    T1-->>G: mutated
    Builder->>T2: Transform(g)
    Note over T2,G: Wires dependency edges from HCL refs
    T2-->>G: mutated
    Builder->>T3: Transform(g)
    Note over T3,G: Removes redundant edges
    T3-->>G: mutated
    Builder->>G: Validate()

Each transformer mutates the graph in place — adding vertices, adding edges, removing vertices, or modifying existing nodes. This pipeline-of-mutations pattern is powerful because transformers are composable: you can add or remove steps without changing the others. It also means the graph's structure at any point depends on the order of previous transformers, which is why the Steps() method on each graph builder is carefully ordered.

After all transforms complete, Build() calls g.Validate() to check for cycles. If validation fails, Terraform logs the graph structure and returns an error.

PlanGraphBuilder: Anatomy of the Plan Graph

The PlanGraphBuilder at internal/terraform/graph_builder_plan.go#L30-L135 produces the graph used for planning. Its Steps() method (lines 148-328) returns over 20 transformers. Here are the most important ones in order:

flowchart TD
    CT["ConfigTransformer<br/>Add resource vertices from config"] --> RVT["RootVariableTransformer<br/>Add input variable nodes"]
    RVT --> MVT["ModuleVariableTransformer<br/>Add module variable nodes"]
    MVT --> LT["LocalTransformer<br/>Add local value nodes"]
    LT --> OT["OutputTransformer<br/>Add output value nodes"]
    OT --> ORIT["OrphanResourceInstanceTransformer<br/>Add nodes for state resources missing from config"]
    ORIT --> ST["StateTransformer<br/>Add deposed instance nodes"]
    ST --> AST["AttachStateTransformer<br/>Attach state to resource nodes"]
    AST --> ARC["AttachResourceConfigTransformer<br/>Attach config to resource nodes"]
    ARC --> PT["ProviderTransformer<br/>Wire provider dependencies"]
    PT --> SchemaT["AttachSchemaTransformer<br/>Attach schemas to nodes"]
    SchemaT --> MET["ModuleExpansionTransformer<br/>Handle count/for_each"]
    MET --> RT["ReferenceTransformer<br/>Wire dependency edges from HCL refs"]
    RT --> TT["TargetsTransformer<br/>Prune to targeted resources"]
    TT --> CPRT["CloseProviderTransformer<br/>Add provider close nodes"]
    CPRT --> TRT["TransitiveReductionTransformer<br/>Simplify graph"]

The ConfigTransformer at internal/terraform/transform_config.go#L33-L73 is typically the first substantive transformer. It walks the config tree and adds a vertex for each resource declaration. But note: at this stage, count and for_each haven't been evaluated yet, so each vertex represents a configuration-level resource, not individual instances. Instance expansion happens later via ModuleExpansionTransformer and dynamic expansion during the walk.

The ReferenceTransformer at internal/terraform/transform_reference.go#L19-L43 is where dependency edges get wired. It examines which addresses each node references (via GraphNodeReferencer) and which addresses each node provides (via GraphNodeReferenceable), then connects them with edges. This is how var.name in a resource's configuration creates a dependency on the variable node, and how aws_instance.web.id creates a dependency on the aws_instance.web resource node.

ApplyGraphBuilder: Differences from Plan

The ApplyGraphBuilder at internal/terraform/graph_builder_apply.go#L26-L93 takes a fundamentally different input: the plan's change set rather than the configuration alone. Its Steps() method (lines 105-200+) introduces the DiffTransformer:

&DiffTransformer{
    Concrete: concreteResourceInstance,
    State:    b.State,
    Changes:  b.Changes,
    Config:   b.Config,
},

Where ConfigTransformer adds vertices from configuration, DiffTransformer adds vertices from the planned changes. This ensures the apply graph only includes resources that actually need modification — a critical safety property.

The apply graph also uses different concrete node types. Where the plan graph uses NodePlannableResourceInstance, the apply graph uses NodeApplyableResourceInstance (line 119-124), which calls provider.ApplyResourceChange() instead of PlanResourceChange().

Both builders share many of the same transformers — ReferenceTransformer, ProviderTransformer, TransitiveReductionTransformer — because the fundamental need for dependency ordering and graph simplification is the same.

Walk Operations: Eight Flavors of Graph Walk

The same graph infrastructure supports eight distinct walk types, enumerated in internal/terraform/graph_walk_operation.go#L9-L21:

const (
    walkInvalid walkOperation = iota
    walkApply
    walkPlan
    walkPlanDestroy
    walkValidate
    walkDestroy
    walkImport
    walkEval
    walkInit
)

The PlanGraphBuilder.Steps() method switches on the operation type to select the right initialization (lines 149-160). For walkPlan, it calls initPlan() which sets up concrete node factories for plannable resources. For walkPlanDestroy, it calls initDestroy() which configures the graph for destruction ordering.

This design means the same transformer pipeline can produce dramatically different graphs depending on the operation, while sharing most of the structural logic.

What's Ahead

Now that you understand how graphs are built and walked, Article 3 will follow a single resource instance through the complete plan-and-apply lifecycle — from Context.Plan() creating the graph, through NodePlannableResourceInstance.Execute() calling the provider, to Context.Apply() making real infrastructure changes. We'll also examine the EvalContext interface that provides the bridge between graph nodes and the outside world.