Read OSS

State Management and Backends: Persistence, Locking, and Migration

Intermediate

Prerequisites

  • Article 1: Architecture and Codebase Navigation
  • Basic understanding of Terraform state concepts (terraform.tfstate, remote backends)

Terraform state is the bridge between your configuration's declarative intent and the actual infrastructure that exists in the world. Without state, Terraform couldn't know which real resources correspond to which configuration blocks, couldn't detect drift, and couldn't plan minimal changes. The state subsystem is correspondingly critical — and its architecture reflects that importance through careful layering.

This article explores the three-tier state model, the backend abstraction that determines where state lives, and the migration flow that handles transitions between backends during terraform init.

The In-Memory State Model

The State struct at internal/states/state.go#L27-L52 is the top-level in-memory representation:

type State struct {
    Modules          map[string]*Module
    RootOutputValues map[string]*OutputValue
    CheckResults     *CheckResults
}

The Modules map keys are module instance paths (like module.network or module.network[0]), with the root module always present. Each Module contains maps of resources, and each resource contains a map of instances (to handle count and for_each).

classDiagram
    class State {
        +Modules map[string]*Module
        +RootOutputValues map[string]*OutputValue
        +CheckResults *CheckResults
        +Empty() bool
        +Module(addr) *Module
    }
    class Module {
        +Addr ModuleInstance
        +Resources map[string]*Resource
        +OutputValues map[string]*OutputValue
    }
    class Resource {
        +Addr AbsResource
        +Instances map[InstanceKey]*ResourceInstance
        +ProviderConfig AbsProviderConfig
    }
    class ResourceInstance {
        +Current *ResourceInstanceObjectSrc
        +Deposed map[DeposedKey]*ResourceInstanceObjectSrc
    }
    State --> Module : contains
    Module --> Resource : contains
    Resource --> ResourceInstance : contains

The Deposed map on ResourceInstance deserves special mention. When Terraform needs to replace a resource with create_before_destroy, it creates the new instance first, moves the old one to a "deposed" slot, and only destroys the deposed instance after the new one is successfully created. If the apply fails between creation and destruction, the deposed instance remains in state, tracked by a random DeposedKey, until the next successful apply cleans it up.

Tip: If you see "deposed" objects in your state and wonder what they are — they're the remnants of a create_before_destroy that was interrupted. Running terraform apply again will plan to destroy them.

SyncState: Thread-Safe Access During Graph Walk

As we saw in Article 3, graph walks execute vertices in parallel. The raw State type is explicitly not concurrent-safe. The SyncState wrapper at internal/states/sync.go#L36-L40 solves this:

type SyncState struct {
    state    *State
    writable bool
    lock     sync.RWMutex
}

classDiagram
    class SyncState {
        -state *State
        -writable bool
        -lock sync.RWMutex
        +Module(addr) *Module
        +SetResourceInstanceCurrent(addr, obj, provider)
        +RemoveResourceInstanceDeposed(addr, key)
        +OutputValue(addr) *OutputValue
        +Lock() / Unlock()
    }
    class State {
        +Modules map
        +RootOutputValues map
    }
    SyncState --> State : wraps

Every read method acquires RLock() and returns a deep copy of the requested data. This is a critical safety measure — if a graph node received a reference to the actual state data and modified it without locking, data races would be inevitable. By returning copies, SyncState ensures that nodes can freely manipulate their local view without affecting other concurrent nodes.

Write methods acquire the full Lock() and modify the underlying state directly. The writable field provides an additional safety check — during plan walks, the "previous run state" and "refresh state" views are read-only, while the "planned state" view is writable.

State Managers: Persistence and Locking

The in-memory State needs to be persisted somewhere — a local file, an S3 bucket, a Consul key, etc. The statemgr package defines a layered hierarchy of small, single-purpose interfaces that implementations compose.

The Filesystem state manager at internal/states/statemgr/filesystem.go#L29-L64 is the local implementation:

type Filesystem struct {
    mu            sync.Mutex
    path          string
    readPath      string
    backupPath    string
    stateFileOut  *os.File
    lockID        string
    created       bool
    file          *statefile.File
    readFile      *statefile.File
    backupFile    *statefile.File
    writtenBackup bool
}

It implements the Full interface — which combines Reader, Writer, Refresher, Persister, and Locker. The separation matters: Reader/Writer deal with the transient in-memory copy, while Refresher reads from disk and Persister writes to disk. The Locker interface adds Lock()/Unlock() for preventing concurrent access from multiple Terraform processes.

classDiagram
    class Reader {
        <<interface>>
        +State() *State
    }
    class Writer {
        <<interface>>
        +WriteState(*State) error
    }
    class Refresher {
        <<interface>>
        +RefreshState() error
    }
    class Persister {
        <<interface>>
        +PersistState(schemas) error
    }
    class Locker {
        <<interface>>
        +Lock(LockInfo) (string, error)
        +Unlock(string) error
    }
    class Full {
        <<interface>>
    }
    Reader <|-- Full
    Writer <|-- Full
    Refresher <|-- Full
    Persister <|-- Full
    Locker <|-- Full
    Full <|.. Filesystem

For the local backend, locking uses OS-level file locks. For remote backends like S3, locking might use a DynamoDB table. The abstraction ensures Terraform Core never needs to know the mechanism.

The Backend Interface Hierarchy

The backend abstraction lives in internal/backend/backend.go#L44-L106:

type Backend interface {
    ConfigSchema() *configschema.Block
    PrepareConfig(cty.Value) (cty.Value, tfdiags.Diagnostics)
    Configure(cty.Value) tfdiags.Diagnostics
    StateMgr(workspace string) (statemgr.Full, tfdiags.Diagnostics)
    DeleteWorkspace(name string, force bool) tfdiags.Diagnostics
    Workspaces() ([]string, tfdiags.Diagnostics)
}

This is the base interface — every backend must know how to store and retrieve state. But there's a second, more powerful interface defined in internal/backend/backendrun/operation.go#L38-L53:

type OperationsBackend interface {
    backend.Backend
    Operation(context.Context, *Operation) (*RunningOperation, error)
    ServiceDiscoveryAliases() ([]HostAlias, error)
}

This distinction is architecturally significant. Only two backends implement OperationsBackend: local and cloud (HCP Terraform). All other backends — S3, GCS, AzureRM, Consul, etc. — implement only Backend. When you use one of those backends, Terraform wraps it in local.Local which provides the Operation() method, running plan/apply locally while storing state remotely.

flowchart TD
    subgraph "OperationsBackend"
        Local["local backend<br/>(runs operations locally)"]
        Cloud["cloud backend<br/>(runs operations remotely)"]
    end
    subgraph "Backend only (state storage)"
        S3["s3"]
        GCS["gcs"]
        Azure["azurerm"]
        Consul["consul"]
        Others["pg, http, cos, oss,<br/>kubernetes, oci, inmem"]
    end
    S3 -->|"wrapped in"| Local
    GCS -->|"wrapped in"| Local
    Azure -->|"wrapped in"| Local
    Consul -->|"wrapped in"| Local
    Others -->|"wrapped in"| Local

Built-in Backends and Registration

All backends are hardcoded in internal/backend/init/init.go#L52-L76:

backends = map[string]backend.InitFn{
    "local":      func() backend.Backend { return backendLocal.New() },
    "remote":     func() backend.Backend { return backendRemote.New(services) },
    "azurerm":    func() backend.Backend { return backendAzure.New() },
    "consul":     func() backend.Backend { return backendConsul.New() },
    "cos":        func() backend.Backend { return backendCos.New() },
    "gcs":        func() backend.Backend { return backendGCS.New() },
    "http":       func() backend.Backend { return backendHTTP.New() },
    "inmem":      func() backend.Backend { return backendInmem.New() },
    "kubernetes": func() backend.Backend { return backendKubernetes.New() },
    "oss":        func() backend.Backend { return backendOSS.New() },
    "pg":         func() backend.Backend { return backendPg.New() },
    "s3":         func() backend.Backend { return backendS3.New() },
    "oci":        func() backend.Backend { return backendOCI.New() },
    "cloud":      func() backend.Backend { return backendCloud.New(services) },
}

The comment above this map (lines 33-43) explains why backends aren't pluggable:

Backends are hardcoded into Terraform because the API for backends uses complex structures and supporting that over the plugin system is currently prohibitively difficult. For those wanting to implement a custom backend, they can do so with recompilation.

This is a pragmatic choice. Backends need access to deeply internal types like statemgr.Full, configschema.Block, and tfdiags.Diagnostics. Exposing these across a plugin boundary would require a protocol as complex as the provider protocol, with questionable benefit since the list of backends changes infrequently.

The Init() function is called once during boot at main.go#L212, populating the global backends map. There's also a RemovedBackends map for backends that have been deprecated (like artifactory, etcd, swift), which triggers helpful error messages during init.

Backend Initialization and State Migration

When terraform init runs, one of its most complex jobs is backend configuration. The flow starts in InitCommand and delegates to the backend-handling code in command/meta_backend.go.

The initialization handles several scenarios:

  1. First-time setup — no backend was previously configured
  2. Same backend, changed config — e.g., switching S3 buckets
  3. Different backend — e.g., migrating from local to S3
  4. Backend removal — going back to local storage

sequenceDiagram
    participant User
    participant Init as InitCommand
    participant Meta as meta_backend
    participant OldBE as Old Backend
    participant NewBE as New Backend

    User->>Init: terraform init
    Init->>Meta: BackendForPlan / configureBackend
    Meta->>Meta: Detect backend change
    alt Backend changed
        Meta->>OldBE: StateMgr("default")
        OldBE-->>Meta: old state manager
        Meta->>OldBE: RefreshState()
        Meta->>NewBE: Configure(newConfig)
        Meta->>NewBE: StateMgr("default")
        NewBE-->>Meta: new state manager
        Meta->>User: "Do you want to migrate state?"
        User-->>Meta: "yes"
        Meta->>NewBE: WriteState(oldState)
        Meta->>NewBE: PersistState()
    end
    Meta-->>Init: configured backend

State migration is interactive by default — Terraform prompts the user for confirmation before moving state between backends. In automation (often signalled via TF_IN_AUTOMATION=1), where -input=false suppresses prompts, the migration must instead be pre-approved with the -migrate-state or -force-copy flags.

Tip: When migrating backends, Terraform locks both the old and new state to prevent concurrent access during the migration. If a lock can't be acquired, the migration fails safely without data loss.

What's Ahead

State and backends are the persistence layer; the CLI is the presentation layer. Article 6 examines how commands are structured around the shared Meta, how the views layer separates rendering logic from business logic, and how the diagnostic system replaces conventional Go error handling with rich, source-attributed messages.