State Management and Backends: Persistence, Locking, and Migration
Prerequisites
- Article 1: Architecture and Codebase Navigation
- Basic understanding of Terraform state concepts (terraform.tfstate, remote backends)
Terraform state is the bridge between your configuration's declarative intent and the actual infrastructure that exists in the world. Without state, Terraform couldn't know which real resources correspond to which configuration blocks, couldn't detect drift, and couldn't plan minimal changes. The state subsystem is correspondingly critical — and its architecture reflects that importance through careful layering.
This article explores the three-tier state model, the backend abstraction that determines where state lives, and the migration flow that handles transitions between backends during terraform init.
The In-Memory State Model
The State struct at internal/states/state.go#L27-L52 is the top-level in-memory representation:
type State struct {
Modules map[string]*Module
RootOutputValues map[string]*OutputValue
CheckResults *CheckResults
}
The Modules map keys are module instance paths (like module.network or module.network[0]), with the root module always present. Each Module contains maps of resources, and each resource contains a map of instances (to handle count and for_each).
classDiagram
class State {
+Modules map[string]*Module
+RootOutputValues map[string]*OutputValue
+CheckResults *CheckResults
+Empty() bool
+Module(addr) *Module
}
class Module {
+Addr ModuleInstance
+Resources map[string]*Resource
+OutputValues map[string]*OutputValue
}
class Resource {
+Addr AbsResource
+Instances map[InstanceKey]*ResourceInstance
+ProviderConfig AbsProviderConfig
}
class ResourceInstance {
+Current *ResourceInstanceObjectSrc
+Deposed map[DeposedKey]*ResourceInstanceObjectSrc
}
State --> Module : contains
Module --> Resource : contains
Resource --> ResourceInstance : contains
The Deposed map on ResourceInstance deserves special mention. When Terraform needs to replace a resource with create_before_destroy, it creates the new instance first, moves the old one to a "deposed" slot, and only destroys the deposed instance after the new one is successfully created. If the apply fails between creation and destruction, the deposed instance remains in state, tracked by a random DeposedKey, until the next successful apply cleans it up.
Tip: If you see "deposed" objects in your state and wonder what they are — they're the remnants of a create_before_destroy that was interrupted. Running terraform apply again will plan to destroy them.
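The three-tier nesting can be sketched with simplified stand-in types — these mirror the shape of the model above but are illustrative, not the real internal/states definitions (the Lookup helper and exampleState fixture are hypothetical):

```go
package main

import "fmt"

// Simplified three-tier model: module instance -> resource -> instance.
type State struct {
	Modules map[string]*Module // keyed by module instance path
}

type Module struct {
	Resources map[string]*Resource // keyed by resource address within the module
}

type Resource struct {
	Instances map[string]*Instance // keyed by instance key from count/for_each
}

type Instance struct {
	AttrsJSON string
}

// exampleState builds a tiny state with two subnet instances under count.
func exampleState() *State {
	return &State{Modules: map[string]*Module{
		"module.network": {Resources: map[string]*Resource{
			"aws_subnet.main": {Instances: map[string]*Instance{
				"0": {AttrsJSON: `{"cidr_block":"10.0.0.0/24"}`},
				"1": {AttrsJSON: `{"cidr_block":"10.0.1.0/24"}`},
			}},
		}},
	}}
}

// Lookup walks the three tiers; each level can be absent, so each
// lookup is checked before descending.
func Lookup(s *State, mod, res, key string) *Instance {
	m, ok := s.Modules[mod]
	if !ok {
		return nil
	}
	r, ok := m.Resources[res]
	if !ok {
		return nil
	}
	return r.Instances[key]
}

func main() {
	inst := Lookup(exampleState(), "module.network", "aws_subnet.main", "1")
	fmt.Println(inst.AttrsJSON) // prints {"cidr_block":"10.0.1.0/24"}
}
```

Every state operation in Terraform ultimately resolves through these three levels of addressing, which is why resource addresses like module.network.aws_subnet.main[1] have exactly three components.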
SyncState: Thread-Safe Access During Graph Walk
As we saw in Article 3, graph walks execute vertices in parallel. The raw State type is explicitly not concurrent-safe. The SyncState wrapper at internal/states/sync.go#L36-L40 solves this:
type SyncState struct {
state *State
writable bool
lock sync.RWMutex
}
classDiagram
class SyncState {
-state *State
-writable bool
-lock sync.RWMutex
+Module(addr) *Module
+SetResourceInstanceCurrent(addr, obj, provider)
+RemoveResourceInstanceDeposed(addr, key)
+OutputValue(addr) *OutputValue
+Lock() / Unlock()
}
class State {
+Modules map
+RootOutputValues map
}
SyncState --> State : wraps
Every read method acquires RLock() and returns a deep copy of the requested data. This is a critical safety measure — if a graph node received a reference to the actual state data and modified it without locking, data races would be inevitable. By returning copies, SyncState ensures that nodes can freely manipulate their local view without affecting other concurrent nodes.
Write methods acquire the full Lock() and modify the underlying state directly. The writable field provides an additional safety check — during plan walks, the "previous run state" and "refresh state" views are read-only, while the "planned state" view is writable.
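The copy-on-read pattern is easy to sketch in miniature. This is a minimal illustration of the idea, not the real SyncState API — the syncView and Output types here are invented for the example:

```go
package main

import (
	"fmt"
	"sync"
)

type Output struct {
	Value string
}

// syncView guards a shared map the way SyncState guards *State.
type syncView struct {
	mu   sync.RWMutex
	data map[string]*Output
}

// Get takes the read lock and returns a deep copy, so the caller can
// mutate the result without racing against other graph nodes.
func (v *syncView) Get(name string) *Output {
	v.mu.RLock()
	defer v.mu.RUnlock()
	o, ok := v.data[name]
	if !ok {
		return nil
	}
	copied := *o // return a copy, never the shared pointer
	return &copied
}

// Set takes the full write lock and mutates the underlying data.
func (v *syncView) Set(name, value string) {
	v.mu.Lock()
	defer v.mu.Unlock()
	v.data[name] = &Output{Value: value}
}

func main() {
	v := &syncView{data: map[string]*Output{}}
	v.Set("vpc_id", "vpc-123")

	got := v.Get("vpc_id")
	got.Value = "mutated-locally" // only affects the caller's copy

	fmt.Println(v.Get("vpc_id").Value) // prints vpc-123
}
```

The cost of copying on every read is deliberate: it trades some allocation for the guarantee that no node can corrupt shared state through an aliased pointer.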
State Managers: Persistence and Locking
The in-memory State needs to be persisted somewhere — a local file, an S3 bucket, a Consul key, etc. The statemgr package defines a layered interface hierarchy:
The Filesystem state manager at internal/states/statemgr/filesystem.go#L29-L64 is the local implementation:
type Filesystem struct {
mu sync.Mutex
path string
readPath string
backupPath string
stateFileOut *os.File
lockID string
created bool
file *statefile.File
readFile *statefile.File
backupFile *statefile.File
writtenBackup bool
}
It implements the Full interface — which combines Reader, Writer, Refresher, Persister, and Locker. The separation matters: Reader/Writer deal with the transient in-memory copy, while Refresher reads from disk and Persister writes to disk. The Locker interface adds Lock()/Unlock() for preventing concurrent access from multiple Terraform processes.
classDiagram
class Reader {
<<interface>>
+State() *State
}
class Writer {
<<interface>>
+WriteState(*State) error
}
class Refresher {
<<interface>>
+RefreshState() error
}
class Persister {
<<interface>>
+PersistState(schemas) error
}
class Locker {
<<interface>>
+Lock(LockInfo) (string, error)
+Unlock(string) error
}
class Full {
<<interface>>
}
Reader <|-- Full
Writer <|-- Full
Refresher <|-- Full
Persister <|-- Full
Locker <|-- Full
Full <|.. Filesystem
For the local backend, locking uses OS-level file locks. For remote backends like S3, locking might use a DynamoDB table. The abstraction ensures Terraform Core never needs to know the mechanism.
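The layering relies on Go's interface embedding: a type that happens to have all the methods satisfies the combined interface automatically. The sketch below shows the composition pattern with simplified method signatures — the real statemgr interfaces take richer types, and the inmem implementation here is invented for illustration:

```go
package main

import (
	"errors"
	"fmt"
)

// Small, single-purpose interfaces, composed in the style of statemgr.
type Reader interface{ State() string }
type Writer interface{ WriteState(s string) error }
type Refresher interface{ RefreshState() error }
type Persister interface{ PersistState() error }
type Locker interface {
	Lock(info string) (string, error)
	Unlock(id string) error
}

// Full is just the union: any type with all the methods satisfies it.
type Full interface {
	Reader
	Writer
	Refresher
	Persister
	Locker
}

// inmem is a toy state manager; Reader/Writer touch the in-memory copy,
// Refresher/Persister would talk to durable storage in a real backend.
type inmem struct {
	state  string
	locked bool
}

func (m *inmem) State() string              { return m.state }
func (m *inmem) WriteState(s string) error  { m.state = s; return nil }
func (m *inmem) RefreshState() error        { return nil } // no-op: nothing durable
func (m *inmem) PersistState() error        { return nil } // no-op: nothing durable
func (m *inmem) Lock(info string) (string, error) {
	if m.locked {
		return "", errors.New("state already locked")
	}
	m.locked = true
	return "lock-1", nil
}
func (m *inmem) Unlock(id string) error { m.locked = false; return nil }

func main() {
	var mgr Full = &inmem{} // compile-time proof that inmem implements Full
	id, _ := mgr.Lock("apply")
	mgr.WriteState(`{"version":4}`)
	mgr.PersistState()
	mgr.Unlock(id)
	fmt.Println(mgr.State()) // prints {"version":4}
}
```

Because each concern is its own interface, code that only needs to read state can accept a Reader, and tests can stub exactly the behavior they exercise.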
The Backend Interface Hierarchy
The backend abstraction lives in internal/backend/backend.go#L44-L106:
type Backend interface {
ConfigSchema() *configschema.Block
PrepareConfig(cty.Value) (cty.Value, tfdiags.Diagnostics)
Configure(cty.Value) tfdiags.Diagnostics
StateMgr(workspace string) (statemgr.Full, tfdiags.Diagnostics)
DeleteWorkspace(name string, force bool) tfdiags.Diagnostics
Workspaces() ([]string, tfdiags.Diagnostics)
}
This is the base interface — every backend must know how to store and retrieve state. But there's a second, more powerful interface defined in internal/backend/backendrun/operation.go#L38-L53:
type OperationsBackend interface {
backend.Backend
Operation(context.Context, *Operation) (*RunningOperation, error)
ServiceDiscoveryAliases() ([]HostAlias, error)
}
This distinction is architecturally significant. Only two backends implement OperationsBackend: local and cloud (HCP Terraform). All other backends — S3, GCS, AzureRM, Consul, etc. — implement only Backend. When you use one of those backends, Terraform wraps it in local.Local, which provides the Operation() method, running plan/apply locally while storing state remotely.
flowchart TD
subgraph "OperationsBackend"
Local["local backend<br/>(runs operations locally)"]
Cloud["cloud backend<br/>(runs operations remotely)"]
end
subgraph "Backend only (state storage)"
S3["s3"]
GCS["gcs"]
Azure["azurerm"]
Consul["consul"]
Others["pg, http, cos, oss,<br/>kubernetes, oci, inmem"]
end
S3 -->|"wrapped in"| Local
GCS -->|"wrapped in"| Local
Azure -->|"wrapped in"| Local
Consul -->|"wrapped in"| Local
Others -->|"wrapped in"| Local
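The wrapping trick falls out of Go's struct embedding: the local wrapper embeds the storage-only backend, promoting its state-storage methods, and adds the operations method on top. A minimal sketch, with invented names standing in for the real internal/backend types:

```go
package main

import "fmt"

// Backend is the storage-only contract (simplified to one method here).
type Backend interface {
	StateMgr(workspace string) string
}

// OperationsBackend adds the ability to run plan/apply.
type OperationsBackend interface {
	Backend
	Operation(kind string) string
}

// s3like can store state remotely but cannot run operations itself.
type s3like struct{}

func (s3like) StateMgr(ws string) string { return "s3 state for workspace " + ws }

// localWrapper runs operations locally while delegating state storage
// to whatever Backend it wraps.
type localWrapper struct {
	Backend // embedded: StateMgr is promoted from the wrapped backend
}

func (w localWrapper) Operation(kind string) string {
	return kind + " runs locally; " + w.StateMgr("default")
}

func main() {
	// The storage-only backend becomes a full OperationsBackend by wrapping.
	var b OperationsBackend = localWrapper{Backend: s3like{}}
	fmt.Println(b.Operation("plan")) // prints: plan runs locally; s3 state for workspace default
}
```

This composition is why adding a new state-storage backend never requires reimplementing plan/apply orchestration: the local wrapper supplies it for free.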
Built-in Backends and Registration
All backends are hardcoded in internal/backend/init/init.go#L52-L76:
backends = map[string]backend.InitFn{
"local": func() backend.Backend { return backendLocal.New() },
"remote": func() backend.Backend { return backendRemote.New(services) },
"azurerm": func() backend.Backend { return backendAzure.New() },
"consul": func() backend.Backend { return backendConsul.New() },
"cos": func() backend.Backend { return backendCos.New() },
"gcs": func() backend.Backend { return backendGCS.New() },
"http": func() backend.Backend { return backendHTTP.New() },
"inmem": func() backend.Backend { return backendInmem.New() },
"kubernetes": func() backend.Backend { return backendKubernetes.New() },
"oss": func() backend.Backend { return backendOSS.New() },
"pg": func() backend.Backend { return backendPg.New() },
"s3": func() backend.Backend { return backendS3.New() },
"oci": func() backend.Backend { return backendOCI.New() },
"cloud": func() backend.Backend { return backendCloud.New(services) },
}
The comment above this map (lines 33-43) explains why backends aren't pluggable:
Backends are hardcoded into Terraform because the API for backends uses complex structures and supporting that over the plugin system is currently prohibitively difficult. For those wanting to implement a custom backend, they can do so with recompilation.
This is a pragmatic choice. Backends need access to deeply internal types like statemgr.Full, configschema.Block, and tfdiags.Diagnostics. Exposing these across a plugin boundary would require a protocol as complex as the provider protocol, with questionable benefit since the list of backends changes infrequently.
The Init() function is called once during boot at main.go#L212, populating the global backends map. There's also a RemovedBackends map for backends that have been deprecated (like artifactory, etcd, swift), which triggers helpful error messages during init.
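The registry-of-constructors pattern, including the removed-backends error path, can be sketched as follows — the types and the lookup helper are simplified stand-ins, not the real internal/backend/init API:

```go
package main

import "fmt"

type Backend interface{ Name() string }

// InitFn defers construction until a backend is actually requested.
type InitFn func() Backend

type local struct{}

func (local) Name() string { return "local" }

// The registry: names mapped to constructors, populated once at boot.
var backends = map[string]InitFn{
	"local": func() Backend { return local{} },
}

// Deprecated backends map to a helpful message instead of a constructor.
var removedBackends = map[string]string{
	"etcd": `the "etcd" backend was removed; migrate your state to a supported backend`,
}

func lookup(name string) (Backend, error) {
	if fn, ok := backends[name]; ok {
		return fn(), nil
	}
	if msg, ok := removedBackends[name]; ok {
		return nil, fmt.Errorf("%s", msg)
	}
	return nil, fmt.Errorf("unknown backend %q", name)
}

func main() {
	b, _ := lookup("local")
	fmt.Println(b.Name()) // prints local

	_, err := lookup("etcd")
	fmt.Println(err) // prints the removal message
}
```

Using constructor functions rather than instances means a backend's dependencies (like the services discovery object passed to remote and cloud) are captured at registration time but only exercised on demand.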
Backend Initialization and State Migration
When terraform init runs, one of its most complex jobs is backend configuration. The flow starts in InitCommand and delegates to the backend-handling code in command/meta_backend.go.
The initialization handles several scenarios:
- First-time setup — no backend was previously configured
- Same backend, changed config — e.g., switching S3 buckets
- Different backend — e.g., migrating from local to S3
- Backend removal — going back to local storage
sequenceDiagram
participant User
participant Init as InitCommand
participant Meta as meta_backend
participant OldBE as Old Backend
participant NewBE as New Backend
User->>Init: terraform init
Init->>Meta: BackendForPlan / configureBackend
Meta->>Meta: Detect backend change
alt Backend changed
Meta->>OldBE: StateMgr("default")
OldBE-->>Meta: old state manager
Meta->>OldBE: RefreshState()
Meta->>NewBE: Configure(newConfig)
Meta->>NewBE: StateMgr("default")
NewBE-->>Meta: new state manager
Meta->>User: "Do you want to migrate state?"
User-->>Meta: "yes"
Meta->>NewBE: WriteState(oldState)
Meta->>NewBE: PersistState()
end
Meta-->>Init: configured backend
State migration is interactive by default — Terraform prompts the user for confirmation before moving state between backends. In automation, the -migrate-state and -force-copy flags on terraform init answer these prompts non-interactively; -force-copy approves the migration without asking.
Tip: When migrating backends, Terraform locks both the old and new state to prevent concurrent access during the migration. If a lock can't be acquired, the migration fails safely without data loss.
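The lock-both-sides discipline is the heart of safe migration: acquire both locks before writing anything, so a failure at any point leaves both states untouched. A minimal sketch, with a toy manager type standing in for statemgr.Full:

```go
package main

import (
	"errors"
	"fmt"
)

// manager is a toy stand-in for a statemgr.Full implementation.
type manager struct {
	name   string
	state  string
	locked bool
}

func (m *manager) Lock() error {
	if m.locked {
		return errors.New(m.name + ": state already locked")
	}
	m.locked = true
	return nil
}

func (m *manager) Unlock() { m.locked = false }

// migrate locks both sides before copying anything; if either lock
// fails, no write has happened, so the migration fails without data loss.
func migrate(src, dst *manager) error {
	if err := src.Lock(); err != nil {
		return err
	}
	defer src.Unlock()
	if err := dst.Lock(); err != nil {
		return err
	}
	defer dst.Unlock()
	dst.state = src.state // WriteState + PersistState in the real flow
	return nil
}

func main() {
	src := &manager{name: "local", state: `{"version":4}`}
	dst := &manager{name: "s3"}
	if err := migrate(src, dst); err != nil {
		fmt.Println("migration failed:", err)
		return
	}
	fmt.Println("migrated:", dst.state) // prints: migrated: {"version":4}
}
```

Note the ordering: the destination write happens only inside the window where both locks are held, and the deferred unlocks release both sides even if the copy itself returns an error.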
What's Ahead
State and backends are the persistence layer; the CLI is the presentation layer. Article 6 examines how commands are structured around the shared Meta, how the views layer separates rendering logic from business logic, and how the diagnostic system replaces conventional Go error handling with rich, source-attributed messages.