The Provider Plugin System: gRPC, Protocols, and Provider Lifecycle

Advanced

Prerequisites

  • Articles 1–3 (Architecture, Graph Engine, Plan & Apply)
  • Basic understanding of gRPC and Protocol Buffers
  • Familiarity with process-based plugin architectures

Terraform's provider ecosystem — over 4,000 providers covering every major cloud and service — is enabled by a remarkably clean plugin architecture. Each provider runs as a separate OS process communicating with Terraform Core over gRPC. This process isolation means a buggy provider can't crash Terraform, providers can be written in any language that supports gRPC, and the ecosystem can evolve independently of the core binary.

This article traces the full provider lifecycle: from discovery and download, through process launch and gRPC connection, to the three-layer abstraction that translates between Go types and the protobuf wire format.

providers.Interface: The Provider Contract

At the center of Terraform's provider abstraction is providers.Interface at internal/providers/provider.go#L17-L119:

type Interface interface {
    GetProviderSchema() GetProviderSchemaResponse
    ValidateProviderConfig(ValidateProviderConfigRequest) ValidateProviderConfigResponse
    ValidateResourceConfig(ValidateResourceConfigRequest) ValidateResourceConfigResponse
    ConfigureProvider(ConfigureProviderRequest) ConfigureProviderResponse
    ReadResource(ReadResourceRequest) ReadResourceResponse
    PlanResourceChange(PlanResourceChangeRequest) PlanResourceChangeResponse
    ApplyResourceChange(ApplyResourceChangeRequest) ApplyResourceChangeResponse
    ImportResourceState(ImportResourceStateRequest) ImportResourceStateResponse
    ReadDataSource(ReadDataSourceRequest) ReadDataSourceResponse
    Stop() error
    // ... ~25 methods total
}

classDiagram
    class Interface {
        <<interface>>
        +GetProviderSchema()
        +ValidateProviderConfig()
        +ValidateResourceConfig()
        +ConfigureProvider()
        +ReadResource()
        +PlanResourceChange()
        +ApplyResourceChange()
        +ImportResourceState()
        +ReadDataSource()
        +OpenEphemeralResource()
        +CallFunction()
        +ListResource()
        +Stop()
    }
    class GRPCProvider {
        -client proto.ProviderClient
        -ctx context.Context
        -schema GetProviderSchemaResponse
    }
    class GRPCProvider6 {
        -client proto6.ProviderClient
    }
    Interface <|.. GRPCProvider : protocol v5
    Interface <|.. GRPCProvider6 : protocol v6

This interface is what Terraform Core programs against — the graph nodes call provider.PlanResourceChange() and provider.ApplyResourceChange() as we saw in Article 3. The core engine has no knowledge of whether the provider is running in-process, across the network, or in a test harness.

The interface has grown organically over the years. Newer methods like OpenEphemeralResource(), CallFunction(), and ListResource() reflect recent Terraform features. The request/response pattern (each method takes a typed request struct and returns a typed response struct) makes the contract explicit and version-trackable.
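To make the request/response pattern concrete, here is a minimal sketch of a one-method slice of the contract with an in-process fake behind it. The names `PlanRequest`, `PlanResponse`, `echoProvider`, and `planWith` are hypothetical simplifications, not Terraform's real types; the point is only that the caller depends on the interface and that errors travel inside the response as diagnostics.

```go
package main

import "fmt"

// PlanRequest and PlanResponse are hypothetical, simplified stand-ins for
// the typed request/response structs used by the real interface.
type PlanRequest struct {
	TypeName string
	Proposed map[string]string
}

type PlanResponse struct {
	Planned     map[string]string
	Diagnostics []string // problems are reported in the response, not as a Go error
}

// Provider is a one-method slice of the contract, for illustration only.
type Provider interface {
	PlanResourceChange(PlanRequest) PlanResponse
}

// echoProvider is an in-process fake: it plans exactly what was proposed.
type echoProvider struct{}

func (echoProvider) PlanResourceChange(req PlanRequest) PlanResponse {
	if req.TypeName == "" {
		return PlanResponse{Diagnostics: []string{"missing resource type"}}
	}
	return PlanResponse{Planned: req.Proposed}
}

// planWith plays the role of a graph node. It sees only the interface, so it
// cannot tell whether the provider is a subprocess, a test fake, or built in.
func planWith(p Provider, req PlanRequest) (map[string]string, error) {
	resp := p.PlanResourceChange(req)
	if len(resp.Diagnostics) > 0 {
		return nil, fmt.Errorf("plan failed: %s", resp.Diagnostics[0])
	}
	return resp.Planned, nil
}

func main() {
	planned, err := planWith(echoProvider{}, PlanRequest{
		TypeName: "example_thing",
		Proposed: map[string]string{"name": "demo"},
	})
	fmt.Println(planned, err)
}
```

Swapping `echoProvider` for a gRPC-backed implementation would not change `planWith` at all, which is the property the core engine relies on.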

Provider Discovery: Registry, Mirrors, and Dev Overrides

Before a provider can be used, Terraform must find and download it. The discovery system is built around the Source interface at internal/getproviders/source.go#L14-L18:

type Source interface {
    AvailableVersions(ctx context.Context, provider addrs.Provider) (VersionList, Warnings, error)
    PackageMeta(ctx context.Context, provider addrs.Provider, version Version, target Platform) (PackageMeta, error)
    ForDisplay(provider addrs.Provider) string
}

The providerSource() function in provider_source.go#L26-L39 assembles the source chain at boot time:

flowchart TD
    PS["providerSource()"] --> Check{"Explicit config?"}
    Check -->|No| Implicit["implicitProviderSource()"]
    Check -->|Yes| Explicit["explicitProviderSource()"]
    Implicit --> Registry["RegistrySource<br/>registry.terraform.io"]
    Implicit --> LocalMirror["FilesystemMirrorSource<br/>~/.terraform.d/plugins"]
    Explicit --> Multi["MultiSource"]
    Multi --> R2["RegistrySource"]
    Multi --> FM["FilesystemMirrorSource"]
    Multi --> NM["NetworkMirrorSource"]

When no explicit provider_installation block exists in the CLI config, Terraform builds an implicit source: it scans the local plugin directories (such as ~/.terraform.d/plugins), serves any provider it finds there from the filesystem mirror, and sends every other provider to the public registry. When explicit configuration exists, it builds a MultiSource whose include/exclude rules direct specific providers to specific mirrors.
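The include/exclude routing can be sketched as a small matcher. This is an illustrative approximation, not the MultiSource implementation: the `selector` type, the `matches` and `sourcesFor` helpers, and the rule that an empty include list matches everything are all assumptions made for the example.

```go
package main

import (
	"fmt"
	"strings"
)

// selector pairs a named source with include/exclude patterns (hypothetical type).
type selector struct {
	Source  string
	Include []string
	Exclude []string
}

// matches compares a pattern with a "hostname/namespace/type" address,
// treating "*" as a wildcard for a whole segment.
func matches(pattern, addr string) bool {
	pp := strings.Split(pattern, "/")
	ap := strings.Split(addr, "/")
	if len(pp) != len(ap) {
		return false
	}
	for i := range pp {
		if pp[i] != "*" && pp[i] != ap[i] {
			return false
		}
	}
	return true
}

func anyMatch(patterns []string, addr string) bool {
	for _, p := range patterns {
		if matches(p, addr) {
			return true
		}
	}
	return false
}

// sourcesFor returns, in configured order, the sources allowed to serve addr.
// Assumption for this sketch: an empty Include list means "include everything".
func sourcesFor(selectors []selector, addr string) []string {
	var out []string
	for _, s := range selectors {
		included := len(s.Include) == 0 || anyMatch(s.Include, addr)
		if included && !anyMatch(s.Exclude, addr) {
			out = append(out, s.Source)
		}
	}
	return out
}

func main() {
	selectors := []selector{
		{Source: "filesystem_mirror", Include: []string{"example.com/*/*"}},
		{Source: "direct", Exclude: []string{"example.com/*/*"}},
	}
	fmt.Println(sourcesFor(selectors, "example.com/acme/widget"))
	fmt.Println(sourcesFor(selectors, "registry.terraform.io/hashicorp/aws"))
}
```

With these two selectors, providers under example.com are only ever looked up in the mirror, and everything else goes to the origin registry.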

Provider dev overrides (dev_overrides in CLI config) bypass this entire chain. A developer building a provider locally sets a dev override that points directly at the directory containing the locally built binary, so Terraform uses that build instead of an installed release. The reattach mechanism (the TF_REATTACH_PROVIDERS environment variable) goes further: it skips launching a process entirely and attaches to an already-running provider, which is how the SDK's acceptance testing framework works.
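The reattach variable carries a JSON object keyed by provider address. The sketch below parses a value of that general shape; the field names follow the form commonly shown in provider debugging output, but treat them, along with the `reattachConfig` type and the `parseReattach` helper, as assumptions for illustration and not as a specification.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// reattachAddr and reattachConfig are hypothetical types modelling one entry
// of the JSON object; the field names are assumed, not authoritative.
type reattachAddr struct {
	Network string
	String  string
}

type reattachConfig struct {
	Protocol        string
	ProtocolVersion int
	Pid             int
	Test            bool
	Addr            reattachAddr
}

// parseReattach decodes the raw environment value into a map keyed by
// provider address.
func parseReattach(raw string) (map[string]reattachConfig, error) {
	var out map[string]reattachConfig
	if err := json.Unmarshal([]byte(raw), &out); err != nil {
		return nil, fmt.Errorf("invalid reattach value: %w", err)
	}
	return out, nil
}

func main() {
	// Illustrative value only; the address, PID, and socket path are made up.
	raw := `{"registry.terraform.io/example/demo":{"Protocol":"grpc","ProtocolVersion":6,"Pid":4242,"Test":true,"Addr":{"Network":"unix","String":"/tmp/plugin-demo"}}}`
	cfgs, err := parseReattach(raw)
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	for addr, c := range cfgs {
		fmt.Printf("%s -> pid %d over %s://%s\n", addr, c.Pid, c.Addr.Network, c.Addr.String)
	}
}
```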

The gRPC Bridge: GRPCProvider and Type Conversion

The GRPCProvider at internal/plugin/grpc_provider.go#L53-L78 implements providers.Interface by translating each method call into gRPC requests:

type GRPCProvider struct {
    PluginClient *plugin.Client
    TestServer   *grpc.Server
    Addr         addrs.Provider
    client       proto.ProviderClient
    ctx          context.Context
    mu           sync.Mutex
    schema       providers.GetProviderSchemaResponse
}

The GRPCProviderPlugin type, at lines 32-47 of the same file, integrates with HashiCorp's go-plugin framework:

type GRPCProviderPlugin struct {
    plugin.Plugin
    GRPCProvider func() proto.ProviderServer
}

func (p *GRPCProviderPlugin) GRPCClient(ctx context.Context, broker *plugin.GRPCBroker, c *grpc.ClientConn) (interface{}, error) {
    return &GRPCProvider{
        client: proto.NewProviderClient(c),
        ctx:    ctx,
    }, nil
}

sequenceDiagram
    participant Core as terraform.Context
    participant Iface as providers.Interface
    participant GRPC as GRPCProvider
    participant Convert as plugin/convert
    participant Proto as tfplugin5.ProviderClient
    participant Process as Provider Process

    Core->>Iface: PlanResourceChange(req)
    Iface->>GRPC: PlanResourceChange(req)
    GRPC->>Convert: cty.Value → DynamicValue (msgpack)
    Convert-->>GRPC: proto request
    GRPC->>Proto: PlanResourceChange(protoReq)
    Proto->>Process: gRPC call
    Process-->>Proto: gRPC response
    Proto-->>GRPC: proto response
    GRPC->>Convert: DynamicValue → cty.Value
    Convert-->>GRPC: Go types
    GRPC-->>Core: PlanResourceChangeResponse

The convert package under internal/plugin/convert/ handles the critical translation between Terraform's type system (cty.Value) and the protobuf wire format (DynamicValue). A DynamicValue can carry either MessagePack or JSON bytes; Terraform Core sends MessagePack because it is more compact and cheaper to parse. The encoded bytes do not fully describe their own types (a list and a set, for example, look alike on the wire), so the receiving end uses the schema to decode them back into properly typed cty.Value objects.
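The role the schema plays in decoding can be illustrated with a small sketch. To stay within the standard library it swaps in JSON for MessagePack, and `attrType`, `schema`, `dynamicValue`, `encode`, and `decode` are hypothetical names, not the convert package's API. The idea it shows is that opaque bytes only become typed values when checked against a schema both sides agree on.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// attrType is a tiny stand-in for a cty type.
type attrType int

const (
	typeString attrType = iota
	typeNumber
	typeBool
)

// schema maps attribute names to the type each side expects.
type schema map[string]attrType

// dynamicValue mimics the wire message: opaque bytes with no schema attached.
// JSON is used here in place of MessagePack purely to avoid a dependency.
type dynamicValue struct {
	JSON []byte
}

func encode(obj map[string]any) (dynamicValue, error) {
	b, err := json.Marshal(obj)
	if err != nil {
		return dynamicValue{}, fmt.Errorf("encoding value: %w", err)
	}
	return dynamicValue{JSON: b}, nil
}

// decode rejects attributes the schema does not know and values whose type
// disagrees with the schema, which is where mismatches would surface.
func decode(dv dynamicValue, s schema) (map[string]any, error) {
	var raw map[string]any
	if err := json.Unmarshal(dv.JSON, &raw); err != nil {
		return nil, fmt.Errorf("decoding wire bytes: %w", err)
	}
	for name := range raw {
		if _, known := s[name]; !known {
			return nil, fmt.Errorf("unsupported attribute %q", name)
		}
	}
	out := make(map[string]any, len(s))
	for name, want := range s {
		v, present := raw[name]
		if !present || v == nil {
			out[name] = nil // absent attributes decode as null
			continue
		}
		var ok bool
		switch want {
		case typeString:
			_, ok = v.(string)
		case typeNumber:
			_, ok = v.(float64)
		case typeBool:
			_, ok = v.(bool)
		}
		if !ok {
			return nil, fmt.Errorf("attribute %q does not match the schema", name)
		}
		out[name] = v
	}
	return out, nil
}

func main() {
	dv, _ := encode(map[string]any{"name": "demo", "count": 3})
	fmt.Println(decode(dv, schema{"name": typeString, "count": typeNumber}))
	fmt.Println(decode(dv, schema{"name": typeString, "count": typeString}))
}
```

The second call in `main` decodes the same bytes against a schema that disagrees about `count` and fails, which mirrors the class of error described in this section.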

Tip: If you see mysterious type errors during plan or apply, the convert layer is often where they surface. Terraform and the provider must agree on the schema exactly, or deserialization will fail or produce incorrect values. The UpgradeResourceState RPC covers the related case of stored state that was written under an older schema version: the provider upgrades it to the current schema before Terraform decodes it.

Protocol v5 vs v6 and the Protobuf Definitions

Terraform currently supports two coexisting provider protocols:

  • Protocol v5 (internal/tfplugin5/) — the established protocol used by the majority of existing providers
  • Protocol v6 (internal/tfplugin6/) — the newer protocol that adds several capabilities

The protobuf service definitions live in docs/plugin-protocol/tfplugin5.proto and docs/plugin-protocol/tfplugin6.proto.

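Protocol v6 was introduced primarily to support nested attribute types in schemas. Newer capabilities in the protocol include the following; several of them were also added to later 5.x minor revisions, so check both .proto files for the exact per-version RPC list: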

  • Resource identity — providers can declare identity schemas for resources, enabling Terraform to track identity across moves
  • Move resource state — supports changing resource types while preserving state
  • Deferred actions — providers can signal that a resource can't be planned yet
  • Ephemeral resources — resources that exist only for the duration of an operation
  • Functions — provider-contributed functions callable from HCL expressions
  • List resources — query-style resources that enumerate remote objects

classDiagram
    class ProtocolV5 {
        +GetSchema
        +ValidateResourceTypeConfig
        +ValidateDataSourceConfig
        +UpgradeResourceState
        +ConfigureProvider
        +ReadResource
        +PlanResourceChange
        +ApplyResourceChange
        +ImportResourceState
        +ReadDataSource
        +Stop
    }
    class ProtocolV6 {
        +GetProviderSchema
        +ValidateResourceConfig
        +ValidateDataResourceConfig
        +ValidateEphemeralResourceConfig
        +UpgradeResourceState
        +UpgradeResourceIdentity
        +ConfigureProvider
        +ReadResource
        +PlanResourceChange
        +ApplyResourceChange
        +ImportResourceState
        +MoveResourceState
        +ReadDataSource
        +OpenEphemeralResource
        +RenewEphemeralResource
        +CloseEphemeralResource
        +CallFunction
        +ListResource
        +Stop
    }
    ProtocolV5 <|-- ProtocolV6 : extends

Terraform detects the protocol version during the go-plugin handshake. Each protocol version has its own GRPCProvider implementation (GRPCProvider for v5, GRPCProvider6 for v6), both implementing the same providers.Interface. This means the core engine is completely agnostic to which protocol version a provider uses.
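One way to picture version selection is a table of client constructors keyed by protocol version, from which the highest mutually supported version is chosen. This is a simplified sketch of the idea, not go-plugin's handshake code; `versionedClients`, `negotiate`, and the two client types are hypothetical.

```go
package main

import (
	"fmt"
	"sort"
)

// Provider is a stand-in for the shared interface both clients implement.
type Provider interface {
	ProtocolVersion() int
}

type grpcProvider5 struct{}

func (grpcProvider5) ProtocolVersion() int { return 5 }

type grpcProvider6 struct{}

func (grpcProvider6) ProtocolVersion() int { return 6 }

// versionedClients maps each protocol version the host understands to a
// constructor for the matching client (hypothetical table).
var versionedClients = map[int]func() Provider{
	5: func() Provider { return grpcProvider5{} },
	6: func() Provider { return grpcProvider6{} },
}

// negotiate picks the highest version supported by both the host and the
// plugin, then builds the client for it.
func negotiate(pluginVersions []int) (Provider, error) {
	sorted := append([]int(nil), pluginVersions...)
	sort.Sort(sort.Reverse(sort.IntSlice(sorted)))
	for _, v := range sorted {
		if newClient, ok := versionedClients[v]; ok {
			return newClient(), nil
		}
	}
	return nil, fmt.Errorf("no common protocol version in %v", pluginVersions)
}

func main() {
	p, err := negotiate([]int{5, 6})
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println("negotiated protocol version", p.ProtocolVersion())
}
```

Because `negotiate` returns the interface type, the caller never needs to branch on the version again.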

Server-Side Wrapping and Test Harnesses

The grpcwrap package at internal/grpcwrap/provider.go does the inverse of GRPCProvider — it wraps a providers.Interface into a gRPC server:

flowchart LR
    subgraph "Normal Operation"
        Core1["Terraform Core"] -->|"gRPC client"| Plugin["Provider Process<br/>(gRPC server)"]
    end
    subgraph "Testing / Reattach"
        Core2["Terraform Core"] -->|"providers.Interface"| Wrap["grpcwrap.Provider"]
        Wrap -->|"gRPC server impl"| InProc["In-process gRPC"]
    end

This is primarily used for:

  1. Integration testing — tests can create an in-process provider without launching a subprocess
  2. Provider reattach — the TF_REATTACH_PROVIDERS mechanism connects to an already-running provider process, useful for SDK acceptance tests
  3. Built-in providers — the terraform built-in provider runs in-process
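The inverse direction can be sketched as an adapter that accepts wire-shaped requests, decodes them, calls the Go-level provider, and encodes the result. The example leaves gRPC out entirely and uses JSON bytes as a stand-in for the wire encoding; `serverWrapper`, the `wire*` types, and `staticProvider` are hypothetical names, not the grpcwrap API.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// Go-level request/response types and interface (simplified stand-ins).
type ReadRequest struct {
	TypeName string
	Prior    map[string]string
}

type ReadResponse struct {
	New map[string]string
}

type Provider interface {
	ReadResource(ReadRequest) ReadResponse
}

// Wire-level types: state travels as opaque bytes.
type wireReadRequest struct {
	TypeName   string
	PriorState []byte
}

type wireReadResponse struct {
	NewState []byte
}

// serverWrapper exposes a Provider through wire-shaped methods, which is the
// mirror image of what a client-side bridge does.
type serverWrapper struct {
	inner Provider
}

func (s serverWrapper) ReadResource(req wireReadRequest) (wireReadResponse, error) {
	var prior map[string]string
	if err := json.Unmarshal(req.PriorState, &prior); err != nil {
		return wireReadResponse{}, fmt.Errorf("decoding prior state: %w", err)
	}
	resp := s.inner.ReadResource(ReadRequest{TypeName: req.TypeName, Prior: prior})
	b, err := json.Marshal(resp.New)
	if err != nil {
		return wireReadResponse{}, fmt.Errorf("encoding new state: %w", err)
	}
	return wireReadResponse{NewState: b}, nil
}

// staticProvider copies the prior state and marks it as refreshed.
type staticProvider struct{}

func (staticProvider) ReadResource(req ReadRequest) ReadResponse {
	out := map[string]string{"refreshed": "true"}
	for k, v := range req.Prior {
		out[k] = v
	}
	return ReadResponse{New: out}
}

func main() {
	srv := serverWrapper{inner: staticProvider{}}
	resp, err := srv.ReadResource(wireReadRequest{
		TypeName:   "example_thing",
		PriorState: []byte(`{"id":"abc"}`),
	})
	fmt.Println(string(resp.NewState), err)
}
```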

Provider Addressing and Schema Loading

Every provider is identified by a fully-qualified name with three parts: hostname, namespace, and type. The addrs.Provider type at internal/addrs/provider.go#L13-L16 is an alias for tfaddr.Provider:

type Provider = tfaddr.Provider

For example, registry.terraform.io/hashicorp/aws has hostname registry.terraform.io, namespace hashicorp, and type aws. When a user writes just aws in their configuration, Terraform infers the full address via ImpliedProviderForUnqualifiedType().
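The defaulting rules can be sketched as a small parser. This is a simplified illustration, not the tfaddr parser: `providerAddr` and `parseSource` are hypothetical, and the sketch skips the normalization and validation a real implementation performs.

```go
package main

import (
	"fmt"
	"strings"
)

const (
	defaultHost      = "registry.terraform.io"
	defaultNamespace = "hashicorp"
)

// providerAddr is a hypothetical stand-in for the three-part address.
type providerAddr struct {
	Hostname  string
	Namespace string
	Type      string
}

func (p providerAddr) String() string {
	return p.Hostname + "/" + p.Namespace + "/" + p.Type
}

// parseSource fills in the default hostname and namespace for short forms:
// "aws" and "hashicorp/aws" both expand to the full three-part address.
func parseSource(src string) (providerAddr, error) {
	parts := strings.Split(src, "/")
	for _, p := range parts {
		if p == "" {
			return providerAddr{}, fmt.Errorf("invalid provider source %q: empty segment", src)
		}
	}
	switch len(parts) {
	case 1:
		return providerAddr{defaultHost, defaultNamespace, parts[0]}, nil
	case 2:
		return providerAddr{defaultHost, parts[0], parts[1]}, nil
	case 3:
		return providerAddr{parts[0], parts[1], parts[2]}, nil
	default:
		return providerAddr{}, fmt.Errorf("invalid provider source %q: too many segments", src)
	}
}

func main() {
	for _, src := range []string{"aws", "hashicorp/aws", "example.com/acme/widget"} {
		addr, err := parseSource(src)
		fmt.Println(addr, err)
	}
}
```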

Schema loading happens through contextPlugins, which caches schemas per provider. Graph nodes need schemas constantly (for evaluating configs, planning changes, and encoding state), so each lookup first checks the cache and calls GetProviderSchema() on the provider process only if the schema isn't already loaded. This is critical for performance, since fetching a schema requires a gRPC round-trip to a separate process.

The PreloadedProviderSchemas field in ContextOpts allows callers to pre-populate the cache, avoiding redundant schema fetches when the schemas were already loaded for another purpose (like validation before planning).
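The cache-then-fetch behavior, including pre-population, can be sketched as a mutex-guarded map. The `schemaCache` type and its `fetch` callback are hypothetical; schemas are represented as plain strings to keep the example short.

```go
package main

import (
	"fmt"
	"sync"
)

// schemaCache is a hypothetical per-provider schema cache. fetch stands in
// for the expensive round-trip to the provider process.
type schemaCache struct {
	mu      sync.Mutex
	schemas map[string]string
	fetch   func(addr string) (string, error)
}

// newSchemaCache copies any preloaded schemas so later lookups for those
// providers never call fetch.
func newSchemaCache(preloaded map[string]string, fetch func(string) (string, error)) *schemaCache {
	c := &schemaCache{schemas: make(map[string]string), fetch: fetch}
	for addr, s := range preloaded {
		c.schemas[addr] = s
	}
	return c
}

// get returns the cached schema, fetching and storing it on first use.
func (c *schemaCache) get(addr string) (string, error) {
	c.mu.Lock()
	defer c.mu.Unlock()
	if s, ok := c.schemas[addr]; ok {
		return s, nil
	}
	s, err := c.fetch(addr)
	if err != nil {
		return "", err // failures are not cached, so a later call retries
	}
	c.schemas[addr] = s
	return s, nil
}

func main() {
	calls := 0
	cache := newSchemaCache(
		map[string]string{"preloaded": "cached schema"},
		func(addr string) (string, error) {
			calls++
			return "schema for " + addr, nil
		},
	)
	cache.get("aws")
	cache.get("aws")
	cache.get("preloaded")
	fmt.Println("fetch calls:", calls)
}
```

Holding the lock across the fetch keeps the sketch simple and guarantees one fetch per provider, at the cost of serializing lookups for different providers; a production cache might use per-key locking instead.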

What's Ahead

With the provider system demystified, Article 5 shifts focus to the other side of persistence: state management and backends. We'll explore the three-tier state architecture, the backend interface hierarchy, and how terraform init orchestrates backend configuration and state migration.