Dynamic Configuration: The Orchestrator, Feature Flags, and Runtime Updates
Prerequisites
- Article 1: Cloudflared Architecture Overview
- Article 4: Traffic Routing and the Proxy Layer
- Understanding of Go's sync/atomic package and atomic.Value semantics
Most reverse proxies require a restart to pick up configuration changes. Cloudflared doesn't. When the Cloudflare edge pushes a new configuration — new ingress rules, changed origin settings, updated flow limits — cloudflared applies it without dropping a single in-flight request. The mechanism behind this is the Orchestrator's copy-on-write proxy pattern, one of the most elegant design decisions in the codebase.
This article dissects how runtime configuration updates work, from the four-layer configuration hierarchy through the atomic proxy swap, and then explores the DNS-based feature flag system that controls gradual rollouts of new capabilities like datagram v3.
The Configuration Hierarchy
Cloudflared merges configuration from four sources, with a strict priority ordering:
flowchart TD
CLI[CLI Flags<br/>Highest Priority] --> Env[Environment Variables<br/>TUNNEL_* prefix]
Env --> File[Config File<br/>YAML in search paths]
File --> Remote[Remote Edge Config<br/>Lowest Priority]
CLI --> Merge[prepareTunnelConfig merges all layers]
Env --> Merge
File --> Merge
Remote --> Merge
Merge --> TC[supervisor.TunnelConfig]
Merge --> OC[orchestration.Config]
style CLI fill:#dc2626,color:#fff
style Remote fill:#059669,color:#fff
CLI flags have the highest priority. The prepareTunnelConfig function reads them via urfave/cli's c.String(), c.Int(), etc.
Environment variables are handled through urfave/cli's altsrc package, which maps TUNNEL_* prefixed env vars to their corresponding flags. This happens transparently during flag parsing.
Config files are searched in DefaultConfigSearchDirectories: ~/.cloudflared, ~/.cloudflare-warp, ~/cloudflare-warp, /etc/cloudflared, and /usr/local/etc/cloudflared. The search accepts both config.yml and config.yaml.
Remote edge configuration has the lowest priority — it can be overridden by any local setting. This is implemented in the Orchestrator's overrideRemoteWarpRoutingWithLocalValues method, which checks local configuration flags and overwrites remote values when local overrides exist.
Tip: When troubleshooting config issues, remember that a --token-based tunnel with no local config file gets its ingress rules entirely from the Cloudflare dashboard. Add a config.yml with the --config flag to override specific settings while keeping the remote management benefits.
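The priority rule can be sketched as a first-non-empty pick across the four layers. This is a simplification: the real merge in prepareTunnelConfig handles each typed setting individually, and firstNonEmpty is a hypothetical helper, not a cloudflared function.

```go
package main

import "fmt"

// firstNonEmpty returns the first layer that actually set a value,
// scanning from highest to lowest priority.
func firstNonEmpty(values ...string) string {
	for _, v := range values {
		if v != "" {
			return v
		}
	}
	return ""
}

func main() {
	cli := ""      // no --loglevel flag passed
	env := "debug" // TUNNEL_LOGLEVEL=debug
	file := "info" // loglevel: info in config.yml
	remote := "warn"

	// CLI > env > file > remote: env wins here.
	fmt.Println(firstNonEmpty(cli, env, file, remote)) // prints debug
}
```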
Orchestrator: The Copy-on-Write Proxy Pattern
The Orchestrator is cloudflared's configuration coordinator. Its most important field is proxy atomic.Value, which stores the current *proxy.Proxy instance:
type Orchestrator struct {
currentVersion int32
lock sync.RWMutex
proxy atomic.Value // Stores *proxy.Proxy
internalRules []ingress.Rule
config *Config
flowLimiter cfdflow.Limiter
// ...
}
The split between atomic.Value for the proxy and sync.RWMutex for everything else is deliberate. The proxy is read on every single request via GetOriginProxy():
func (o *Orchestrator) GetOriginProxy() (connection.OriginProxy, error) {
val := o.proxy.Load()
// ... type assertion ...
return proxy, nil
}
This is a completely lock-free read — no mutex, no contention, no blocking. The mutex is only held during configuration updates, which happen orders of magnitude less frequently than request processing.
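The pattern can be reduced to a minimal sketch, with illustrative type and field names rather than cloudflared's actual types: readers do a single atomic load, and only writers take the mutex before building and storing a new value.

```go
package main

import (
	"fmt"
	"sync"
	"sync/atomic"
)

// proxy stands in for *proxy.Proxy; the rules field is illustrative.
type proxy struct{ rules []string }

type orchestrator struct {
	lock  sync.Mutex   // serializes writers only
	value atomic.Value // stores *proxy; readers never lock
}

// get is the hot path: one atomic load, no mutex, no contention.
func (o *orchestrator) get() *proxy {
	p, ok := o.value.Load().(*proxy)
	if !ok {
		return nil
	}
	return p
}

// update is the cold path: build the replacement under the lock,
// then publish it with a single atomic store.
func (o *orchestrator) update(rules []string) {
	o.lock.Lock()
	defer o.lock.Unlock()
	o.value.Store(&proxy{rules: rules})
}

func main() {
	o := &orchestrator{}
	o.update([]string{"http://localhost:8080"})
	fmt.Println(o.get().rules[0]) // prints http://localhost:8080
}
```

Note that atomic.Value requires every Store to use the same concrete type, which is why the real code always stores a *proxy.Proxy.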
sequenceDiagram
participant R1 as Request Goroutine 1
participant R2 as Request Goroutine 2
participant Orch as Orchestrator
participant Edge as Edge Config Push
R1->>Orch: GetOriginProxy() [atomic.Load]
Note over R1,Orch: Lock-free read!
R2->>Orch: GetOriginProxy() [atomic.Load]
Edge->>Orch: UpdateConfig(version, config)
Note over Orch: lock.Lock()
Orch->>Orch: Build new proxy
Orch->>Orch: proxy.Store(newProxy) [atomic]
Note over Orch: lock.Unlock()
R1->>Orch: GetOriginProxy() [returns new proxy]
Compare this to an RWMutex approach where every request would need to acquire a read lock. Under high concurrency, even read locks create contention from cache-line bouncing. The atomic.Value pattern eliminates this entirely.
Versioned Config Updates from the Edge
The UpdateConfig method handles configuration updates pushed from the Cloudflare edge:
func (o *Orchestrator) UpdateConfig(version int32, config []byte) *pogs.UpdateConfigurationResponse {
o.lock.Lock()
defer o.lock.Unlock()
if o.currentVersion >= version {
return &pogs.UpdateConfigurationResponse{
LastAppliedVersion: o.currentVersion,
}
}
// ... deserialize and apply ...
}
The version check is critical. Because cloudflared maintains up to four connections to the edge, multiple connections might push the same configuration update concurrently. The version guard ensures idempotency — only the first delivery of a new version triggers an actual update.
The initial version is -1, which means the very first remote configuration (version 0) will always be accepted. This enables the migration path from local-only to remote-managed tunnels.
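The guard's behavior can be sketched in isolation. The currentVersion field and -1 sentinel mirror the article; the applied counter is added purely to make the idempotency visible.

```go
package main

import (
	"fmt"
	"sync"
)

type orchestrator struct {
	lock           sync.Mutex
	currentVersion int32
	applied        int // counts real updates, for illustration only
}

// updateConfig applies a pushed config only if its version is newer,
// returning the version actually in effect.
func (o *orchestrator) updateConfig(version int32) int32 {
	o.lock.Lock()
	defer o.lock.Unlock()
	if o.currentVersion >= version {
		return o.currentVersion // stale or duplicate push: no-op
	}
	o.currentVersion = version
	o.applied++
	return o.currentVersion
}

func main() {
	o := &orchestrator{currentVersion: -1} // -1 so version 0 is accepted
	o.updateConfig(0)
	o.updateConfig(0) // same version from another edge connection: ignored
	o.updateConfig(1)
	fmt.Println(o.applied) // prints 2
}
```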
Zero-Downtime Ingress Updates
The updateIngress method is where the magic happens. The key insight is in the ordering of operations:
sequenceDiagram
participant Orch as Orchestrator
participant NewOrigins as New Origins
participant OldProxy as Old Proxy
participant NewProxy as New Proxy
Orch->>Orch: Create proxyShutdownC channel
Orch->>NewOrigins: StartOrigins(log, proxyShutdownC)
Note over NewOrigins: New origins are running!
Orch->>Orch: flowLimiter.SetLimit(new limit)
Orch->>Orch: originDialerService.UpdateDefaultDialer(new settings)
Orch->>NewProxy: proxy.NewOriginProxy(newRules, ...)
Orch->>Orch: proxy.Store(newProxy) ← atomic swap
Note over Orch: New requests go to new proxy
Note over OldProxy: In-flight requests still completing on old proxy
Orch->>Orch: close(old proxyShutdownC)
Note over OldProxy: Old origins begin shutdown
Start new origins before stopping old ones. The comment in the source code explains the tradeoff:
// Start new proxy before closing the ones from last version.
// The upside is we don't need to restart proxy from last version, which can fail
// The downside is new version might have ingress rule that require previous version
// to be shutdown first
// The downside is minimized because none of the ingress.OriginService
// implementations have that requirement
This creates a brief window where both old and new origins are running. New requests immediately go to the new proxy (via the atomic store), while in-flight requests on the old proxy complete naturally. When the old proxyShutdownC is closed, the old origins begin their shutdown sequence.
The waitToCloseLastProxy goroutine (spawned at construction time) ensures the final proxy's origins are cleaned up when cloudflared shuts down.
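The ordering can be modeled with per-generation shutdown channels. This is a toy sketch: origins here are just labels, not real ingress.OriginService implementations, and the atomic proxy store is elided.

```go
package main

import "fmt"

// generation pairs a proxy version with the shutdown channel its
// origins listen on.
type generation struct {
	name      string
	shutdownC chan struct{}
}

// startOrigins stands in for ingress.Rules.StartOrigins.
func startOrigins(name string) *generation {
	return &generation{name: name, shutdownC: make(chan struct{})}
}

func main() {
	old := startOrigins("v1")

	// 1. Start the new generation's origins while v1 still serves.
	next := startOrigins("v2")

	// 2. Atomically swap the active proxy (proxy.Store in the real
	//    code), so new requests hit v2 while v1 finishes in-flight work.
	active := next

	// 3. Only then signal the old generation to shut down.
	close(old.shutdownC)
	<-old.shutdownC // receive on a closed channel returns immediately

	fmt.Println("serving:", active.name) // prints serving: v2
}
```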
Feature Flags via DNS TXT Records
Cloudflared's featureSelector implements a DNS-based feature flag system. It queries a TXT record at cfd-features.argotunnel.com to get a JSON payload with rollout percentages:
type featuresRecord struct {
DatagramV3Percentage uint32 `json:"dv3_2"`
}
flowchart TD
Start[featureSelector created] --> Hash[FNV32a hash of accountTag]
Hash --> Mod[hash % 100 = accountHash]
Mod --> DNS[DNS TXT lookup: cfd-features.argotunnel.com]
DNS --> Parse[Parse JSON: dv3_2 percentage]
Parse --> Compare{percentage > accountHash?}
Compare -->|Yes| Enable[Feature enabled for this account]
Compare -->|No| Disable[Feature disabled]
Refresh[Hourly refresh loop] -->|Every hour| DNS
The switchThreshold function creates a deterministic bucket for each account:
func switchThreshold(accountTag string) uint32 {
h := fnv.New32a()
_, _ = h.Write([]byte(accountTag))
return h.Sum32() % 100
}
This means an account always lands in the same bucket (0-99). Setting dv3_2 to 50 in the DNS record enables datagram v3 for roughly 50% of accounts: specifically, those whose FNV-32a hash mod 100 is less than 50.
The elegance of this approach is that Cloudflare can control rollouts globally without shipping new client code. To enable datagram v3 for 10% of accounts, they set the TXT record to {"dv3_2": 10}. To do a full rollout, set it to 100. To emergency-rollback, set it to 0. All changes propagate within the DNS TTL (1 hour by default, with a 10-second lookup timeout).
The feature selector also supports CLI overrides — if a user passes --features datagram-v3-2 on the command line, it takes priority over the DNS-based evaluation.
Config File Hot-Reload in Service Mode
When cloudflared runs as a system service (no arguments), it operates in service mode using the handleServiceMode function. This mode creates a three-component pipeline for config file watching:
sequenceDiagram
participant FS as File System
participant Watcher as watcher.File
participant CM as config.FileManager
participant OM as overwatch.AppManager
participant App as AppService
FS->>Watcher: Config file changed
Watcher->>CM: File event
CM->>CM: Re-read config
CM->>OM: Configuration update
OM->>App: Restart tunnel service
Note over App: Old tunnel stops, new tunnel starts
f, err := watcher.NewFile()          // filesystem event watcher
configManager, err := config.NewFileManager(f, configPath, log) // re-reads the file on change
serviceManager := overwatch.NewAppManager(serviceCallback)      // restarts the tunnel service
appService := NewAppService(configManager, serviceManager, shutdownC, log)
Unlike the Orchestrator's zero-downtime swap (which preserves in-flight requests), service mode hot-reload is a full restart — it stops the old tunnel and starts a new one. This is appropriate for the service mode use case where the config file defines the tunnel identity itself, not just the routing rules.
Tip: For zero-downtime config changes in production, use Cloudflare's dashboard to manage your tunnel configuration remotely. The Orchestrator's atomic swap handles these updates without dropping connections. Service mode file-watching is more suited for development and testing scenarios.
What's Next
We've covered cloudflared's runtime adaptability — from the elegant copy-on-write proxy swap to DNS-based feature flags. In our final article, we'll survey the cross-cutting operational concerns: the Observer event system, structured logging with ConnAwareLogger, Prometheus metrics, the in-tunnel management service, runtime diagnostics, and the build system's FIPS/version handling.