Hybrid Mode: Control Planes, Data Planes, and Configuration Sync

Advanced

Prerequisites

  • Articles 1-5 (full architecture through database layer)
  • Understanding of WebSocket protocol basics
  • Familiarity with distributed systems concepts (eventual consistency)

Production Kong deployments rarely use a single node. The hybrid mode architecture separates concerns: Control Planes (CP) manage configuration via the Admin API and database, while Data Planes (DP) proxy traffic using configuration received from the CP. This separation means Data Planes need no database connection, can run in isolated network segments, and can be scaled independently.

This article traces the CP/DP communication pipeline — from role detection during initialization through WebSocket-based config push and the newer incremental sync system.

Hybrid Mode Architecture and Role Detection

As we saw in Part 2, role detection happens early in Kong.init() via simple configuration checks at lines 201–218:

is_data_plane = function(config) return config.role == "data_plane" end
is_control_plane = function(config) return config.role == "control_plane" end

The role setting (configured via KONG_ROLE environment variable or kong.conf) shapes the entire initialization path:

  • Control Plane: Connects to Postgres, runs the Admin API, skips router building (it doesn't proxy traffic), starts WebSocket server for DPs
  • Data Plane: Runs in DB-less mode (LMDB), skips Admin API, starts WebSocket client to CP, proxies traffic
  • Traditional: Connects to Postgres, runs everything (Admin API + proxy)
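
As a sketch, the corresponding kong.conf settings for each role might look like this (two separate nodes, two separate files; paths and addresses are illustrative — see the kong.conf reference for the full option list):

```ini
# Control Plane node: database-backed, listens for DP connections
role = control_plane
cluster_cert = /etc/kong/cluster.crt
cluster_cert_key = /etc/kong/cluster.key
cluster_listen = 0.0.0.0:8005

# Data Plane node: DB-less, dials the CP's cluster listener
role = data_plane
database = off
cluster_control_plane = cp.example.internal:8005
cluster_cert = /etc/kong/cluster.crt
cluster_cert_key = /etc/kong/cluster.key
```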

The clustering module is initialized at lines 701–713 in Kong.init():

if is_http_module and (is_data_plane(config) or is_control_plane(config)) then
  kong.clustering = require("kong.clustering").new(config)
  if config.cluster_rpc then
    kong.rpc = require("kong.clustering.rpc.manager").new(config, kong.node.get_id())
    if config.cluster_rpc_sync then
      kong.sync = require("kong.clustering.services.sync").new(db, is_control_plane(config))
    end
  end
end

flowchart TD
    subgraph "Control Plane"
        A[Admin API] --> B[(PostgreSQL)]
        B --> C[Config Export]
        C --> D[WebSocket Server]
    end
    subgraph "Data Plane 1"
        E[WebSocket Client] --> F[Declarative Config Loader]
        F --> G[(LMDB)]
        G --> H[Router + Plugins]
        H --> I[Proxy Traffic]
    end
    subgraph "Data Plane 2"
        J[WebSocket Client] --> K[Declarative Config Loader]
        K --> L[(LMDB)]
        L --> M[Router + Plugins]
        M --> N[Proxy Traffic]
    end
    D <-->|mTLS| E
    D <-->|mTLS| J

Control Plane: Config Broadcast via WebSocket

The Control Plane module is instantiated in kong/clustering/init.lua at line 80:

function _M:init_cp_worker(basic_info)
  events.init()
  self.instance = require("kong.clustering.control_plane").new(self)
  self.instance:init_worker(basic_info)
end

The kong/clustering/control_plane.lua module manages the WebSocket server that accepts DP connections. The flow:

  1. A DP connects via WebSocket to the CP's cluster listener
  2. The CP validates the DP's client certificate via mTLS (validate_client_cert at lines 57–62)
  3. The CP checks plugin/filter compatibility between CP and DP versions
  4. The CP exports the current configuration and sends it as a compressed payload
  5. When configuration changes (via Admin API or migrations), the CP pushes updated config to all connected DPs

The export process serializes all entities from the database into a declarative config format, applies compatibility transformations for older DP versions, and gzip-compresses the payload. A pcall wrapper at line 67 guards the export so a failure is returned as an error rather than thrown:

local function handle_export_deflated_reconfigure_payload(self)
  local ok, p_err, err = pcall(self.export_deflated_reconfigure_payload, self)
  return ok, p_err or err
end
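
The export-and-compress step can be sketched in a few lines (Python for illustration; the entity shape and hashing scheme here are assumptions, not Kong's exact internals):

```python
import gzip
import hashlib
import json

def export_deflated_payload(entities):
    """Sketch of the CP export step: serialize entities into a
    declarative-config-shaped document, hash it so DPs can detect
    no-op pushes, and gzip-compress the result. Illustrative only."""
    config = {"_format_version": "3.0", **entities}
    serialized = json.dumps(config, sort_keys=True).encode()
    config_hash = hashlib.md5(serialized).hexdigest()
    deflated = gzip.compress(serialized)
    return deflated, config_hash

payload, config_hash = export_deflated_payload(
    {"services": [{"name": "example", "host": "upstream.local"}]}
)
# The DP can recover the exact serialized config by inflating:
assert json.loads(gzip.decompress(payload))["_format_version"] == "3.0"
```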

The CP maintains a ping/pong heartbeat with each DP (every 30 seconds, per CLUSTERING_PING_INTERVAL in constants.lua). If a DP misses heartbeats, it's marked as offline in the clustering_data_planes entity, visible via GET /clustering/data-planes on the Admin API.
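
The liveness bookkeeping can be sketched as follows (Python for illustration; the ping interval matches CLUSTERING_PING_INTERVAL, while the registry structure and missed-heartbeat allowance are assumptions):

```python
import time

CLUSTERING_PING_INTERVAL = 30  # seconds, per constants.lua

class DataPlaneRegistry:
    """Sketch of how a CP might mark DPs offline after missed
    heartbeats. Illustrative; Kong records this state in the
    clustering_data_planes entity."""
    def __init__(self, ping_interval=CLUSTERING_PING_INTERVAL, missed_allowed=2):
        self.last_seen = {}
        self.timeout = ping_interval * (missed_allowed + 1)

    def on_ping(self, dp_id, now=None):
        # Called on every heartbeat from a connected DP.
        self.last_seen[dp_id] = now if now is not None else time.time()

    def status(self, dp_id, now=None):
        now = now if now is not None else time.time()
        seen = self.last_seen.get(dp_id)
        if seen is None or now - seen > self.timeout:
            return "offline"
        return "connected"

registry = DataPlaneRegistry()
registry.on_ping("dp-1", now=0)
assert registry.status("dp-1", now=60) == "connected"
assert registry.status("dp-1", now=120) == "offline"
```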

sequenceDiagram
    participant DP as Data Plane
    participant CP as Control Plane
    participant DB as PostgreSQL

    DP->>CP: WebSocket connect + mTLS cert
    CP->>CP: validate_client_cert()
    CP->>CP: Check plugin compatibility
    CP->>DB: Export all entities
    CP->>CP: Serialize + gzip compress
    CP->>DP: RECONFIGURE payload
    DP->>DP: Apply declarative config
    DP->>CP: PONG (heartbeat)
    
    Note over CP: Admin API changes config
    CP->>DB: Write changes
    CP->>CP: Re-export config
    CP->>DP: RECONFIGURE (updated)
    DP->>DP: Rebuild router + plugins

Data Plane: Receiving and Applying Configuration

The Data Plane side lives in kong/clustering/data_plane.lua. The DP creates a WebSocket client that connects to the CP using the cluster certificate:

function _M.new(clustering)
  local self = {
    declarative_config = kong.db.declarative_config,
    conf = clustering.conf,
    cert = clustering.cert,
    cert_key = clustering.cert_key,
  }
  return setmetatable(self, _MT)
end

When the DP receives a RECONFIGURE message, it:

  1. Decompresses the gzip payload via inflate_gzip
  2. Parses the JSON into Lua tables
  3. Validates against the declarative config schema
  4. Loads into LMDB via the declarative config pipeline (the same pipeline used for file-based DB-less config, as covered in Part 5)
  5. Rebuilds the router and plugins iterator

The reconfigure handler in kong/runloop/handler.lua logs the timing:

local reconfigure_time = get_monotonic_ms() - reconfigure_started_at
if ok then
  log(INFO, "declarative reconfigure took ", reconfigure_time,
            " ms on worker #", worker_id)
end

Each worker independently processes the reconfiguration event. The events.register_events(reconfigure_handler) call at line 949 registers the handler for worker events, so when one worker receives the config from the CP via WebSocket, it posts an event that triggers all workers to rebuild.

Tip: The DP stores its configuration hash and compares it with incoming payloads. If the hash matches, the reconfiguration is skipped — avoiding unnecessary router rebuilds when the CP pushes identical config (e.g., after a CP restart).
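
The decompress-compare-apply flow on the DP side can be sketched as follows (Python for illustration; function and field names are assumptions, and the real handler is Lua):

```python
import gzip
import hashlib
import json

def apply_reconfigure(payload, current_hash):
    """Sketch of handling a RECONFIGURE message: inflate the gzip
    payload, compare hashes to skip no-op pushes, then parse and
    hand off to the declarative loader. Illustrative only."""
    serialized = gzip.decompress(payload)
    new_hash = hashlib.md5(serialized).hexdigest()
    if new_hash == current_hash:
        return current_hash, False  # identical config: skip the rebuild
    config = json.loads(serialized)
    # ... validate against the declarative schema, load into LMDB,
    # then rebuild the router and plugins iterator ...
    return new_hash, True

raw = json.dumps({"_format_version": "3.0", "services": []}).encode()
payload = gzip.compress(raw)
h, rebuilt = apply_reconfigure(payload, current_hash=None)
_, rebuilt_again = apply_reconfigure(payload, current_hash=h)
assert rebuilt and not rebuilt_again
```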

The RPC Framework and Incremental Sync

The traditional CP→DP sync has a limitation: every config change sends the entire configuration. For deployments with thousands of routes, this can mean multi-megabyte payloads on every change.

Kong's newer RPC framework in kong/clustering/rpc/manager.lua provides bidirectional communication between CP and DP using JSON-RPC v2 over WebSocket. The RPC manager maintains client connections and capability negotiation:

function _M.new(conf, node_id)
  local self = {
    clients = {},
    client_capabilities = {},
    node_id = node_id,
    conf = conf,
    cluster_cert = assert(clustering_tls.get_cluster_cert(conf)),
    cluster_cert_key = assert(clustering_tls.get_cluster_cert_key(conf)),
    callbacks = callbacks.new(),
  }

  return setmetatable(self, _MT)
end
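
The JSON-RPC v2 framing is easy to picture with plain dictionaries (Python for illustration; the method name and parameter shape shown here are assumptions about the kong.sync.v2 payloads, not the exact wire format):

```python
import json

# Sketch of JSON-RPC 2.0 frames as they might flow over the
# clustering WebSocket: a DP request for deltas and the CP's reply.
request = {
    "jsonrpc": "2.0",
    "id": 1,
    "method": "kong.sync.v2.get_delta",   # hypothetical method name
    "params": {"last_version": 42},
}
response = {
    "jsonrpc": "2.0",
    "id": 1,                              # id ties the reply to the request
    "result": {"deltas": [], "latest_version": 42},
}

frame = json.dumps(request)
decoded = json.loads(frame)
assert decoded["jsonrpc"] == "2.0" and decoded["id"] == response["id"]
```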

Built on the RPC framework, the incremental sync system in kong/clustering/services/sync/init.lua enables delta-based configuration updates:

function _M.new(db, is_cp)
  local strategy = strategy.new(db)
  local self = {
    db = db,
    strategy = strategy,
    rpc = rpc.new(strategy),
    is_cp = is_cp,
  }
  if is_cp then
    self.hooks = require("kong.clustering.services.sync.hooks").new(strategy)
  end
  return setmetatable(self, _MT)
end

The sync system uses DAO hooks on the CP side to track changes. When an entity is created, updated, or deleted, the hook records a delta. The DP periodically calls kong.sync.v2 RPC to fetch only the changes since its last known version.
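
The delta-tracking idea can be sketched as a versioned change log (Python for illustration; in Kong the deltas are recorded via the sync strategy rather than an in-memory list):

```python
class DeltaLog:
    """Sketch of CP-side change tracking for incremental sync:
    each DAO write appends a versioned delta, and a DP fetches
    only entries newer than its last-known version."""
    def __init__(self):
        self.version = 0
        self.deltas = []

    def record(self, entity_type, op, entity):
        # Invoked from a DAO hook on create/update/delete.
        self.version += 1
        self.deltas.append({"version": self.version, "type": entity_type,
                            "op": op, "entity": entity})

    def fetch_since(self, last_version):
        # What the kong.sync.v2 RPC would return to a DP.
        return [d for d in self.deltas if d["version"] > last_version]

log = DeltaLog()
log.record("routes", "create", {"name": "r1"})
log.record("plugins", "delete", {"name": "rate-limiting"})
# A DP already at version 1 receives only the second delta:
assert [d["version"] for d in log.fetch_since(1)] == [2]
```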

The DP side at lines 41–80 registers for RPC readiness events and starts syncing:

worker_events.register(function(capabilities_list)
  for _, v in ipairs(capabilities_list) do
    if v == "kong.sync.v2" then
      has_sync_v2 = true
      break
    end
  end
end, "clustering:jsonrpc", "connected")

flowchart LR
    subgraph "Full Sync (v1)"
        A[CP] -->|Entire config payload| B[DP]
    end
    subgraph "Incremental Sync (v2/RPC)"
        C[CP] -->|Delta: +route, -plugin, ~service| D[DP]
        D -->|"kong.sync.v2 RPC: last_version=42"| C
    end

The incremental sync is controlled by the cluster_rpc and cluster_rpc_sync configuration options. When both are enabled, Kong uses the RPC framework for sync instead of the traditional full-config push.
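
A minimal kong.conf fragment enabling the RPC-based sync on both planes might look like this (option names per the source above; defaults may change between releases):

```ini
# Enable the JSON-RPC clustering framework and delta-based sync
cluster_rpc = on
cluster_rpc_sync = on
```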

Security: mTLS Between Planes

All CP/DP communication is secured with mutual TLS. The cluster_cert and cluster_cert_key configuration options specify the certificate used for both client and server authentication. The CP validates the DP's certificate against its own CA, and vice versa.
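
As a sketch, the relevant kong.conf options might look like this (PKI mode and the file paths are illustrative; Kong also supports a shared-certificate mode where CP and DPs present the same certificate):

```ini
cluster_mtls = pki
cluster_cert = /etc/kong/dp.crt
cluster_cert_key = /etc/kong/dp.key
cluster_ca_cert = /etc/kong/cluster-ca.crt
```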

The validation happens in kong/clustering/init.lua:

function _M:validate_client_cert(cert_pem)
  cert_pem = cert_pem or ngx_var.ssl_client_raw_cert
  return validate_client_cert(self.conf, self.cert, cert_pem)
end

This mutual authentication ensures that only authorized Data Planes can receive configuration from the Control Plane — critical since the configuration may contain sensitive data like API keys and upstream credentials.

In Part 7, we'll explore Kong's newest major subsystem: the AI gateway capabilities that enable proxying and transforming LLM requests across multiple providers.