Conformance Testing and CI: Keeping 10+ Languages in Sync

Intermediate

Prerequisites

  • Article 1: Architecture and Navigation Guide (overall repo structure)

A serialization format that promises cross-language compatibility is only as good as its test suite. Protobuf supports over 10 language implementations, each with different runtime architectures — from C++ with reflection and arenas (Articles 3-4) to upb-backed dynamic languages (Article 5) to Rust's dual-kernel system (Article 6). How do you ensure they all agree on the meaning of every bit on the wire?

The answer is the conformance test framework: a purpose-built system that validates every language implementation against a canonical specification across binary, JSON, and text format wire representations. In this article, we'll examine the test architecture, the subprocess communication protocol, the failure list system that tracks known divergences, and the CI infrastructure that runs it all.

Conformance Test Architecture

The conformance framework lives in the conformance/ directory and is built around two core abstractions. ConformanceTestSuite defines the test cases, and ConformanceTestRunner executes them against a specific language implementation.

The test suite is designed for extensibility. The header shows the intended usage pattern:

class MyConformanceTestSuite : public ConformanceTestSuite {
 public:
  void RunSuiteImpl() {
    // INSERT ACTUAL TESTS.
  }
};

Tests cover three wire format categories:

Category          Description
BINARY_TEST       Round-trip through the protobuf binary wire format
JSON_TEST         Round-trip through the JSON wire format
TEXT_FORMAT_TEST  Round-trip through the text format

Each test case specifies an input payload in one format, an expected output in another format (or the same), and whether the operation should succeed or fail. This cross-format testing is critical because JSON has different semantics than binary (e.g., field names vs. field numbers, different handling of defaults).
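Conceptually, each test case is a small record. Here is a minimal sketch in Python; the type and field names are illustrative, not the actual C++ API:

```python
from dataclasses import dataclass
from enum import Enum, auto

class WireFormat(Enum):
    PROTOBUF = auto()      # binary wire format
    JSON = auto()
    TEXT_FORMAT = auto()

@dataclass(frozen=True)
class ConformanceCase:
    """Illustrative model of one conformance test case."""
    name: str
    input_format: WireFormat
    input_payload: bytes          # payload encoded in input_format
    output_format: WireFormat     # may differ from input_format
    expect_parse_failure: bool    # negative cases must be rejected

# A cross-format case: binary in, JSON out. The payload is field 1, varint 1.
case = ConformanceCase(
    name="ExampleCase.ProtobufInput.JsonOutput",
    input_format=WireFormat.PROTOBUF,
    input_payload=b"\x08\x01",
    output_format=WireFormat.JSON,
    expect_parse_failure=False,
)
```

Cross-format cases like this one are where implementations diverge most often, since the testee must both parse one format correctly and re-serialize to another.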

classDiagram
    class ConformanceTestSuite {
        +RunSuiteImpl()*
        +RunTest(request, response)
        -failure_list_: FailureListTrieNode
    }
    
    class ConformanceTestRunner {
        <<interface>>
        +RunTest(request, response)*
    }
    
    class ForkPipeRunner {
        +RunTest(request, response)
    }
    
    class InProcessRunner {
        +RunTest(request, response)
    }
    
    note for ForkPipeRunner "Subprocess communication"
    note for InProcessRunner "Direct function call"
    
    ConformanceTestSuite --> ConformanceTestRunner
    ConformanceTestRunner <|-- ForkPipeRunner
    ConformanceTestRunner <|-- InProcessRunner

The suite uses a FailureListTrieNode for efficient matching of known failures, supporting wildcard patterns like Recommended.*.JsonInput.BoolFieldDoubleQuotedFalse.
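A trie over dot-separated name components, where * matches exactly one component, makes these lookups cheap. A simplified Python sketch of the idea (not the actual FailureListTrieNode implementation):

```python
class FailureTrie:
    """Trie over dot-separated test-name components; '*' matches one component."""

    def __init__(self):
        self.children = {}
        self.terminal = False   # a pattern ends at this node

    def insert(self, pattern: str) -> None:
        node = self
        for part in pattern.split("."):
            node = node.children.setdefault(part, FailureTrie())
        node.terminal = True

    def matches(self, test_name: str) -> bool:
        def walk(node, parts):
            if not parts:
                return node.terminal
            # Both a literal child and a '*' child can match this segment.
            for key in (parts[0], "*"):
                child = node.children.get(key)
                if child is not None and walk(child, parts[1:]):
                    return True
            return False
        return walk(self, test_name.split("."))

trie = FailureTrie()
trie.insert("Recommended.*.JsonInput.BoolFieldDoubleQuotedFalse")
```

With this structure, matching a test name against thousands of patterns costs roughly one trie walk rather than one comparison per pattern.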

The Subprocess Test Protocol

The conformance protocol, defined in conformance.proto, uses a simple request/response pattern. Each language implements a "testee" program that reads ConformanceRequest messages and produces ConformanceResponse messages.

The request specifies:

  • A payload in one of several formats (protobuf binary, JSON, JSPB, text format)
  • The requested output format
  • The message type to use for parsing
  • The test category

sequenceDiagram
    participant Runner as Test Runner (C++)
    participant Testee as Language Testee<br/>(e.g., Python)
    
    Runner->>Testee: Fork + pipe
    
    loop For each test case
        Runner->>Testee: [4-byte length] + ConformanceRequest
        Note over Testee: Parse input payload<br/>Re-serialize to output format
        Testee->>Runner: [4-byte length] + ConformanceResponse
        Note over Runner: Compare response<br/>against expected result
    end
    
    Runner->>Testee: Close stdin (EOF)

The communication uses a simple length-delimited protocol: a 4-byte little-endian length prefix followed by the serialized protobuf message. This is deliberately simple — any language that can read stdin and write stdout can implement a conformance testee.

The ConformanceResponse can indicate:

  • Success: Contains the serialized output in the requested format
  • Parse error: The input was invalid (expected for negative test cases)
  • Serialize error: Parsing succeeded but serialization failed
  • Runtime error: An unexpected error occurred
  • Skipped: The testee doesn't support this test category

The first request the runner sends is special: it has message_type = "conformance.FailureSet", and the testee responds with its list of known-failing tests. This allows the runner to distinguish between expected failures and regressions.

Failure Lists and Known Non-Conformances

Each language maintains a failure list file that documents tests known to fail. The conformance/failure_list_cpp.txt file for C++ shows the pattern:

# This is the list of conformance tests that are known to fail for the C++
# implementation right now.  These should be fixed.

Recommended.*.JsonInput.BoolFieldDoubleQuotedFalse    # Should have failed to parse
Recommended.*.JsonInput.FieldNameDuplicate             # Should have failed to parse
Recommended.*.JsonInput.StringFieldSingleQuoteBoth     # Should have failed to parse

Each line is a test name pattern (supporting * wildcards) followed by a comment explaining the failure. The naming convention Recommended.*.JsonInput.BoolFieldDoubleQuotedFalse tells you this is a "Recommended" (not required) test, for any message type, testing JSON input with double-quoted boolean false values.
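Parsing a failure list is deliberately trivial. A Python sketch of the file format, using fnmatch for matching (a simplification: fnmatch's * matches across dots, whereas the real trie matcher treats * as a single name component):

```python
import fnmatch
from typing import List

def parse_failure_list(text: str) -> List[str]:
    """Strip comments and blank lines; keep the test-name patterns."""
    patterns = []
    for line in text.splitlines():
        pattern = line.split("#", 1)[0].strip()   # drop trailing comment
        if pattern:
            patterns.append(pattern)
    return patterns

def is_expected_failure(test_name: str, patterns) -> bool:
    """True if the test name matches any known-failure pattern."""
    return any(fnmatch.fnmatch(test_name, p) for p in patterns)
```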

flowchart TD
    A["Conformance Test Runs"] --> B{Test passes?}
    B -->|Yes| C{Was it in<br/>failure list?}
    B -->|No| D{Was it in<br/>failure list?}
    C -->|Yes| E["⚠️ Unexpected pass!<br/>Remove from failure list"]
    C -->|No| F["✅ Pass"]
    D -->|Yes| G["✅ Expected failure"]
    D -->|No| H["❌ Regression!<br/>CI fails"]

The workflow is:

  1. A new test is added to the conformance suite
  2. If a language fails it, the test name is added to that language's failure list
  3. CI passes because the failure is expected
  4. When the language implementation is fixed, the test name is removed from the failure list
  5. If a test starts failing that isn't in the failure list, CI fails — it's a regression
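The outcome of each run therefore depends on only two bits: did the test pass, and was it on the failure list. A sketch of that decision table (names are illustrative, not the runner's actual API):

```python
from enum import Enum

class Outcome(Enum):
    PASS = "pass"
    EXPECTED_FAILURE = "expected failure"
    UNEXPECTED_PASS = "unexpected pass: remove from failure list"
    REGRESSION = "regression: CI fails"

def classify(passed: bool, in_failure_list: bool) -> Outcome:
    """Map (test result, failure-list membership) to a CI outcome."""
    if passed:
        return Outcome.UNEXPECTED_PASS if in_failure_list else Outcome.PASS
    return Outcome.EXPECTED_FAILURE if in_failure_list else Outcome.REGRESSION
```

Treating an unexpected pass as an error is the detail that keeps failure lists honest: fixed tests must be removed, so the lists never silently overstate a language's gaps.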

This system lets the protobuf team add aspirational tests (like strict JSON parsing) before all implementations support them, without blocking CI. It also provides a clear picture of each language's conformance status.

Tip: If you're implementing a protobuf library in a new language, start by implementing the conformance testee. The test suite will immediately tell you which wire format behaviors you're getting wrong. The failure list pattern lets you incrementally approach full conformance.

CI/CD Infrastructure

The CI infrastructure is organized as per-language GitHub Actions workflows, orchestrated by test_runner.yml.

The test runner fires on:

  • Pushes to main: Post-submit validation
  • Pull requests: Pre-submit validation
  • Hourly schedule: Catching flakes and environment changes
  • Manual dispatch: For debugging

Each language has its own workflow file:

Language       Workflow File
C++            test_cpp.yml
Java           test_java.yml
Python         test_python.yml
Ruby           test_ruby.yml
PHP            test_php.yml
C#             test_csharp.yml
Objective-C    test_objectivec.yml
Rust           test_rust.yml
hpb            test_hpb.yml
upb            test_upb.yml
Bazel          test_bazel.yml

flowchart TD
    TRIGGER["Push / PR / Schedule"] --> RUNNER["test_runner.yml"]
    RUNNER --> SAFE{"Safe source?<br/>(internal branch)"}
    SAFE -->|Yes| JOBS["Spawn per-language jobs"]
    SAFE -->|No| LABEL{"'safe for tests'<br/>label?"}
    LABEL -->|Yes| JOBS
    LABEL -->|No| SKIP["Skip tests"]
    
    JOBS --> CPP["test_cpp.yml"]
    JOBS --> JAVA["test_java.yml"]
    JOBS --> PY["test_python.yml"]
    JOBS --> RUST["test_rust.yml"]
    JOBS --> MORE["...other languages"]
    
    CPP --> BAZEL["Bazel test //..."]
    JAVA --> BAZEL
    PY --> BAZEL

The test runner implements a protection strategy for pull requests from forks. PRs from branches within the repository run tests immediately. PRs from forks require a "safe for tests" label before any tests run, guarding against "pwn request" attacks that hijack CI workflows to exfiltrate secrets or steal compute resources. The label is removed as soon as it is consumed, so each new commit requires fresh approval.

Concurrency control prevents duplicate runs:

concurrency:
  group: ${{ github.event_name }}-${{ github.workflow }}-${{ github.head_ref || github.ref }}
  cancel-in-progress: true

This means a new push to a PR branch cancels any in-progress test runs for the previous commit.

The Full Picture

The conformance testing and CI system is what makes protobuf's ambitious multi-language promise credible. Without it, subtle divergences would accumulate — a JSON parser that accepts slightly non-standard input here, a binary encoder that handles edge cases differently there — until messages that work in one language silently fail or corrupt data in another.

The combination of:

  • A formal protocol (conformance.proto) defining exact test semantics
  • Per-language failure lists documenting known gaps
  • Automated CI that catches regressions immediately
  • Cross-format testing (binary, JSON, text)

...ensures that when you serialize a protobuf message in Python and deserialize it in Rust (or Java, or C++, or PHP), you get the same result. Across this entire series, from the compiler pipeline (Article 2) through the descriptor system (Article 3), the performance stack (Article 4), the upb runtime (Article 5), and the code generators (Article 6), the conformance test suite is the final arbiter of correctness.

That's the protobuf monorepo — a remarkable engineering artifact that coordinates a compiler, two runtimes, 10+ language implementations, and a comprehensive test framework within a single repository. Understanding its architecture gives you not just knowledge of protobuf's internals, but a case study in how to build and maintain a large-scale multi-language infrastructure project.