Read OSS

Testing 25 Integrations: The Test Suite and Contributor Guide

Intermediate

Prerequisites

  • Articles 1-5 in this series
  • Basic pytest knowledge
  • Familiarity with GitHub Actions CI/CD

A system that generates files for 25+ AI agents has a combinatorial testing problem. Each integration produces different output formats, file extensions, directory structures, and frontmatter. A single broken placeholder replacement could silently produce commands that confuse an AI agent. Spec Kit's test suite addresses this with a pattern-based approach: every integration test verifies the same invariants, adapted to each agent's output format. This article maps the test architecture and closes with a practical guide for the most common contribution — adding a new AI agent.

Test Architecture Overview

The test suite mirrors the source structure:

tests/
├── conftest.py                          # ANSI stripping helper
├── integrations/
│   ├── conftest.py                      # StubIntegration test helper
│   ├── test_base.py                     # Base class unit tests
│   ├── test_integration_claude.py       # Claude-specific tests
│   ├── test_integration_copilot.py      # Copilot-specific tests
│   ├── test_integration_windsurf.py     # ...one per agent (27 files)
│   ├── test_manifest.py                 # IntegrationManifest tests
│   └── test_registry.py                 # INTEGRATION_REGISTRY tests
├── test_extensions.py                   # Extension system tests
├── test_presets.py                      # Preset system tests
├── test_agent_config_consistency.py     # Cross-integration consistency
├── test_merge.py                        # JSON merge logic tests
└── test_branch_numbering.py             # Branch naming tests

There are 51 test files total. The largest category is per-integration tests under tests/integrations/ — one file per supported agent plus tests for the base classes, manifest, and registry.

The shared conftest.py provides a single utility:

import re

_ANSI_ESCAPE_RE = re.compile(r"\x1b\[[0-?]*[ -/]*[@-~]")

def strip_ansi(text: str) -> str:
    """Remove ANSI escape codes from Rich-formatted CLI output."""
    return _ANSI_ESCAPE_RE.sub("", text)

This is essential because the CLI uses Rich for styled output. Any test that captures CLI stdout needs to strip ANSI codes before asserting on content.
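A quick self-contained demonstration (repeating the helper so it runs standalone; the escape sequence is just an example of the styling Rich emits):

```python
import re

_ANSI_ESCAPE_RE = re.compile(r"\x1b\[[0-?]*[ -/]*[@-~]")

def strip_ansi(text: str) -> str:
    """Remove ANSI escape codes from Rich-formatted CLI output."""
    return _ANSI_ESCAPE_RE.sub("", text)

# "\x1b[1;32m...\x1b[0m" is bold-green styling as Rich would emit it
assert strip_ansi("\x1b[1;32mDone\x1b[0m") == "Done"
```

Without the stripping step, an assertion like `"Done" == output` would fail because the captured string still carries the invisible escape bytes.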

The integration test conftest at tests/integrations/conftest.py provides a StubIntegration — a minimal MarkdownIntegration subclass used for testing base class behavior without depending on any real agent:

class StubIntegration(MarkdownIntegration):
    key = "stub"
    config = {
        "name": "Stub Agent",
        "folder": ".stub/",
        "commands_subdir": "commands",
        "install_url": None,
        "requires_cli": False,
    }

Integration Test Patterns

Every integration test file follows the same structure. Let's examine the Claude tests at tests/integrations/test_integration_claude.py:

Registration verification:

def test_registered(self):
    assert "claude" in INTEGRATION_REGISTRY
    assert get_integration("claude") is not None

Config validation:

def test_config_uses_skills(self):
    integration = get_integration("claude")
    assert integration.config["folder"] == ".claude/"
    assert integration.config["commands_subdir"] == "skills"

The critical placeholder test:

def test_setup_creates_skill_files(self, tmp_path):
    integration = get_integration("claude")
    manifest = IntegrationManifest("claude", tmp_path)
    created = integration.setup(tmp_path, manifest, script_type="sh")

    # Pick the plan skill out of the files setup() reported creating
    plan_skill = next(p for p in created if "plan" in str(p))
    content = plan_skill.read_text(encoding="utf-8")
    assert "{SCRIPT}" not in content
    assert "{ARGS}" not in content
    assert "__AGENT__" not in content

These three assertions are the most important pattern in the test suite. They verify that process_template() has replaced all placeholders. An unprocessed {SCRIPT} in a command file would cause the AI agent to see literal placeholder text instead of a shell command — silently breaking the workflow.
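Conceptually, the substitution under test looks something like this — a minimal sketch, not the real process_template(), which also handles script-type selection and per-agent naming:

```python
def process_template(text: str, script: str, args: str, agent: str) -> str:
    # Substitute every placeholder the integration tests assert against
    return (
        text.replace("{SCRIPT}", script)
            .replace("{ARGS}", args)
            .replace("__AGENT__", agent)
    )

template = "Run {SCRIPT} {ARGS} as __AGENT__"
rendered = process_template(template, "scripts/plan.sh", "$ARGUMENTS", "claude")
assert "{SCRIPT}" not in rendered and "__AGENT__" not in rendered
```

The tests deliberately assert on the absence of placeholders rather than on exact output, so they keep passing when template wording changes.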

The Copilot tests at tests/integrations/test_integration_copilot.py add format-specific assertions:

def test_setup_creates_agent_md_files(self, tmp_path):
    # Verifies .agent.md extension
    for f in agent_files:
        assert f.name.endswith(".agent.md")

def test_setup_creates_companion_prompts(self, tmp_path):
    # Verifies matching .prompt.md files
    for f in prompt_files:
        content = f.read_text(encoding="utf-8")
        assert content.startswith("---\nagent: speckit.")

def test_agent_and_prompt_counts_match(self, tmp_path):
    # Critical: every .agent.md must have a matching .prompt.md
    assert len(agents) == len(prompts)

Tip: When adding a new integration test, start by copying test_integration_windsurf.py (the simplest case), then adapt assertions for your agent's specific format requirements. The placeholder checks ({SCRIPT}, {ARGS}, __AGENT__) should be in every integration test.

All integration tests use pytest's tmp_path fixture for isolated filesystem operations. This ensures tests don't interfere with each other and don't leave artifacts on the developer's machine.
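The pattern in miniature (tmp_path is handed to the test by pytest as a fresh pathlib.Path; the directory names here are illustrative):

```python
def test_setup_writes_into_tmp_path(tmp_path):
    # Everything the integration writes lands under this throwaway directory,
    # so parallel tests never collide and nothing survives the test run
    commands = tmp_path / ".stub" / "commands"
    commands.mkdir(parents=True)
    (commands / "speckit.plan.md").write_text("# plan", encoding="utf-8")
    assert (commands / "speckit.plan.md").read_text(encoding="utf-8") == "# plan"
```

pytest deletes old tmp_path directories automatically, so no teardown code is needed.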

Extension and Preset Testing

The extension tests at tests/test_extensions.py cover the full lifecycle:

flowchart TD
    A["Manifest Validation"] --> B["Registry Operations"]
    B --> C["Manager Install/Remove"]
    C --> D["Command Registration"]
    D --> E["Catalog Discovery"]
    E --> F["Hook Execution"]

Manifest validation tests verify both positive and negative cases — valid manifests parse correctly, and invalid ones raise ValidationError with specific messages. The tests use tempfile.mkdtemp() and shutil.rmtree() in fixtures rather than tmp_path, though both approaches work.

The preset tests at tests/test_presets.py mirror the extension test structure — manifest validation, registry operations, and template resolution. The resolver priority stack (local > presets > extensions > core) is tested by setting up multiple sources providing the same template name and verifying the correct one wins.
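The priority logic under test amounts to a first-match lookup. A sketch (the names here are illustrative, not the real resolver API):

```python
PRIORITY = ["local", "presets", "extensions", "core"]

def resolve_template(name, sources):
    # sources: {source_name: {template_name: path}}; the first source in
    # PRIORITY order that provides the template wins
    for source in PRIORITY:
        if name in sources.get(source, {}):
            return source, sources[source][name]
    raise KeyError(name)
```

So if both core and a preset ship plan.md, the preset copy is returned, and a local override beats both — which is exactly what the multi-source tests assert.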

CI Pipeline and Release Process

The CI runs on every push to main and every pull request. The test.yml workflow has two jobs:

Ruff linting — runs uvx ruff check src/ on Python 3.13 only (linting doesn't need multiple versions).

pytest — runs the full test suite across Python 3.11, 3.12, and 3.13:

strategy:
  matrix:
    python-version: ["3.11", "3.12", "3.13"]
steps:
  - name: Install dependencies
    run: uv sync --extra test
  - name: Run tests
    run: uv run pytest

flowchart LR
    A["Push/PR"] --> B["ruff check<br/>(3.13 only)"]
    A --> C["pytest<br/>(3.11)"]
    A --> D["pytest<br/>(3.12)"]
    A --> E["pytest<br/>(3.13)"]
    B --> F{"All green?"}
    C --> F
    D --> F
    E --> F
    F -->|Yes| G["✓ Merge allowed"]

The release process is tag-triggered via release.yml. When a v* tag is pushed, the workflow extracts the version, generates release notes from the commit log since the previous tag, and creates a GitHub Release. The wheel is built by Hatch, which bundles all assets via the force-include configuration we covered in Part 1.

Note the tooling choice: uv is used exclusively for dependency management and script execution (uv sync, uv run). There's no pip install or requirements.txt anywhere in the CI — uv handles resolution and installation from pyproject.toml.

Contributing: Adding a New Integration

The most common contribution is adding support for a new AI coding assistant. The AGENTS.md file is the definitive guide, but here's the distilled version:

flowchart TD
    A["1. Choose base class"] --> B{"Agent format?"}
    B -->|"Standard .md"| C["MarkdownIntegration"]
    B -->|".toml"| D["TomlIntegration"]
    B -->|"skill dirs"| E["SkillsIntegration"]
    B -->|"Fully custom"| F["IntegrationBase"]
    C --> G["2. Create subpackage"]
    D --> G
    E --> G
    F --> G
    G --> H["3. Register in _register_builtins()"]
    H --> I["4. Add scripts/ dir"]
    I --> J["5. Write tests"]
    J --> K["6. Run test suite"]

Step 1: Choose a base class. Most agents use MarkdownIntegration. If the agent requires TOML (like Gemini), use TomlIntegration. If it uses a skill directory structure, use SkillsIntegration. Only use IntegrationBase directly if you need companion files or settings merges (like Copilot).

Step 2: Create the subpackage. Create src/specify_cli/integrations/myagent/__init__.py:

"""MyAgent integration."""
from ..base import MarkdownIntegration

class MyAgentIntegration(MarkdownIntegration):
    key = "myagent"
    config = {
        "name": "My Agent",
        "folder": ".myagent/",
        "commands_subdir": "commands",
        "install_url": "https://myagent.dev/install",
        "requires_cli": True,  # or False for IDE-only agents
    }
    registrar_config = {
        "dir": ".myagent/commands",
        "format": "markdown",
        "args": "$ARGUMENTS",
        "extension": ".md",
    }
    context_file = ".myagent/rules.md"

The key should match the actual CLI binary name so that tool detection works. The folder must include a trailing /. The registrar_config["dir"] is the path where extension commands will be written by CommandRegistrar.
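To make the registrar_config fields concrete, here is an illustrative sketch of what a registrar might do with them — not the actual CommandRegistrar implementation:

```python
from pathlib import Path

def register_command(project_root, registrar_config, name, body):
    # Place the command file in the configured dir with the configured
    # extension, substituting the agent's argument token for {ARGS}
    target = (Path(project_root)
              / registrar_config["dir"]
              / f"{name}{registrar_config['extension']}")
    target.parent.mkdir(parents=True, exist_ok=True)
    target.write_text(body.replace("{ARGS}", registrar_config["args"]),
                      encoding="utf-8")
    return target
```

Given the config above, an extension command named speckit.review would land at .myagent/commands/speckit.review.md with $ARGUMENTS spliced in where the template said {ARGS}.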

Step 3: Register. Add to integrations/__init__.py alphabetically:

from .myagent import MyAgentIntegration
# ...
_register(MyAgentIntegration())

Step 4: Add scripts. Create src/specify_cli/integrations/myagent/scripts/ with update-context.sh and update-context.ps1. These are thin wrappers that update the agent's context file.

Step 5: Write tests. Create tests/integrations/test_integration_myagent.py with at minimum: registration check, config validation, setup creates files, and the placeholder replacement test.

Step 6: Run the suite. uv run pytest should pass. Then run uvx ruff check src/ for linting.

Tip: The CONTRIBUTING.md notes that large changes require prior discussion with maintainers. Adding a new integration is a well-understood contribution path and is generally welcome, but check the issue tracker for an existing agent_request issue first — someone may already be working on it.

Series Conclusion

Over these six articles, we've traced Spec Kit from its architectural foundations through the specify init pipeline, the 4-tier integration hierarchy, the command template workflow engine, the extension and preset plugin systems, and finally the test suite and contribution workflow. The key insight is that Spec Kit is two things at once: a Python CLI that generates files, and a declarative instruction set where markdown templates are programs and AI agents are the runtime. The CLI is the compiler; the templates are the code; the LLM is the CPU.

The codebase rewards close reading. The monolithic __init__.py is dense but navigable. The integration hierarchy is a textbook application of the Template Method pattern. The air-gapped bundling via Hatch's force-include is a pattern worth stealing for any command-line tool that ships runtime assets. And the hook system — where the AI reads YAML config and executes hooks at runtime — is a novel take on plugin architecture that's uniquely suited to the AI-assisted development era.