Anatomy of a Service Emulator: GitHub from Entities to Routes

Advanced

Prerequisites

  • Articles 1-3
  • Familiarity with the GitHub REST API

We've covered the infrastructure — the plugin interface, the store, the middleware stack. Now let's see how it all comes together in the largest and most complex emulator in the project: GitHub. With 519 lines of entity types, 28 typed collections, a full search query parser, and webhook dispatch with HMAC signing, the GitHub emulator is the canonical example of the plugin pattern.

Walking through it end-to-end reveals patterns that every emulator follows, plus GitHub-specific behaviors that demonstrate how far the project goes to match production fidelity.

Entity Type Definitions

The foundation of any emulator is its entity types. The GitHub emulator defines 35 interfaces extending Entity across 519 lines:

packages/@emulators/github/src/entities.ts

erDiagram
    GitHubUser ||--o{ GitHubRepo : "owns"
    GitHubOrg ||--o{ GitHubRepo : "owns"
    GitHubOrg ||--o{ GitHubTeam : "has"
    GitHubTeam ||--o{ GitHubTeamMember : "contains"
    GitHubRepo ||--o{ GitHubIssue : "has"
    GitHubRepo ||--o{ GitHubPullRequest : "has"
    GitHubRepo ||--o{ GitHubBranch : "has"
    GitHubRepo ||--o{ GitHubCommit : "has"
    GitHubRepo ||--o{ GitHubCheckRun : "has"
    GitHubRepo ||--o{ GitHubWebhook : "has"
    GitHubRepo ||--o{ GitHubRelease : "has"
    GitHubApp ||--o{ GitHubAppInstallation : "installed via"
    GitHubIssue }o--o{ GitHubLabel : "tagged with"
    GitHubIssue }o--o| GitHubMilestone : "assigned to"

A few design decisions are worth highlighting:

Relationships are modeled through foreign key IDs, not nested objects. GitHubIssue has repo_id: number, user_id: number, assignee_ids: number[], and label_ids: number[]. The response formatters (in helpers.ts) resolve these IDs into nested objects at render time. This keeps entities flat and queryable.
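To make the flat-entity idea concrete, here is a minimal sketch of resolving foreign keys at render time. The trimmed-down `User`, `Label`, and `Issue` shapes, the `Map`-based stand-ins for collections, and `formatIssueSketch` are all hypothetical; the project's real formatters in helpers.ts handle many more fields.

```typescript
// Hypothetical, trimmed-down entity shapes; the real interfaces have many more fields.
interface User { id: number; login: string }
interface Label { id: number; name: string }
interface Issue {
  id: number;
  number: number;
  title: string;
  user_id: number;
  assignee_ids: number[];
  label_ids: number[];
}

// Stand-ins for store collections, keyed by entity id.
const userTable = new Map<number, User>([
  [1, { id: 1, login: "octocat" }],
  [2, { id: 2, login: "hubot" }],
]);
const labelTable = new Map<number, Label>([[10, { id: 10, name: "bug" }]]);

// Look up each id and silently drop ids that no longer resolve.
function pick<T>(table: Map<number, T>, ids: number[]): T[] {
  return ids.map((id) => table.get(id)).filter((x): x is T => x !== undefined);
}

// Foreign keys become nested objects only when the response is built.
function formatIssueSketch(issue: Issue) {
  return {
    number: issue.number,
    title: issue.title,
    user: userTable.get(issue.user_id) ?? null,
    assignees: pick(userTable, issue.assignee_ids),
    labels: pick(labelTable, issue.label_ids),
  };
}

const rendered = formatIssueSketch({
  id: 100,
  number: 1,
  title: "Crash on start",
  user_id: 1,
  assignee_ids: [2],
  label_ids: [10, 99], // 99 is a dangling id and is dropped
});
console.log(JSON.stringify(rendered));
```

Because the stored issue holds only IDs, renaming a user or label is a single write, and every later response picks up the change.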

Nullable fields match the real API. GitHubPullRequest has merged_by_id: number | null, merge_commit_sha: string | null, and closed_at: string | null. These stay null until the PR is merged or closed, mirroring the real API's behavior of returning null for unfilled fields.

Computed fields are stored, not derived. GitHubRepo stores open_issues_count directly rather than computing it from the issues collection on every request. When an issue is opened or closed, the route handler updates this counter. This matches GitHub's denormalized approach and means list responses are fast.
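A stored counter only stays correct if every state change updates it. The sketch below shows that bookkeeping under simplifying assumptions: `openIssue` and `closeIssue` are hypothetical helpers, not the project's route handlers, and the issues live in a plain array.

```typescript
interface Repo { id: number; full_name: string; open_issues_count: number }
interface Issue { id: number; repo_id: number; state: "open" | "closed" }

const issues: Issue[] = [];

// Opening an issue writes the issue and bumps the stored counter together.
function openIssue(repo: Repo): Issue {
  const issue: Issue = { id: issues.length + 1, repo_id: repo.id, state: "open" };
  issues.push(issue);
  repo.open_issues_count += 1;
  return issue;
}

// Closing is guarded so that closing twice cannot decrement twice.
function closeIssue(repo: Repo, issue: Issue): void {
  if (issue.state === "closed") return;
  issue.state = "closed";
  repo.open_issues_count -= 1;
}

const demoRepo: Repo = { id: 1, full_name: "octocat/hello-world", open_issues_count: 0 };
const first = openIssue(demoRepo);
openIssue(demoRepo);
closeIssue(demoRepo, first);
closeIssue(demoRepo, first); // no-op

// The stored value agrees with what a full scan would compute.
const derived = issues.filter((i) => i.repo_id === demoRepo.id && i.state === "open").length;
console.log(demoRepo.open_issues_count, derived); // 1 1
```

The trade-off is the usual one for denormalization: reads are a field access, while every write path that changes issue state has to remember the counter.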

The 28-Collection Store Facade

As we saw in Article 2, the GitHub store facade maps 28 named collections with carefully chosen indexes:

packages/@emulators/github/src/store.ts#L78-L119

The index selection follows the route structure:

| Collection | Indexes | Why |
| --- | --- | --- |
| repos | owner_id, full_name | List by owner, lookup by /:owner/:repo |
| issues | repo_id, number | List issues in repo, lookup by number |
| commits | repo_id, sha | List commits in repo, lookup by SHA |
| checkRuns | repo_id, head_sha | List check runs for a commit |
| appInstallations | app_id, installation_id | Look up by app or installation |
| oauthApps | client_id | OAuth token exchange |

Most repo-scoped routes start by looking up a repo (by full_name), then query sub-resources within that repo (by repo_id). The index design mirrors this access pattern.
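As a rough illustration of why those indexes pay off, the sketch below implements a tiny collection with secondary indexes and then performs the two-step lookup. `IndexedCollection` is a hypothetical stand-in written for this example; only the method names `insert`, `findBy`, and `findOneBy` are borrowed from the snippets in this article, and the real store from Article 2 may differ in every other respect.

```typescript
// Hypothetical minimal collection with secondary indexes; not the project's store code.
class IndexedCollection<T extends { id: number }> {
  private rows = new Map<number, T>();
  private indexes = new Map<keyof T, Map<unknown, Set<number>>>();
  private nextId = 1;

  constructor(indexedFields: (keyof T)[]) {
    for (const field of indexedFields) this.indexes.set(field, new Map());
  }

  insert(data: Omit<T, "id">): T {
    const row = { ...data, id: this.nextId++ } as unknown as T;
    this.rows.set(row.id, row);
    // Register the new row under its value for every indexed field.
    this.indexes.forEach((index, field) => {
      const key = row[field];
      const ids = index.get(key) ?? new Set<number>();
      ids.add(row.id);
      index.set(key, ids);
    });
    return row;
  }

  findBy<K extends keyof T>(field: K, value: T[K]): T[] {
    const index = this.indexes.get(field);
    if (!index) {
      // No index on this field: fall back to a full scan.
      return Array.from(this.rows.values()).filter((r) => r[field] === value);
    }
    const ids = index.get(value);
    return ids ? Array.from(ids).map((id) => this.rows.get(id)!) : [];
  }

  findOneBy<K extends keyof T>(field: K, value: T[K]): T | undefined {
    return this.findBy(field, value)[0];
  }
}

interface Repo { id: number; owner_id: number; full_name: string }
interface Issue { id: number; repo_id: number; number: number; title: string }

const repos = new IndexedCollection<Repo>(["owner_id", "full_name"]);
const issueRows = new IndexedCollection<Issue>(["repo_id", "number"]);

const created = repos.insert({ owner_id: 1, full_name: "octocat/hello-world" });
issueRows.insert({ repo_id: created.id, number: 1, title: "First" });
issueRows.insert({ repo_id: created.id, number: 2, title: "Second" });

// Step 1: repo by full_name. Step 2: sub-resources by repo_id.
const found = repos.findOneBy("full_name", "octocat/hello-world");
const repoIssues = found ? issueRows.findBy("repo_id", found.id) : [];
const second = repoIssues.find((i) => i.number === 2);
console.log(repoIssues.length, second?.title);
```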

Two-Phase Seeding: Defaults and Config

The GitHub plugin has two seeding phases, triggered in sequence by createServer():

Phase 1: seedDefaults() creates the minimum entities the emulator needs to function:

packages/@emulators/github/src/index.ts#L87-L133

Two users: ghost (GitHub's placeholder for deleted users) and admin (the default authenticated user). This mirrors GitHub's real behavior — the ghost user exists on github.com and appears as the author of actions by deleted accounts.

Phase 2: seedFromConfig() processes the YAML config to create entities and their relationships:

packages/@emulators/github/src/index.ts#L135-L373

sequenceDiagram
    participant Config as YAML Config
    participant Seed as seedFromConfig()
    participant Store as GitHubStore

    Config->>Seed: users, orgs, repos, apps
    Seed->>Store: Insert users (skip duplicates)
    Seed->>Store: Insert orgs (skip duplicates)
    Seed->>Seed: For each repo...
    Seed->>Store: Resolve owner (user or org)
    Seed->>Store: Insert repo
    Seed->>Store: Insert initial commit
    Seed->>Store: Insert tree with README.md
    Seed->>Store: Insert branch (default)
    Seed->>Store: Insert ref (refs/heads/main)
    Seed->>Store: Update owner.public_repos counter
    Seed->>Store: Insert OAuth apps
    Seed->>Store: Insert GitHub Apps + installations

The cascading entity creation for repos is the most complex part: a single repo config entry triggers creation of a commit, a tree (with a README blob reference), a branch, and a ref — the full git data model. The auto_init: false flag can suppress this if you want an empty repo.

The idempotency pattern is consistent throughout: const existing = gh.users.findOneBy("login", u.login); if (existing) continue;. This means re-seeding after a reset is safe — duplicate entities won't be created.
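The check-then-skip loop can be exercised in isolation. In the sketch below, `UserCollection` and `seedUsers` are hypothetical simplifications; only the `findOneBy("login", ...)` guard is taken from the line quoted above.

```typescript
interface User { id: number; login: string }

// Hypothetical stand-in for the store's users collection.
class UserCollection {
  private rows: User[] = [];

  findOneBy(field: "login", value: string): User | undefined {
    return this.rows.find((u) => u[field] === value);
  }

  insert(data: Omit<User, "id">): User {
    const row: User = { id: this.rows.length + 1, ...data };
    this.rows.push(row);
    return row;
  }

  all(): User[] {
    return [...this.rows];
  }
}

// Returns how many users were actually created, so repeat runs are observable.
function seedUsers(users: UserCollection, config: { login: string }[]): number {
  let created = 0;
  for (const u of config) {
    const existing = users.findOneBy("login", u.login);
    if (existing) continue; // already present: skip instead of duplicating
    users.insert({ login: u.login });
    created += 1;
  }
  return created;
}

const seededUsers = new UserCollection();
const defaultsCreated = seedUsers(seededUsers, [{ login: "ghost" }, { login: "admin" }]);
const configCreated = seedUsers(seededUsers, [{ login: "admin" }, { login: "octocat" }]);
const rerunCreated = seedUsers(seededUsers, [{ login: "admin" }, { login: "octocat" }]);
console.log(defaultsCreated, configCreated, rerunCreated); // 2 1 0
```

The same guard also covers overlap between the two phases: a config that lists admin does not collide with the admin user created by the defaults.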

Route Handler Patterns

Route handlers follow a consistent pattern across all resource types. Here's the repo routes file as a canonical example:

packages/@emulators/github/src/routes/repos.ts#L1-L53

The standard flow for a read endpoint:

sequenceDiagram
    participant Client
    participant Route as Route Handler
    participant RH as Route Helpers
    participant Store as GitHubStore
    participant Fmt as Response Formatter

    Client->>Route: GET /repos/:owner/:repo
    Route->>Store: lookupRepo(owner, repo)
    Store-->>Route: GitHubRepo | undefined
    Route->>Route: Throw 404 if not found
    Route->>RH: assertRepoRead(gh, authUser, repo)
    RH->>RH: Check visibility + permissions
    Route->>Fmt: formatRepo(repo, store, baseUrl)
    Fmt->>Fmt: Resolve owner, compute permissions, build URLs
    Fmt-->>Client: JSON response with GitHub-compatible shape

The response formatters in helpers.ts are the unsung heroes. formatRepo() alone spans 100 lines, generating every URL template GitHub includes in its response (forks_url, keys_url, collaborators_url, etc.). These aren't just decoration — client libraries like Octokit follow these URLs.
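The read flow in the diagram can be sketched without any HTTP framework. Everything below is a simplified assumption: the array-backed `lookupRepo`, the owner-only visibility rule in `assertRepoRead` (whose signature here also differs from the one in the diagram), the three-URL `formatRepo`, and the base URL are illustrative, not the project's code. The URL templates themselves follow the shapes GitHub returns.

```typescript
interface Repo { id: number; owner_id: number; name: string; full_name: string; private: boolean }
interface AuthUser { id: number; login: string }

class HttpError extends Error {
  status: number;
  constructor(status: number, message: string) {
    super(message);
    this.status = status;
  }
}

const repoTable: Repo[] = [
  { id: 1, owner_id: 1, name: "hello-world", full_name: "octocat/hello-world", private: false },
  { id: 2, owner_id: 1, name: "secret", full_name: "octocat/secret", private: true },
];

function lookupRepo(owner: string, repo: string): Repo | undefined {
  return repoTable.find((r) => r.full_name === `${owner}/${repo}`);
}

// Assumed rule for this sketch: private repos are visible only to their owner.
// A 404 (rather than 403) avoids revealing that the repo exists.
function assertRepoRead(authUser: AuthUser | null, repo: Repo): void {
  if (repo.private && authUser?.id !== repo.owner_id) throw new HttpError(404, "Not Found");
}

// A tiny subset of the URL templates a full formatter would emit.
function formatRepo(repo: Repo, baseUrl: string) {
  const url = `${baseUrl}/repos/${repo.full_name}`;
  return {
    id: repo.id,
    name: repo.name,
    full_name: repo.full_name,
    private: repo.private,
    url,
    forks_url: `${url}/forks`,
    keys_url: `${url}/keys{/key_id}`,
    collaborators_url: `${url}/collaborators{/collaborator}`,
  };
}

// Lookup, 404 check, permission check, then format: the order shown in the diagram.
function getRepo(owner: string, name: string, authUser: AuthUser | null, baseUrl: string) {
  const repo = lookupRepo(owner, name);
  if (!repo) throw new HttpError(404, "Not Found");
  assertRepoRead(authUser, repo);
  return formatRepo(repo, baseUrl);
}

const repoBody = getRepo("octocat", "hello-world", null, "http://localhost:4000");
console.log(repoBody.forks_url);
```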

The helper functions for node IDs and SHAs are simple but important:

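import { randomBytes } from "node:crypto";

// Base64-encode "0:<type><id>" and strip "=" padding to get an opaque-looking ID.
export function generateNodeId(type: string, id: number): string {
  return Buffer.from(`0:${type}${id}`).toString("base64").replace(/=+$/, "");
}

// 20 random bytes rendered as 40 hex characters: SHA-1-shaped, but not a real git hash.
export function generateSha(): string {
  return randomBytes(20).toString("hex");
}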

Node IDs are the base64 encoding of the string 0:<type><id> with trailing = padding stripped, which mimics GitHub's opaque ID pattern. SHAs are random 40-character hex strings: they're not real git hashes, but they're the right format.
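To see what an emulator node ID contains, it helps to decode one. The sketch below repeats `generateNodeId` from above and adds a hypothetical `decodeNodeId` helper for inspection; Node's base64 decoder accepts input without padding.

```typescript
import { Buffer } from "node:buffer";

// Same logic as the helper shown above.
function generateNodeId(type: string, id: number): string {
  return Buffer.from(`0:${type}${id}`).toString("base64").replace(/=+$/, "");
}

// Hypothetical inspection helper: reverses the encoding for debugging.
function decodeNodeId(nodeId: string): string {
  return Buffer.from(nodeId, "base64").toString("utf8");
}

const nodeId = generateNodeId("User", 1);
console.log(nodeId); // MDpVc2VyMQ
console.log(decodeNodeId(nodeId)); // 0:User1
```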

Tip: When debugging a test failure against the emulator, check whether your code depends on the exact format of node_id values. Emulate generates valid-format IDs, but they won't match IDs from github.com.

Search Query Parser and Relevance Scoring

The GitHub search API is notoriously complex — it supports qualifiers like is:pr, language:TypeScript, stars:>100, range queries like stars:10..50, and negation with -qualifier:value. The emulator implements all of this:

packages/@emulators/github/src/routes/search.ts#L17-L143

The parser tokenizes the query string (handling quoted strings), then classifies each token:

flowchart TD
    A["q = 'react is:public stars:>100 -language:JavaScript'"] --> B[Tokenize]
    B --> C["['react', 'is:public', 'stars:>100', '-language:JavaScript']"]
    C --> D{For each token}
    D --> E{"Starts with '-' + has ':'?"}
    E -->|Yes| F[Add to negations map]
    E -->|No| G{"Has ':'?"}
    G -->|Yes| H{"Value is numeric range?"}
    H -->|Yes| I[Add to ranges map]
    H -->|No| J[Add to qualifiers map]
    G -->|No| K[Add to free text]
    F --> L["ParsedSearchQuery { text: 'react', qualifiers: {is: ['public']}, negations: {language: ['JavaScript']}, ranges: {stars: [{op: '>', value: 100}]} }"]
    I --> L
    J --> L
    K --> L
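The classification in the flowchart can be sketched in a few dozen lines. This is an illustrative parser under stated assumptions, not the code in search.ts: the `RangeFilter` shape (including the `max` field used for `a..b` ranges), the tokenizer regex, and all function names are hypothetical.

```typescript
interface RangeFilter { op: ">" | ">=" | "<" | "<=" | ".."; value: number; max?: number }
interface ParsedQuery {
  text: string;
  qualifiers: Record<string, string[]>;
  negations: Record<string, string[]>;
  ranges: Record<string, RangeFilter[]>;
}

// Split on whitespace while keeping double-quoted phrases (and key:"two words") together.
function tokenize(q: string): string[] {
  return q.match(/(?:[^\s"]|"[^"]*")+/g) ?? [];
}

// Recognize ">100", ">=100", "<100", "<=100", and "10..50".
function parseRange(value: string): RangeFilter | undefined {
  const between = /^(\d+)\.\.(\d+)$/.exec(value);
  if (between) return { op: "..", value: Number(between[1]), max: Number(between[2]) };
  const cmp = /^(>=|<=|>|<)(\d+)$/.exec(value);
  if (cmp) return { op: cmp[1] as RangeFilter["op"], value: Number(cmp[2]) };
  return undefined;
}

function pushTo<T>(map: Record<string, T[]>, key: string, value: T): void {
  const list = map[key] ?? (map[key] = []);
  list.push(value);
}

function parseSearchQuery(q: string): ParsedQuery {
  const parsed: ParsedQuery = { text: "", qualifiers: {}, negations: {}, ranges: {} };
  const words: string[] = [];
  for (const token of tokenize(q)) {
    const negated = token.startsWith("-");
    const body = negated ? token.slice(1) : token;
    const colon = body.indexOf(":");
    // No key before a colon, or a quoted phrase: treat the token as free text.
    if (colon <= 0 || body.startsWith('"')) {
      words.push(token.replace(/"/g, ""));
      continue;
    }
    const key = body.slice(0, colon).toLowerCase();
    const value = body.slice(colon + 1).replace(/"/g, "");
    const range = parseRange(value);
    if (negated) pushTo(parsed.negations, key, value);
    else if (range) pushTo(parsed.ranges, key, range);
    else pushTo(parsed.qualifiers, key, value);
  }
  parsed.text = words.join(" ");
  return parsed;
}

const parsedExample = parseSearchQuery("react is:public stars:>100 -language:JavaScript");
console.log(JSON.stringify(parsedExample));
```

Checking negation before the range test mirrors the order of the decision nodes in the flowchart.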

The relevance scoring function for repos is concise but weighted:

packages/@emulators/github/src/routes/search.ts#L323-L332

function repoRelevance(repo: GitHubRepo, parsed: ParsedSearchQuery): number {
  const t = parsed.text.trim().toLowerCase();
  if (!t) return 1;
  let score = 0;
  if (repo.name.toLowerCase().includes(t)) score += 5;
  if (repo.full_name.toLowerCase().includes(t)) score += 4;
  if (repo.description?.toLowerCase().includes(t)) score += 2;
  if (repo.topics.some((x) => x.toLowerCase().includes(t))) score += 1;
  return score;
}

The scores are additive: a name match adds 5, a full-name match 4, a description match 2, and a topic match 1. Because full_name contains the repository name, a name hit also earns the full-name bonus. The search endpoints support six resource types (repositories, issues, users, code, commits, and topics), each with its own filter and relevance functions. The implementation spans over 1,100 lines.
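A short worked example shows how the weights combine. The sketch repeats the scoring function against a hypothetical `RepoLike` subset of the entity and a one-field `ParsedSearchQuery`; dropping zero-score results before sorting is an assumption of the sketch, not something the article states about the emulator.

```typescript
interface RepoLike { name: string; full_name: string; description: string | null; topics: string[] }
interface ParsedSearchQuery { text: string }

// Same weights as the function shown above.
function repoRelevance(repo: RepoLike, parsed: ParsedSearchQuery): number {
  const t = parsed.text.trim().toLowerCase();
  if (!t) return 1;
  let score = 0;
  if (repo.name.toLowerCase().includes(t)) score += 5;
  if (repo.full_name.toLowerCase().includes(t)) score += 4;
  if (repo.description?.toLowerCase().includes(t)) score += 2;
  if (repo.topics.some((x) => x.toLowerCase().includes(t))) score += 1;
  return score;
}

const candidates: RepoLike[] = [
  { name: "redux", full_name: "reduxjs/redux", description: "State container, often used with React", topics: ["react", "state"] },
  { name: "react", full_name: "facebook/react", description: "A library for web UIs", topics: ["ui"] },
  { name: "react-router", full_name: "remix-run/react-router", description: "Declarative routing for React", topics: ["router"] },
  { name: "lodash", full_name: "lodash/lodash", description: "Utility library", topics: ["utilities"] },
];

const query: ParsedSearchQuery = { text: "react" };
const ranked = candidates
  .map((repo) => ({ name: repo.name, score: repoRelevance(repo, query) }))
  .filter((r) => r.score > 0) // assumption: drop non-matching repos
  .sort((a, b) => b.score - a.score);

console.log(ranked.map((r) => `${r.name}=${r.score}`).join(" "));
// react-router=11 react=9 redux=3
```

Matching is by substring, so react-router (name, full name, and description all match) outranks react itself, whose description happens not to contain the word.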

Implicit Behaviors: Check Suites and Cascading State

One of the most valuable things about Emulate is that it replicates implicit API behaviors — things that happen automatically that aren't obvious from the API documentation.

When you POST a check run to GitHub, it automatically creates a check suite if one doesn't exist for that commit:

packages/@emulators/github/src/routes/checks.ts#L49-L72

function getOrCreateCheckSuite(
  gh: GitHubStore, repo: GitHubRepo, headSha: string, headBranch?: string | null,
): GitHubCheckSuite {
  const existing = gh.checkSuites.findBy("repo_id", repo.id).find((s) => s.head_sha === headSha);
  if (existing) return existing;
  
  const hb = headBranch?.trim() || headBranchForSha(gh, repo, headSha);
  const row = gh.checkSuites.insert({
    repo_id: repo.id, head_branch: hb, head_sha: headSha,
    status: "queued", conclusion: null, ...
  });
  return gh.checkSuites.get(row.id)!;
}

Similar cascading happens in repo creation (commit → tree → branch → ref), PR creation (creates a corresponding issue entry with is_pull_request: true), and issue state changes (updates open_issues_count on the repo).

Webhook Dispatch with Installation Enrichment

The GitHub plugin wraps the core WebhookDispatcher to add GitHub-specific behavior — finding relevant app installations and enriching payloads with installation data:

packages/@emulators/github/src/index.ts#L455-L500

sequenceDiagram
    participant Route as Route Handler
    participant WH as Wrapped dispatch()
    participant Find as findInstallationsForRepo()
    participant Core as Original dispatch()
    participant Deliver as deliverToAppWebhookUrls()

    Route->>WH: dispatch("push", undefined, payload, "octocat", "hello-world")
    WH->>Find: Find matching installations
    Find-->>WH: [installation_100]
    WH->>WH: Enrich payload with installation data
    WH->>Core: dispatch(event, action, enrichedPayload, ...)
    Note over Core: Delivers to repo/org webhook URLs
    WH->>Deliver: Deliver to app webhook URLs
    Deliver->>Deliver: HMAC-SHA256 sign with webhook_secret
    Deliver->>Deliver: POST to app's webhook_url

The installation matching logic checks:

  1. Is the installation on the correct account (user/org)?
  2. Is it not suspended?
  3. Does the app subscribe to this event type?
  4. For "selected" repository installations — is this repo selected?

The HMAC signing uses createHmac("sha256", webhookSecret) from Node's crypto module, setting the X-Hub-Signature-256 header. This means your webhook handler's signature verification code works unchanged against the emulator.
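For reference, here is a minimal sketch of both sides of that exchange using Node's crypto module. The `sha256=` prefix on the header value is GitHub's documented format; `signPayload`, `verifySignature`, and the secret are hypothetical names and values for this example.

```typescript
import { Buffer } from "node:buffer";
import { createHmac, timingSafeEqual } from "node:crypto";

// Sender side: compute the X-Hub-Signature-256 header value for a raw body.
function signPayload(secret: string, body: string): string {
  return "sha256=" + createHmac("sha256", secret).update(body).digest("hex");
}

// Receiver side: recompute and compare in constant time.
// timingSafeEqual throws on length mismatch, so lengths are checked first.
function verifySignature(secret: string, body: string, header: string): boolean {
  const expected = Buffer.from(signPayload(secret, body));
  const received = Buffer.from(header);
  return expected.length === received.length && timingSafeEqual(expected, received);
}

const webhookSecret = "test-secret"; // hypothetical value
const rawBody = JSON.stringify({ ref: "refs/heads/main" });
const signatureHeader = signPayload(webhookSecret, rawBody);

console.log(verifySignature(webhookSecret, rawBody, signatureHeader)); // true
console.log(verifySignature("wrong-secret", rawBody, signatureHeader)); // false
```

Verification has to run over the raw request body exactly as received; re-serializing parsed JSON can change whitespace or key order and invalidate the signature.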

Tip: If your app uses GitHub App authentication, you need to configure the apps section in the seed config with a real RSA private key. The emulator actually verifies JWT signatures, so a placeholder key won't work.
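One way to obtain a usable key for local testing is to generate a throwaway RSA pair with Node's crypto module and sign an RS256 app JWT with it. The sketch below uses the claim layout from GitHub's documentation (iat backdated by 60 seconds, exp at most 10 minutes ahead, iss set to the app ID). `signAppJwt`, `verifyAppJwt`, and the app ID are hypothetical, and where the PEM goes in the seed config is not shown because the config schema is not covered here.

```typescript
import { Buffer } from "node:buffer";
import { createSign, createVerify, generateKeyPairSync } from "node:crypto";

// Throwaway key pair for tests; never reuse a production app key here.
const { privateKey, publicKey } = generateKeyPairSync("rsa", {
  modulusLength: 2048,
  publicKeyEncoding: { type: "spki", format: "pem" },
  privateKeyEncoding: { type: "pkcs8", format: "pem" },
});

const b64url = (input: string): string => Buffer.from(input).toString("base64url");

// Build header.payload, then sign it with RSASSA-PKCS1-v1_5 + SHA-256 (RS256).
function signAppJwt(appId: number, privatePem: string, nowSeconds: number): string {
  const header = b64url(JSON.stringify({ alg: "RS256", typ: "JWT" }));
  const payload = b64url(
    JSON.stringify({ iat: nowSeconds - 60, exp: nowSeconds + 600, iss: String(appId) }),
  );
  const signingInput = `${header}.${payload}`;
  const signature = createSign("RSA-SHA256").update(signingInput).sign(privatePem, "base64url");
  return `${signingInput}.${signature}`;
}

// Signature check only; a real verifier would also validate exp and iss.
function verifyAppJwt(token: string, publicPem: string): boolean {
  const [header, payload, signature] = token.split(".");
  if (!header || !payload || !signature) return false;
  return createVerify("RSA-SHA256")
    .update(`${header}.${payload}`)
    .verify(publicPem, signature, "base64url");
}

const appJwt = signAppJwt(12345, privateKey, Math.floor(Date.now() / 1000));
console.log(verifyAppJwt(appJwt, publicKey)); // true
```

A placeholder string in place of the PEM fails at the signing step on the client, and a token signed with a different key fails the signature check, which is why a real key pair is needed on both sides.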

What's Next

We've dissected the most complex emulator in the project. In the next article, we'll zoom in on one specific cross-cutting concern: OAuth and OpenID Connect. Six of the twelve emulators implement identity flows, and they share patterns — pending code maps, session cookies, a user picker UI — while diverging in protocol-specific ways. We'll also see how the Next.js adapter solves the OAuth callback URL problem for Vercel preview deployments.