Anatomy of a Service Emulator: GitHub from Entities to Routes
Prerequisites
- Articles 1-3
- Familiarity with the GitHub REST API
We've covered the infrastructure — the plugin interface, the store, the middleware stack. Now let's see how it all comes together in the largest and most complex emulator in the project: GitHub. With 519 lines of entity types, 28 typed collections, a full search query parser, and webhook dispatch with HMAC signing, the GitHub emulator is the canonical example of the plugin pattern.
Walking through it end-to-end reveals patterns that every emulator follows, plus GitHub-specific behaviors that demonstrate how far the project goes to match production fidelity.
Entity Type Definitions
The foundation of any emulator is its entity types. The GitHub emulator defines 35 interfaces extending Entity across 519 lines:
packages/@emulators/github/src/entities.ts
erDiagram
GitHubUser ||--o{ GitHubRepo : "owns"
GitHubOrg ||--o{ GitHubRepo : "owns"
GitHubOrg ||--o{ GitHubTeam : "has"
GitHubTeam ||--o{ GitHubTeamMember : "contains"
GitHubRepo ||--o{ GitHubIssue : "has"
GitHubRepo ||--o{ GitHubPullRequest : "has"
GitHubRepo ||--o{ GitHubBranch : "has"
GitHubRepo ||--o{ GitHubCommit : "has"
GitHubRepo ||--o{ GitHubCheckRun : "has"
GitHubRepo ||--o{ GitHubWebhook : "has"
GitHubRepo ||--o{ GitHubRelease : "has"
GitHubApp ||--o{ GitHubAppInstallation : "installed via"
GitHubIssue }o--o{ GitHubLabel : "tagged with"
GitHubIssue }o--o| GitHubMilestone : "assigned to"
A few design decisions are worth highlighting:
Relationships are modeled through foreign key IDs, not nested objects. GitHubIssue has repo_id: number, user_id: number, assignee_ids: number[], and label_ids: number[]. The response formatters (in helpers.ts) resolve these IDs into nested objects at render time. This keeps entities flat and queryable.
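To make the render-time resolution concrete, here is a minimal sketch. The `Sketch*` shapes and `formatIssueSketch` are hypothetical stand-ins, not the emulator's actual interfaces or its helpers.ts formatter; only the field names `user_id`, `assignee_ids`, and `label_ids` come from the text above.

```typescript
// Hypothetical, trimmed-down shapes; the real interfaces carry many more fields.
interface SketchUser { id: number; login: string }
interface SketchLabel { id: number; name: string }
interface SketchIssue {
  id: number;
  title: string;
  user_id: number;
  assignee_ids: number[];
  label_ids: number[];
}

function isPresent<T>(x: T | undefined): x is T {
  return x !== undefined;
}

// Stored issues stay flat; nesting happens only when a response is rendered.
function formatIssueSketch(
  issue: SketchIssue,
  users: Map<number, SketchUser>,
  labels: Map<number, SketchLabel>,
) {
  return {
    id: issue.id,
    title: issue.title,
    user: users.get(issue.user_id) ?? null,
    // IDs that no longer resolve are dropped rather than rendered as holes.
    assignees: issue.assignee_ids.map((id) => users.get(id)).filter(isPresent),
    labels: issue.label_ids.map((id) => labels.get(id)).filter(isPresent),
  };
}

const sketchUsers = new Map<number, SketchUser>([[1, { id: 1, login: "admin" }]]);
const sketchLabels = new Map<number, SketchLabel>([[7, { id: 7, name: "bug" }]]);
const renderedIssue = formatIssueSketch(
  { id: 10, title: "Crash on start", user_id: 1, assignee_ids: [1], label_ids: [7, 99] },
  sketchUsers,
  sketchLabels,
);
console.log(JSON.stringify(renderedIssue.labels)); // [{"id":7,"name":"bug"}]
```

Dropping unresolved IDs is a choice made for this sketch; the real formatter may handle dangling references differently.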
Optional fields match the real API. GitHubPullRequest has merged_by_id: number | null, merge_commit_sha: string | null, closed_at: string | null. These are null until the PR is merged/closed, mirroring the real API's behavior of returning null for unfilled fields.
Computed fields are stored, not derived. GitHubRepo stores open_issues_count directly rather than computing it from the issues collection on every request. When an issue is opened or closed, the route handler updates this counter. This matches GitHub's denormalized approach and means list responses are fast.
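The counter bookkeeping can be sketched as follows. `setIssueState` and the `Counter*` shapes are hypothetical; the point is only that the handler adjusts `open_issues_count` on an actual state transition and leaves it alone otherwise.

```typescript
// Hypothetical minimal shapes for illustration.
interface CounterRepo { id: number; open_issues_count: number }
interface CounterIssue { id: number; repo_id: number; state: "open" | "closed" }

// Keep the denormalized counter in step with issue state transitions.
function setIssueState(
  repo: CounterRepo,
  issue: CounterIssue,
  next: "open" | "closed",
): void {
  if (issue.state === next) return; // no transition, so the counter stays put
  issue.state = next;
  repo.open_issues_count += next === "open" ? 1 : -1;
}

const counterRepo: CounterRepo = { id: 1, open_issues_count: 1 };
const counterIssue: CounterIssue = { id: 5, repo_id: 1, state: "open" };

setIssueState(counterRepo, counterIssue, "closed");
setIssueState(counterRepo, counterIssue, "closed"); // repeated close is a no-op
console.log(counterRepo.open_issues_count); // 0
```

The early return matters: without it, closing an already-closed issue would push the counter negative.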
The 28-Collection Store Facade
As we saw in Article 2, the GitHub store facade maps 28 named collections with carefully chosen indexes:
packages/@emulators/github/src/store.ts#L78-L119
The index selection follows the route structure:
| Collection | Indexes | Why |
|---|---|---|
| repos | owner_id, full_name | List by owner, lookup by /:owner/:repo |
| issues | repo_id, number | List issues in repo, lookup by number |
| commits | repo_id, sha | List commits in repo, lookup by SHA |
| checkRuns | repo_id, head_sha | List check runs for a commit |
| appInstallations | app_id, installation_id | Look up by app or installation |
| oauthApps | client_id | OAuth token exchange |
Every route starts by looking up a repo (by full_name), then queries sub-resources within that repo (by repo_id). The index design mirrors this access pattern.
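As a rough illustration of why indexes follow the access pattern, here is a minimal sketch of an indexed in-memory collection. `SketchCollection` is hypothetical and much simpler than the store from Article 2; it only mirrors the `insert`, `findBy`, and `findOneBy` call shapes that appear in this article.

```typescript
// Hypothetical collection: one Map per indexed field, from value to row IDs.
class SketchCollection<T extends { id: number }> {
  private rows = new Map<number, T>();
  private indexes = new Map<keyof T, Map<unknown, Set<number>>>();
  private nextId = 1;

  constructor(indexedFields: (keyof T)[]) {
    for (const field of indexedFields) this.indexes.set(field, new Map());
  }

  insert(data: Omit<T, "id">): T {
    const row = { ...data, id: this.nextId++ } as unknown as T;
    this.rows.set(row.id, row);
    // Register the new row under its value in every index.
    this.indexes.forEach((index, field) => {
      const key = row[field];
      const ids = index.get(key) ?? new Set<number>();
      ids.add(row.id);
      index.set(key, ids);
    });
    return row;
  }

  findBy<K extends keyof T>(field: K, value: T[K]): T[] {
    const index = this.indexes.get(field);
    // Unindexed fields fall back to a full scan.
    if (!index) return Array.from(this.rows.values()).filter((r) => r[field] === value);
    const ids = index.get(value);
    return ids ? Array.from(ids).map((id) => this.rows.get(id)!) : [];
  }

  findOneBy<K extends keyof T>(field: K, value: T[K]): T | undefined {
    return this.findBy(field, value)[0];
  }
}

interface SketchRepo { id: number; owner_id: number; full_name: string }
const sketchRepos = new SketchCollection<SketchRepo>(["owner_id", "full_name"]);
sketchRepos.insert({ owner_id: 1, full_name: "octocat/hello-world" });
sketchRepos.insert({ owner_id: 1, full_name: "octocat/spoon-knife" });
sketchRepos.insert({ owner_id: 2, full_name: "acme/widgets" });

// The two lookups a repo route needs: by full_name, then by owner.
console.log(sketchRepos.findOneBy("full_name", "acme/widgets")?.id); // 3
console.log(sketchRepos.findBy("owner_id", 1).length); // 2
```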
Two-Phase Seeding: Defaults and Config
The GitHub plugin has two seeding phases, triggered in sequence by createServer():
Phase 1: seedDefaults() creates the minimum entities the emulator needs to function:
packages/@emulators/github/src/index.ts#L87-L133
It creates two users: ghost (GitHub's placeholder for deleted users) and admin (the default authenticated user). This mirrors GitHub's real behavior: the ghost user exists on github.com and appears as the author of actions by deleted accounts.
Phase 2: seedFromConfig() processes the YAML config to create entities and their relationships:
packages/@emulators/github/src/index.ts#L135-L373
sequenceDiagram
participant Config as YAML Config
participant Seed as seedFromConfig()
participant Store as GitHubStore
Config->>Seed: users, orgs, repos, apps
Seed->>Store: Insert users (skip duplicates)
Seed->>Store: Insert orgs (skip duplicates)
Seed->>Seed: For each repo...
Seed->>Store: Resolve owner (user or org)
Seed->>Store: Insert repo
Seed->>Store: Insert initial commit
Seed->>Store: Insert tree with README.md
Seed->>Store: Insert branch (default)
Seed->>Store: Insert ref (refs/heads/main)
Seed->>Store: Update owner.public_repos counter
Seed->>Store: Insert OAuth apps
Seed->>Store: Insert GitHub Apps + installations
The cascading entity creation for repos is the most complex part: a single repo config entry triggers creation of a commit, a tree (with a README blob reference), a branch, and a ref — the full git data model. The auto_init: false flag can suppress this if you want an empty repo.
The idempotency pattern is consistent throughout. Each loop checks for an existing entity before inserting, for example const existing = gh.users.findOneBy("login", u.login); if (existing) continue; for users. This means re-seeding after a reset is safe, because duplicate entities won't be created.
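The skip-if-exists loop can be sketched in isolation. `SeedUserStore` and `seedUsers` are hypothetical; only the `findOneBy("login", ...)` check followed by `continue` is taken from the pattern quoted above.

```typescript
// Hypothetical minimal store interface for illustration.
interface SeedUser { login: string }
interface SeedUserStore {
  findOneBy(field: "login", value: string): SeedUser | undefined;
  insert(user: SeedUser): SeedUser;
}

// Insert only the users that are not already present; return how many were created.
function seedUsers(store: SeedUserStore, configured: SeedUser[]): number {
  let created = 0;
  for (const u of configured) {
    const existing = store.findOneBy("login", u.login);
    if (existing) continue; // already present from an earlier pass
    store.insert(u);
    created++;
  }
  return created;
}

// Array-backed store that already holds "admin".
const seededRows: SeedUser[] = [{ login: "admin" }];
const memoryStore: SeedUserStore = {
  findOneBy: (_field, value) => seededRows.find((r) => r.login === value),
  insert: (user) => {
    seededRows.push(user);
    return user;
  },
};

const configuredUsers: SeedUser[] = [{ login: "admin" }, { login: "octocat" }];
console.log(seedUsers(memoryStore, configuredUsers)); // 1 (only octocat is new)
console.log(seedUsers(memoryStore, configuredUsers)); // 0 (second pass is a no-op)
```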
Route Handler Patterns
Route handlers follow a consistent pattern across all resource types. Here's the repo routes file as a canonical example:
packages/@emulators/github/src/routes/repos.ts#L1-L53
The standard flow for a read endpoint:
sequenceDiagram
participant Client
participant Route as Route Handler
participant RH as Route Helpers
participant Store as GitHubStore
participant Fmt as Response Formatter
Client->>Route: GET /repos/:owner/:repo
Route->>Store: lookupRepo(owner, repo)
Store-->>Route: GitHubRepo | undefined
Route->>Route: Throw 404 if not found
Route->>RH: assertRepoRead(gh, authUser, repo)
RH->>RH: Check visibility + permissions
Route->>Fmt: formatRepo(repo, store, baseUrl)
Fmt->>Fmt: Resolve owner, compute permissions, build URLs
Fmt-->>Client: JSON response with GitHub-compatible shape
The response formatters in helpers.ts are the unsung heroes. formatRepo() alone spans 100 lines, generating every URL template GitHub includes in its response (forks_url, keys_url, collaborators_url, etc.). These aren't just decoration — client libraries like Octokit follow these URLs.
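The lookup, check, format sequence can be sketched without any web framework. Everything here is hypothetical (`getRepoSketch`, the `Flow*` shapes, the owner-only visibility rule, and the two URL fields); the real handler delegates to lookupRepo, assertRepoRead, and formatRepo, whose internals are not shown in this article.

```typescript
// Hypothetical minimal shapes for illustration.
interface FlowRepo { id: number; full_name: string; private: boolean; owner_login: string }
interface FlowUser { login: string }
type FlowResponse = { status: number; body: unknown };

function getRepoSketch(
  repos: FlowRepo[],
  authUser: FlowUser | null,
  owner: string,
  repoName: string,
  baseUrl: string,
): FlowResponse {
  // 1. Look up the repo by full_name.
  const repo = repos.find((r) => r.full_name === `${owner}/${repoName}`);
  if (!repo) return { status: 404, body: { message: "Not Found" } };

  // 2. Visibility check. Assumption: only the owner may read a private repo,
  //    and unauthorized reads get a 404 rather than a 403.
  if (repo.private && authUser?.login !== repo.owner_login) {
    return { status: 404, body: { message: "Not Found" } };
  }

  // 3. Format: build the response shape, including derived URLs.
  const url = `${baseUrl}/repos/${repo.full_name}`;
  return {
    status: 200,
    body: { id: repo.id, full_name: repo.full_name, private: repo.private, url, forks_url: `${url}/forks` },
  };
}

const flowRepos: FlowRepo[] = [
  { id: 1, full_name: "octocat/hello-world", private: false, owner_login: "octocat" },
  { id: 2, full_name: "octocat/secret", private: true, owner_login: "octocat" },
];
console.log(getRepoSketch(flowRepos, null, "octocat", "hello-world", "http://localhost:4000").status); // 200
console.log(getRepoSketch(flowRepos, null, "octocat", "secret", "http://localhost:4000").status); // 404
```

Keeping the three steps separate is what lets every resource type reuse the same permission helpers and formatters.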
The helper functions for node IDs and SHAs are simple but important:
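import { randomBytes } from "crypto";

// Base64-encode "0:<type><id>" and strip the trailing "=" padding.
export function generateNodeId(type: string, id: number): string {
  return Buffer.from(`0:${type}${id}`).toString("base64").replace(/=+$/, "");
}

// 20 random bytes rendered as 40 hex characters (SHA-1 shaped, not a real git hash).
export function generateSha(): string {
  return randomBytes(20).toString("hex");
}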
Node IDs are the base64 encoding of the string 0:<type><id> with the trailing = padding stripped, which gives them the look of GitHub's opaque IDs. SHAs are random 40-character hex strings: they're not real git hashes, but they're the right format.
Tip: When debugging a test failure against the emulator, check whether your code depends on the exact format of node_id values. Emulate generates valid-format IDs, but they won't match IDs from github.com.
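One way to keep tests robust is to assert on the shape of a node ID rather than its exact value. The sketch below repeats `generateNodeId` from above so that it runs on its own; the decode step and the regular expression are illustrative additions, not part of the emulator.

```typescript
import { Buffer } from "node:buffer";

// Copied from the helper shown above so the example is self-contained.
function generateNodeId(type: string, id: number): string {
  return Buffer.from(`0:${type}${id}`).toString("base64").replace(/=+$/, "");
}

const sampleNodeId = generateNodeId("Repository", 1);

// Node's base64 decoder tolerates the stripped padding.
const decodedNodeId = Buffer.from(sampleNodeId, "base64").toString("utf8");
console.log(decodedNodeId); // 0:Repository1

// A shape check that holds for any type and numeric ID.
const looksLikeEmulatorNodeId = /^0:[A-Za-z]+\d+$/.test(decodedNodeId);
console.log(looksLikeEmulatorNodeId); // true
```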
Search Query Parser and Relevance Scoring
The GitHub search API is notoriously complex — it supports qualifiers like is:pr, language:TypeScript, stars:>100, range queries like stars:10..50, and negation with -qualifier:value. The emulator implements all of this:
packages/@emulators/github/src/routes/search.ts#L17-L143
The parser tokenizes the query string (handling quoted strings), then classifies each token:
flowchart TD
A["q = 'react is:public stars:>100 -language:JavaScript'"] --> B[Tokenize]
B --> C["['react', 'is:public', 'stars:>100', '-language:JavaScript']"]
C --> D{For each token}
D --> E{"Starts with '-' + has ':'?"}
E -->|Yes| F[Add to negations map]
E -->|No| G{"Has ':'?"}
G -->|Yes| H{"Value is numeric range?"}
H -->|Yes| I[Add to ranges map]
H -->|No| J[Add to qualifiers map]
G -->|No| K[Add to free text]
F --> L["ParsedSearchQuery { text: 'react', qualifiers: {is: ['public']}, negations: {language: ['JavaScript']}, ranges: {stars: [{op: '>', value: 100}]} }"]
I --> L
J --> L
K --> L
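The classification steps in the flowchart can be sketched as a small parser. This is not the emulator's search.ts: the `Sketch*` types, the regular expressions, and the `..` range representation are assumptions, chosen so that the example query produces the same structure the diagram shows.

```typescript
type RangeOp = ">" | ">=" | "<" | "<=" | "..";
interface SketchRange { op: RangeOp; value: number; max?: number }
interface SketchParsedQuery {
  text: string;
  qualifiers: Record<string, string[]>;
  negations: Record<string, string[]>;
  ranges: Record<string, SketchRange[]>;
}

// Split on whitespace, but keep quoted phrases (including key:"two words") together.
function tokenizeQuery(q: string): string[] {
  return q.match(/(?:[^\s"]+|"[^"]*")+/g) ?? [];
}

// Recognize ">100", ">=100", "<100", "<=100", and "10..50".
function parseRange(value: string): SketchRange | undefined {
  const between = /^(\d+)\.\.(\d+)$/.exec(value);
  if (between) return { op: "..", value: Number(between[1]), max: Number(between[2]) };
  const cmp = /^(>=|<=|>|<)(\d+)$/.exec(value);
  if (cmp) return { op: cmp[1] as RangeOp, value: Number(cmp[2]) };
  return undefined;
}

function addTo<T>(map: Record<string, T[]>, key: string, item: T): void {
  (map[key] = map[key] || []).push(item);
}

function parseSearchQuery(q: string): SketchParsedQuery {
  const parsed: SketchParsedQuery = { text: "", qualifiers: {}, negations: {}, ranges: {} };
  const freeText: string[] = [];
  for (const token of tokenizeQuery(q)) {
    const negated = token.startsWith("-") && token.includes(":");
    const body = negated ? token.slice(1) : token;
    const colon = body.indexOf(":");
    if (colon <= 0) {
      freeText.push(token.replace(/"/g, "")); // no usable key, so treat as free text
      continue;
    }
    const key = body.slice(0, colon);
    const value = body.slice(colon + 1).replace(/^"|"$/g, "");
    const range = parseRange(value);
    if (negated) addTo(parsed.negations, key, value);
    else if (range) addTo(parsed.ranges, key, range);
    else addTo(parsed.qualifiers, key, value);
  }
  parsed.text = freeText.join(" ");
  return parsed;
}

const parsedExample = parseSearchQuery("react is:public stars:>100 -language:JavaScript");
console.log(JSON.stringify(parsedExample));
```

In this sketch a negated token is always stored as a raw string, even if its value looks like a range; whether the real parser supports negated ranges is not covered here.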
The relevance scoring function for repos is concise but weighted:
packages/@emulators/github/src/routes/search.ts#L323-L332
function repoRelevance(repo: GitHubRepo, parsed: ParsedSearchQuery): number {
  const t = parsed.text.trim().toLowerCase();
  if (!t) return 1;
  let score = 0;
  if (repo.name.toLowerCase().includes(t)) score += 5;
  if (repo.full_name.toLowerCase().includes(t)) score += 4;
  if (repo.description?.toLowerCase().includes(t)) score += 2;
  if (repo.topics.some((x) => x.toLowerCase().includes(t))) score += 1;
  return score;
}
Name matches score highest (5), then full name (4), description (2), and topics (1). The weights are additive, and because full_name contains the repo name, a name match also earns the full-name points. The search endpoints support six resource types (repositories, issues, users, code, commits, and topics), each with its own filter and relevance functions. The implementation spans over 1,100 lines.
Implicit Behaviors: Check Suites and Cascading State
One of the most valuable things about Emulate is that it replicates implicit API behaviors — things that happen automatically that aren't obvious from the API documentation.
When you POST a check run to GitHub, it automatically creates a check suite if one doesn't exist for that commit:
packages/@emulators/github/src/routes/checks.ts#L49-L72
function getOrCreateCheckSuite(
  gh: GitHubStore, repo: GitHubRepo, headSha: string, headBranch?: string | null,
): GitHubCheckSuite {
  const existing = gh.checkSuites.findBy("repo_id", repo.id).find((s) => s.head_sha === headSha);
  if (existing) return existing;
  const hb = headBranch?.trim() || headBranchForSha(gh, repo, headSha);
  const row = gh.checkSuites.insert({
    repo_id: repo.id, head_branch: hb, head_sha: headSha,
    status: "queued", conclusion: null, ...
  });
  return gh.checkSuites.get(row.id)!;
}
Similar cascading happens in repo creation (commit → tree → branch → ref), PR creation (creates a corresponding issue entry with is_pull_request: true), and issue state changes (updates open_issues_count on the repo).
Webhook Dispatch with Installation Enrichment
The GitHub plugin wraps the core WebhookDispatcher to add GitHub-specific behavior — finding relevant app installations and enriching payloads with installation data:
packages/@emulators/github/src/index.ts#L455-L500
sequenceDiagram
participant Route as Route Handler
participant WH as Wrapped dispatch()
participant Find as findInstallationsForRepo()
participant Core as Original dispatch()
participant Deliver as deliverToAppWebhookUrls()
Route->>WH: dispatch("push", undefined, payload, "octocat", "hello-world")
WH->>Find: Find matching installations
Find-->>WH: [installation_100]
WH->>WH: Enrich payload with installation data
WH->>Core: dispatch(event, action, enrichedPayload, ...)
Note over Core: Delivers to repo/org webhook URLs
WH->>Deliver: Deliver to app webhook URLs
Deliver->>Deliver: HMAC-SHA256 sign with webhook_secret
Deliver->>Deliver: POST to app's webhook_url
The installation matching logic checks:
- Is the installation on the correct account (user/org)?
- Is it not suspended?
- Does the app subscribe to this event type?
- For "selected" repository installations — is this repo selected?
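The four checks above can be read as a single predicate. The sketch below is hypothetical: the field names (`account_id`, `suspended_at`, `repository_selection`, `selected_repo_ids`) and the idea of passing the app's subscribed events as a separate array are assumptions for illustration, not the emulator's actual entity shape.

```typescript
// Hypothetical minimal shapes for illustration.
interface MatchInstallation {
  account_id: number;
  suspended_at: string | null;
  repository_selection: "all" | "selected";
  selected_repo_ids: number[];
}
interface MatchRepo { id: number; owner_id: number }

function installationMatches(
  inst: MatchInstallation,
  appEvents: string[],
  repo: MatchRepo,
  eventName: string,
): boolean {
  if (inst.account_id !== repo.owner_id) return false; // wrong account
  if (inst.suspended_at !== null) return false; // suspended
  if (!appEvents.includes(eventName)) return false; // app not subscribed to this event
  if (inst.repository_selection === "selected" && !inst.selected_repo_ids.includes(repo.id)) {
    return false; // installation limited to other repos
  }
  return true;
}

const helloWorldRepo: MatchRepo = { id: 10, owner_id: 1 };
const selectedInstall: MatchInstallation = {
  account_id: 1,
  suspended_at: null,
  repository_selection: "selected",
  selected_repo_ids: [10],
};

console.log(installationMatches(selectedInstall, ["push"], helloWorldRepo, "push")); // true
console.log(installationMatches(selectedInstall, ["push"], helloWorldRepo, "issues")); // false
```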
The HMAC signing uses createHmac("sha256", webhookSecret) from Node's crypto module, setting the X-Hub-Signature-256 header. This means your webhook handler's signature verification code works unchanged against the emulator.
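For reference, here is a minimal sketch of both sides of that exchange using Node's crypto module. `signPayload` and `verifySignature` are illustrative names, not functions from the emulator; the `sha256=` prefix followed by a hex digest is the value format GitHub uses for the X-Hub-Signature-256 header.

```typescript
import { Buffer } from "node:buffer";
import { createHmac, timingSafeEqual } from "node:crypto";

// Sender side: compute the header value for a raw request body.
function signPayload(secret: string, body: string): string {
  return "sha256=" + createHmac("sha256", secret).update(body).digest("hex");
}

// Receiver side: recompute the signature and compare in constant time.
function verifySignature(secret: string, body: string, header: string): boolean {
  const expected = Buffer.from(signPayload(secret, body));
  const received = Buffer.from(header);
  // timingSafeEqual throws on length mismatch, so check lengths first.
  return expected.length === received.length && timingSafeEqual(expected, received);
}

const hookSecret = "test-secret";
const hookBody = JSON.stringify({ ref: "refs/heads/main" });
const hookHeader = signPayload(hookSecret, hookBody);

console.log(verifySignature(hookSecret, hookBody, hookHeader)); // true
console.log(verifySignature(hookSecret, hookBody + " ", hookHeader)); // false
```

Verification must run over the raw request body as received; re-serializing parsed JSON can change whitespace or key order and break the match.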
Tip: If your app uses GitHub App authentication, you need to configure the apps section in the seed config with a real RSA private key. The emulator actually verifies JWT signatures, so a placeholder key won't work.
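If you need a throwaway key for tests, Node can generate one. The sketch below creates an RSA key pair, signs an RS256 JWT with the claims GitHub documents for App authentication (`iat`, `exp`, `iss`), and verifies it against the public key. `signAppJwt` and `verifyAppJwt` are hypothetical helpers, and how the private key is passed into the seed config is not shown here because the config field names are not covered in this article.

```typescript
import { Buffer } from "node:buffer";
import { createSign, createVerify, generateKeyPairSync } from "node:crypto";

// Generate a throwaway 2048-bit RSA key pair as PEM strings.
const { privateKey: sketchPrivateKey, publicKey: sketchPublicKey } = generateKeyPairSync("rsa", {
  modulusLength: 2048,
  publicKeyEncoding: { type: "spki", format: "pem" },
  privateKeyEncoding: { type: "pkcs8", format: "pem" },
});

const b64url = (buf: Buffer): string => buf.toString("base64url");

// Build and sign an RS256 JWT by hand (no JWT library).
function signAppJwt(appId: number, privateKeyPem: string, nowSeconds: number): string {
  const header = b64url(Buffer.from(JSON.stringify({ alg: "RS256", typ: "JWT" })));
  const claims = b64url(
    Buffer.from(
      // Issued 60s in the past to allow for clock drift; valid for 10 minutes.
      JSON.stringify({ iat: nowSeconds - 60, exp: nowSeconds + 600, iss: String(appId) }),
    ),
  );
  const signingInput = `${header}.${claims}`;
  const signature = createSign("RSA-SHA256").update(signingInput).sign(privateKeyPem);
  return `${signingInput}.${b64url(signature)}`;
}

// Check only the signature; a real verifier would also validate exp and iss.
function verifyAppJwt(token: string, publicKeyPem: string): boolean {
  const [header, claims, signature] = token.split(".");
  if (!header || !claims || !signature) return false;
  return createVerify("RSA-SHA256")
    .update(`${header}.${claims}`)
    .verify(publicKeyPem, Buffer.from(signature, "base64url"));
}

const sketchJwt = signAppJwt(12345, sketchPrivateKey, Math.floor(Date.now() / 1000));
console.log(verifyAppJwt(sketchJwt, sketchPublicKey)); // true
```

A placeholder string in place of the PEM fails at the signing step, which is why the emulator's signature check rules it out.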
What's Next
We've dissected the most complex emulator in the project. In the next article, we'll zoom in on one specific cross-cutting concern: OAuth and OpenID Connect. Six of the twelve emulators implement identity flows, and they share patterns — pending code maps, session cookies, a user picker UI — while diverging in protocol-specific ways. We'll also see how the Next.js adapter solves the OAuth callback URL problem for Vercel preview deployments.