Gemini CLI Architecture: A Map of the Monorepo
Prerequisites
- Basic TypeScript knowledge
- Familiarity with Node.js project structure
- Understanding of monorepo concepts
Google's Gemini CLI is an open-source agentic coding assistant that brings Gemini models into your terminal. It can edit files, run shell commands, search the web, and integrate with MCP servers — all orchestrated by a sophisticated TypeScript codebase. But before you can contribute to or learn from Gemini CLI, you need a mental map. This article provides exactly that: a guided tour through the monorepo structure, the startup sequence, the central configuration object, and the two event systems that wire everything together.
The 7-Package Monorepo Layout
Gemini CLI uses npm workspaces to organize its code into seven packages, each with a clearly defined responsibility boundary.
| Package | Purpose |
|---|---|
| packages/core | Backend logic: API client, tools, policy engine, scheduler, hooks, MCP, safety |
| packages/cli | React/Ink terminal UI, interactive and non-interactive modes |
| packages/sdk | Programmatic API for embedding Gemini CLI in other applications |
| packages/a2a-server | Experimental Agent-to-Agent protocol server |
| packages/devtools | Browser-based debug inspector |
| packages/vscode-ide-companion | Visual Studio Code extension for IDE integration |
| packages/test-utils | Shared test infrastructure |
The barrel export in packages/core/src/index.ts reveals just how much lives in core — over 270 lines of reexports covering config, policy, tools, scheduling, MCP, hooks, agents, telemetry, and more. It's the gravitational center of the codebase.
graph TD
subgraph Consumers
CLI[packages/cli<br/>Terminal UI]
SDK[packages/sdk<br/>Programmatic API]
A2A[packages/a2a-server<br/>Agent-to-Agent]
DEV[packages/devtools<br/>Debug Inspector]
VSC[packages/vscode-ide-companion<br/>VS Code Extension]
end
CORE[packages/core<br/>Backend Logic]
TEST[packages/test-utils<br/>Test Infrastructure]
CLI --> CORE
SDK --> CORE
A2A --> SDK
DEV --> CORE
VSC --> CORE
CLI -.-> TEST
CORE -.-> TEST
Tip: When exploring Gemini CLI for the first time, start in packages/core/src/index.ts. The export groupings (config, core logic, tools, services, hooks) mirror the actual architectural layers.
The Core ↔ CLI Split
The most important architectural decision in Gemini CLI is the clean separation between core (the headless backend) and cli (the terminal UI). This split enables three different interaction surfaces to share the same backend:
- The interactive CLI: a React/Ink application rendering in the terminal
- The non-interactive CLI: for piped input and CI/CD
- The SDK: for programmatic embedding via GeminiCliAgent
The SDK package at packages/sdk/src/index.ts is remarkably slim — just five reexports. All the heavy lifting happens in core's agent session, tool registry, and client layers. The SDK wraps these into a friendlier GeminiCliAgent / GeminiCliSession API.
This split has a concrete performance benefit too. The CLI lazily imports React/Ink only when entering interactive mode, keeping the non-interactive path fast.
The Boot Sequence: From Shebang to Interactive Mode
Understanding the startup flow is essential for navigating Gemini CLI. The entry point at packages/cli/index.ts is deceptively simple: it installs a global uncaught exception handler (notably suppressing a known node-pty race condition on Windows) and calls main().
The real complexity lives in packages/cli/src/gemini.tsx. The main() function orchestrates a multi-phase startup:
flowchart TD
A[Entry: index.ts] --> B["main()"]
B --> C[Setup handlers & patch stdio]
C --> D[Load settings & worktree]
D --> E[Parse arguments]
E --> F[Configure DNS & auth type]
F --> G["Load CLI config (partial)"]
G --> H[Refresh auth / admin settings]
H --> I{Sandbox needed?}
I -- Yes --> J[Relaunch in sandbox]
I -- No --> K[Relaunch in child process]
J --> L[Exit parent]
K --> M[Full config load]
M --> N{Interactive?}
N -- Yes --> O["startInteractiveUI()"]
N -- No --> P["runNonInteractive()"]
Several design choices stand out:
Two-phase config loading. The config is loaded twice — once partially before the sandbox decision (line 330), and once fully after entering the sandbox (line 454). This is because auth must happen before sandboxing (the sandbox blocks OAuth redirects), but extensions should not be loaded until after sandboxing.
The sandbox relaunch. If sandboxing is enabled and we're not already in a sandbox (!process.env['SANDBOX']), the process relaunches itself inside a container or macOS seatbelt. This is invisible to the user but critical for security.
Lazy UI loading. The startInteractiveUI function at lines 167–185 dynamically imports the heavy interactiveCli.js module, so React/Ink is never loaded on non-interactive runs.
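The sandbox check and the lazy UI load can be sketched in a few lines. This is a minimal illustration, not the code in gemini.tsx: `shouldEnterSandbox`, `loadInteractiveUi`, `run`, and the `uiLoads` counter are hypothetical names, the environment is passed in as a plain object rather than read from `process.env`, and the dynamic import is simulated by an async function so the sketch stays self-contained.

```typescript
// Hypothetical sketch; none of these names come from the Gemini CLI source.
type Env = Record<string, string | undefined>;

// Relaunch only when sandboxing is enabled and we are not already inside
// a sandbox. Mirrors the `!process.env['SANDBOX']` check described above.
function shouldEnterSandbox(sandboxEnabled: boolean, env: Env): boolean {
  return sandboxEnabled && !env['SANDBOX'];
}

// Stand-in for the heavy interactive UI module.
interface InteractiveUi {
  start(prompt: string): string;
}

// Counts how often the "heavy" module was loaded.
let uiLoads = 0;

// Simulates `await import('./interactiveCli.js')` without touching the disk.
async function loadInteractiveUi(): Promise<InteractiveUi> {
  uiLoads += 1;
  return { start: (prompt) => `interactive: ${prompt}` };
}

// The UI loader is only awaited on the interactive branch, so the
// non-interactive path never pays for it.
async function run(interactive: boolean, prompt: string): Promise<string> {
  if (interactive) {
    const ui = await loadInteractiveUi();
    return ui.start(prompt);
  }
  return `non-interactive: ${prompt}`;
}
```

Passing the environment as a parameter is a choice made for this sketch; it keeps the decision a pure function that can be checked without spawning a process.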
The Config God Object and AgentLoopContext
At the heart of every Gemini CLI session sits the Config class — a ~3700-line object that implements both the McpContext and AgentLoopContext interfaces. Defined at packages/core/src/config/config.ts#L736, it holds:
- Tool, prompt, and resource registries
- MCP and A2A client managers
- The sandbox manager and policy engine
- The model router service and content generator
- The hook system, skill manager, and file discovery service
- Session state, telemetry settings, IDE mode, and much more
classDiagram
class Config {
+toolRegistry: ToolRegistry
+mcpClientManager: McpClientManager
+sandboxManager: SandboxManager
+modelRouterService: ModelRouterService
+policyEngine: PolicyEngine
+hookSystem: HookSystem
+skillManager: SkillManager
+contentGenerator: ContentGenerator
+getMessageBus(): MessageBus
+getGeminiClient(): GeminiClient
+initialize(): Promise
}
class AgentLoopContext {
<<interface>>
+config: Config
+promptId: string
+parentSessionId?: string
+toolRegistry: ToolRegistry
+promptRegistry: PromptRegistry
+resourceRegistry: ResourceRegistry
+messageBus: MessageBus
+geminiClient: GeminiClient
+sandboxManager: SandboxManager
}
Config ..|> AgentLoopContext
Config ..|> McpContext
The AgentLoopContext interface provides a scoped view for a single agent turn. Instead of passing the entire Config to components like GeminiClient or Scheduler, you pass an AgentLoopContext with just the registries, message bus, and sandbox manager relevant to that execution scope. This is especially important for sub-agents, where each gets its own derived context.
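To make the idea of a derived, narrowed context concrete, here is a small sketch. The shapes are deliberately simplified and hypothetical (`LoopContextSketch`, `ToolRegistrySketch`, `deriveSubAgentContext`, and `busScope` are not names from the codebase); the point is only that a sub-agent receives a smaller view built from its parent's context, and that consumers depend on the narrow interface.

```typescript
// Hypothetical, simplified shapes; the real interfaces carry far more.
interface ToolRegistrySketch {
  readonly toolNames: ReadonlySet<string>;
}

interface LoopContextSketch {
  readonly promptId: string;
  readonly toolRegistry: ToolRegistrySketch;
  readonly busScope: string;
}

// Build a child context that exposes only the tools the sub-agent is
// allowed to use and that the parent actually has.
function deriveSubAgentContext(
  parentCtx: LoopContextSketch,
  agentName: string,
  allowedTools: string[],
): LoopContextSketch {
  const toolNames = new Set(
    allowedTools.filter((tool) => parentCtx.toolRegistry.toolNames.has(tool)),
  );
  return {
    promptId: `${parentCtx.promptId}/${agentName}`,
    toolRegistry: { toolNames },
    busScope: `${parentCtx.busScope}/${agentName}`,
  };
}

// A consumer that needs only the narrow interface, not the full config.
function describeScope(ctx: LoopContextSketch): string {
  const tools = Array.from(ctx.toolRegistry.toolNames).sort().join(', ');
  return `${ctx.promptId} -> [${tools}]`;
}
```

In this sketch the parent context is never mutated; the child gets fresh objects, which is one simple way to keep execution scopes from leaking into each other.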
Tip: When reading code that accepts an
AgentLoopContext, remember thatcontext.configalways gives you back the full Config object. The interface is a narrowing convention, not a hard boundary.
Dual Event Systems: coreEvents and MessageBus
Gemini CLI uses two distinct event systems for different communication patterns.
CoreEventEmitter — Global Cross-Cutting Concerns
The coreEvents singleton is a typed EventEmitter that handles global notifications. Its CoreEvent enum defines events like UserFeedback, ModelChanged, ConsoleLog, RetryAttempt, McpProgress, and QuotaChanged.
A clever feature: the CoreEventEmitter implements event backlogging. If an event is emitted before any listeners are subscribed (common during startup), it's queued in a backlog (up to 10,000 entries) and drained when listeners appear. This prevents early startup warnings from being lost.
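The backlog behavior can be illustrated with a small typed emitter. This is a sketch under assumptions, not the CoreEventEmitter implementation: the class name `BacklogEmitter` is hypothetical, events are keyed by plain strings rather than the CoreEvent enum, and dropping the oldest entry when the backlog is full is an assumed policy that the article does not specify.

```typescript
type Listener<T> = (payload: T) => void;

// Hypothetical sketch of an emitter that queues events emitted before any
// listener exists and replays them when the first listener subscribes.
class BacklogEmitter<Events> {
  private readonly listeners = new Map<keyof Events, Array<Listener<any>>>();
  private backlog: Array<{ event: keyof Events; payload: unknown }> = [];
  private readonly maxBacklog: number;

  constructor(maxBacklog = 10000) {
    this.maxBacklog = maxBacklog;
  }

  emit<K extends keyof Events>(event: K, payload: Events[K]): void {
    const handlers = this.listeners.get(event);
    if (handlers === undefined || handlers.length === 0) {
      // Nobody is listening yet: queue the event.
      // Assumption: when the backlog is full, the oldest entry is dropped.
      if (this.backlog.length >= this.maxBacklog) {
        this.backlog.shift();
      }
      this.backlog.push({ event, payload });
      return;
    }
    for (const handler of handlers) handler(payload);
  }

  on<K extends keyof Events>(event: K, listener: Listener<Events[K]>): void {
    const handlers = this.listeners.get(event) ?? [];
    handlers.push(listener);
    this.listeners.set(event, handlers);

    // Drain queued events of this type, in emission order.
    const pending = this.backlog.filter((entry) => entry.event === event);
    this.backlog = this.backlog.filter((entry) => entry.event !== event);
    for (const entry of pending) listener(entry.payload as Events[K]);
  }
}
```

With this shape, a warning emitted during startup reaches the UI as soon as the UI subscribes, and events of other types stay queued until their own listener appears.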
MessageBus — Tool Confirmation Pub/Sub
The MessageBus handles a different pattern: request-response communication between the scheduler and the UI for tool confirmation. When a tool call needs user approval, the scheduler publishes a TOOL_CONFIRMATION_REQUEST. The bus checks policy (allow/deny/ask_user) and either auto-resolves or forwards to the UI.
flowchart LR
S[Scheduler] -->|TOOL_CONFIRMATION_REQUEST| MB[MessageBus]
MB -->|PolicyEngine.check| PE{Policy Decision}
PE -- ALLOW --> AR[Auto-resolve confirmed]
PE -- DENY --> AD[Auto-resolve denied]
PE -- ASK_USER --> UI[Forward to UI]
UI -->|TOOL_CONFIRMATION_RESPONSE| MB
MB --> S
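The request flow above can be sketched as a tiny bus with a policy callback. All names here (`ConfirmationBus`, `PolicyDecision`, `onUiRequest`, `onResponse`, `correlationId`) are illustrative assumptions rather than the real MessageBus API; the sketch only shows the three-way branch of allow, deny, and ask_user.

```typescript
type PolicyDecision = 'allow' | 'deny' | 'ask_user';

interface ConfirmationRequest {
  correlationId: string;
  toolName: string;
}

interface ConfirmationResponse {
  correlationId: string;
  confirmed: boolean;
}

// Hypothetical sketch: the policy check is injected as a plain function.
class ConfirmationBus {
  private readonly checkPolicy: (toolName: string) => PolicyDecision;
  private readonly uiHandlers: Array<(req: ConfirmationRequest) => void> = [];
  private readonly responseHandlers: Array<(res: ConfirmationResponse) => void> = [];

  constructor(checkPolicy: (toolName: string) => PolicyDecision) {
    this.checkPolicy = checkPolicy;
  }

  // The UI subscribes here to receive requests that need a human decision.
  onUiRequest(handler: (req: ConfirmationRequest) => void): void {
    this.uiHandlers.push(handler);
  }

  // The scheduler subscribes here to learn the outcome of each request.
  onResponse(handler: (res: ConfirmationResponse) => void): void {
    this.responseHandlers.push(handler);
  }

  request(req: ConfirmationRequest): void {
    const decision = this.checkPolicy(req.toolName);
    if (decision === 'ask_user') {
      // Forward to the UI; the UI answers later through respond().
      for (const handler of this.uiHandlers) handler(req);
      return;
    }
    // allow and deny resolve immediately without involving the UI.
    this.respond({ correlationId: req.correlationId, confirmed: decision === 'allow' });
  }

  respond(res: ConfirmationResponse): void {
    for (const handler of this.responseHandlers) handler(res);
  }
}
```

The correlation ID is what lets the scheduler match a response to the request that triggered it, whether the answer came from policy or from the user.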
The derive() method at lines 46–72 creates namespaced child buses for sub-agents. A derived bus prefixes the sub-agent name to confirmation requests, so parent and child agent flows don't interfere with each other while sharing the same underlying event infrastructure.
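A rough sketch of that namespacing idea follows. It is not the real derive() implementation: `ScopedBus`, `ScopedRequest`, the `subagent` field, and the `/` separator for nested scopes are all assumptions made for illustration. What it does show is a child bus that tags its requests while delivering through the same underlying channel as its parent.

```typescript
interface ScopedRequest {
  toolName: string;
  subagent?: string;
}

// Hypothetical sketch: every bus in a derivation chain shares one `deliver`
// function, standing in for the shared event infrastructure.
class ScopedBus {
  private readonly deliver: (req: ScopedRequest) => void;
  private readonly scope: string | undefined;

  constructor(deliver: (req: ScopedRequest) => void, scope?: string) {
    this.deliver = deliver;
    this.scope = scope;
  }

  publish(toolName: string): void {
    const req: ScopedRequest = { toolName };
    // Tag the request with the sub-agent scope, if this bus has one.
    if (this.scope !== undefined) req.subagent = this.scope;
    this.deliver(req);
  }

  // Create a child bus whose requests carry the sub-agent name.
  // Assumption: nested sub-agents are joined with "/".
  derive(subagentName: string): ScopedBus {
    const scope =
      this.scope === undefined ? subagentName : `${this.scope}/${subagentName}`;
    return new ScopedBus(this.deliver, scope);
  }
}
```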
Navigating the Codebase: Where to Start Reading
Here's a practical reading order for new contributors:
| Step | File | Why |
|---|---|---|
| 1 | packages/cli/src/gemini.tsx | Understand the boot sequence |
| 2 | packages/core/src/config/config.ts | See what the Config holds |
| 3 | packages/core/src/core/client.ts | The agentic loop orchestrator |
| 4 | packages/core/src/core/turn.ts | How streaming events work |
| 5 | packages/core/src/scheduler/scheduler.ts | How tools are executed |
| 6 | packages/core/src/policy/policy-engine.ts | Security model |
| 7 | packages/core/src/tools/definitions/coreTools.ts | Tool schema definitions |
Key naming conventions to know:
- core/: the agentic loop layer (client, turn, chat, prompts)
- tools/definitions/: tool schemas split by model family
- scheduler/: tool execution orchestration
- confirmation-bus/: the MessageBus system
- hooks/: five-component hook system
- mcp/: MCP client and OAuth
- policy/: the rule-based policy engine
The relationship between tool definitions and invocations follows a builder pattern: coreTools.ts defines declarative schemas via BaseDeclarativeTool, and the scheduler instantiates ToolInvocation objects for actual execution. We'll explore this fully in Article 3.
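As a rough picture of the definition/invocation split, consider the sketch below. It does not reproduce BaseDeclarativeTool or ToolInvocation; `DeclarativeToolSketch`, `ToolInvocationSketch`, `EchoTool`, and the `build`, `validate`, and `createInvocation` methods are hypothetical names chosen for illustration. The idea it demonstrates is that a long-lived tool definition validates parameters and produces a short-lived invocation object that carries out one call.

```typescript
// One concrete, validated call of a tool.
interface ToolInvocationSketch<R> {
  describe(): string;
  execute(): R;
}

// The declarative side: a reusable definition that builds invocations.
abstract class DeclarativeToolSketch<P, R> {
  readonly name: string;

  constructor(name: string) {
    this.name = name;
  }

  // Returns an error message, or null when the parameters are valid.
  protected abstract validate(params: P): string | null;
  protected abstract createInvocation(params: P): ToolInvocationSketch<R>;

  // Validation happens once, before any invocation object exists.
  build(params: P): ToolInvocationSketch<R> {
    const error = this.validate(params);
    if (error !== null) {
      throw new Error(`${this.name}: ${error}`);
    }
    return this.createInvocation(params);
  }
}

// A toy tool used only to exercise the pattern.
class EchoTool extends DeclarativeToolSketch<{ text: string }, string> {
  constructor() {
    super('echo');
  }

  protected validate(params: { text: string }): string | null {
    return params.text.length > 0 ? null : 'text must not be empty';
  }

  protected createInvocation(params: { text: string }): ToolInvocationSketch<string> {
    return {
      describe: () => `echo "${params.text}"`,
      execute: () => params.text.toUpperCase(),
    };
  }
}
```

Separating `describe()` from `execute()` in this sketch reflects one plausible benefit of the split: a caller can show what an invocation would do (for example, in a confirmation prompt) before running it.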
In the next article, we'll dive deep into the agentic loop — the three-layer architecture of GeminiClient, Turn, and GeminiChat that transforms user prompts into streaming tool calls and responses.