Read OSS

Two Faces of Gemini CLI: The React/Ink Terminal UI and the Programmatic SDK

Intermediate

Prerequisites

  • Article 1: Architecture and Navigation Guide
  • Article 2: The Agentic Loop
  • Basic React knowledge
  • Understanding of async iterables

Gemini CLI's architecture — with its clean core/CLI split, typed event streams, and protocol interfaces — enables two very different interaction surfaces built on the same backend. The interactive terminal UI renders a rich React application inside your terminal with context providers, keyboard handling, and streaming content display. The SDK wraps the same engine into a programmatic API for embedding in other tools. This article explores both.

The React/Ink Interactive UI

Gemini CLI's terminal UI is a full React application rendered via Ink, a library that maps React components to terminal output. The UI is loaded lazily. As we saw in Article 1, startInteractiveUI in gemini.tsx (lines 167–185) dynamically imports the heavy module:

export async function startInteractiveUI(/* ... */) {
  const { startInteractiveUI: doStartUI } = await import('./interactiveCli.js');
  await doStartUI(config, settings, startupWarnings, workspaceRoot /* , ... */);
}

This keeps the non-interactive path fast. When interactive mode launches, startInteractiveUI in interactiveCli.tsx sets up mouse events, loads key matchers, patches console output, creates working stdio streams, and renders the React tree.

graph TD
    subgraph "Terminal Setup"
        ME[Enable mouse events]
        KM[Load key matchers]
        CP[Patch console]
        WS[Create working stdio]
    end
    
    subgraph "React Tree"
        AW[AppWrapper]
        AW --> SC[SettingsContext.Provider]
        SC --> KMP[KeyMatchersProvider]
        KMP --> KP[KeypressProvider]
        KP --> MP[MouseProvider]
        MP --> TP[TerminalProvider]
        TP --> SP[ScrollProvider]
        SP --> OP[OverflowProvider]
        OP --> SSP[SessionStatsProvider]
        SSP --> VMP[VimModeProvider]
        VMP --> AC[AppContainer]
    end
    
    ME --> AW
    KM --> AW
    CP --> AW
    WS --> AW

Context Provider Hierarchy

The UI uses a deep hierarchy of React context providers to manage state. The AppWrapper component in interactiveCli.tsx (lines 100–120) nests them in the following order, with each provider responsible for one concern:

classDiagram
    class SettingsContext {
        LoadedSettings
        Multi-scope settings
    }
    class KeyMatchersProvider {
        Key binding definitions
        Custom shortcuts
    }
    class KeypressProvider {
        Priority-based key handling
        useKeypress hook
    }
    class MouseProvider {
        Mouse event state
        Click and scroll tracking
    }
    class TerminalProvider {
        Terminal dimensions
        Capability detection
    }
    class ScrollProvider {
        Scroll position
        Virtual scroll management
    }
    class OverflowProvider {
        Content overflow detection
        Truncation management
    }
    class SessionStatsProvider {
        Token counts
        Turn statistics
    }
    class VimModeProvider {
        Vi keybinding mode
        Normal/insert state
    }
    
    SettingsContext --> KeyMatchersProvider
    KeyMatchersProvider --> KeypressProvider
    KeypressProvider --> MouseProvider
    MouseProvider --> TerminalProvider
    TerminalProvider --> ScrollProvider
    ScrollProvider --> OverflowProvider
    OverflowProvider --> SessionStatsProvider
    SessionStatsProvider --> VimModeProvider

The AppContainer component at the bottom of this hierarchy is where the real work happens. It's a massive component (hundreds of lines) that manages:

  • Authentication state and auth command processing
  • The Gemini stream hook (useGeminiStream) for processing events
  • History management with useHistory
  • Slash command processing
  • Confirmation request handling
  • Quota and fallback management
  • UI state like streaming status, input focus, and tool actions

AppContainer wraps all of this in additional context providers (AppContext, UIStateContext, UIActionsContext, ConfigContext, ToolActionsProvider) before rendering the actual App component.

Tip: When debugging UI issues, the ConfigContext and UIStateContext are the most commonly needed. ConfigContext provides the core Config object, while UIStateContext holds mutable state like StreamingState, current confirmation request, and history items.

Streaming Events to React State

The bridge between the agentic loop (Article 2) and the React UI is the useGeminiStream hook. This hook subscribes to ServerGeminiStreamEvent values from GeminiClient.sendMessageStream() and translates them into React state updates.

sequenceDiagram
    participant User as User Input
    participant AC as AppContainer
    participant GS as useGeminiStream
    participant GC as GeminiClient
    participant UI as React Components
    
    User->>AC: Submit prompt
    AC->>GS: sendMessage(prompt)
    GS->>GC: sendMessageStream()
    
    loop For each event
        GC-->>GS: ServerGeminiStreamEvent
        alt Content event
            GS->>UI: Update streaming text
        else ToolCallRequest
            GS->>UI: Show tool call in progress
        else ToolCallConfirmation
            GS->>UI: Show confirmation dialog
        else Thought event
            GS->>UI: Update thinking indicator
        else Error event
            GS->>UI: Display error
        end
    end
    
    GC-->>GS: Finished event
    GS->>UI: Set streaming state to idle

The hook manages the StreamingState enum transitions:

  • Idle → Streaming when a message is sent
  • Streaming → Idle when the response completes
  • Streaming → Cancelled when the user aborts
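
The transitions listed above can be sketched as a pure reducer. This is a minimal illustration, not the actual useGeminiStream implementation: the event shapes, the UiState fields, and the names reduceUi and applyEvent are all assumptions made for the example, and the real ServerGeminiStreamEvent union has more members.

```typescript
// Hypothetical, simplified shapes for illustration only.
type StreamingState = 'idle' | 'streaming' | 'cancelled';

type SketchEvent =
  | { type: 'content'; text: string }
  | { type: 'thought'; summary: string }
  | { type: 'tool_call_request'; name: string }
  | { type: 'error'; message: string }
  | { type: 'finished' };

interface UiState {
  streamingState: StreamingState;
  text: string;            // accumulated streamed response
  thought: string | null;  // latest thinking indicator
  pendingTools: string[];  // tool calls shown as in progress
  error: string | null;
}

const initialUiState: UiState = {
  streamingState: 'idle',
  text: '',
  thought: null,
  pendingTools: [],
  error: null,
};

type UiAction =
  | { kind: 'submit' }
  | { kind: 'abort' }
  | { kind: 'event'; event: SketchEvent };

function reduceUi(state: UiState, action: UiAction): UiState {
  switch (action.kind) {
    case 'submit':
      // Idle -> Streaming when a message is sent.
      return { ...initialUiState, streamingState: 'streaming' };
    case 'abort':
      // Streaming -> Cancelled when the user aborts.
      return state.streamingState === 'streaming'
        ? { ...state, streamingState: 'cancelled' }
        : state;
    case 'event':
      return applyEvent(state, action.event);
  }
}

function applyEvent(state: UiState, event: SketchEvent): UiState {
  // Ignore late events once the turn is no longer streaming.
  if (state.streamingState !== 'streaming') return state;
  switch (event.type) {
    case 'content':
      return { ...state, text: state.text + event.text };
    case 'thought':
      return { ...state, thought: event.summary };
    case 'tool_call_request':
      return { ...state, pendingTools: [...state.pendingTools, event.name] };
    case 'error':
      return { ...state, error: event.message };
    case 'finished':
      // Streaming -> Idle when the response completes.
      return { ...state, streamingState: 'idle', thought: null };
  }
}
```

Keeping the translation pure makes each transition easy to test in isolation. In a React hook, a reducer like this would typically sit behind useReducer, with the stream loop dispatching one action per event.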

Tool call requests from the stream are dispatched to the Scheduler (as covered in Article 3). When the scheduler needs user confirmation, it publishes through the MessageBus, and the UI listens for TOOL_CONFIRMATION_REQUEST events to render the appropriate confirmation dialog.

Non-Interactive Mode

When stdin is piped or the --prompt flag is used, Gemini CLI skips React entirely and runs in non-interactive mode. The runNonInteractive() function in packages/cli/src/nonInteractiveCli.ts provides a simpler event consumption path:

flowchart TD
    A["main() detects non-interactive"] --> B[Read stdin if piped]
    B --> C[Fire SessionStart hook]
    C --> D[Log user prompt telemetry]
    D --> E["runNonInteractive()"]
    E --> F[Consume ServerGeminiStreamEvent]
    F --> G[Write Content to stdout]
    F --> H[Auto-confirm tools per policy]
    F --> I[Write errors to stderr]
    G --> J[Exit on Finished]

In non-interactive mode, the output listeners at gemini.tsx lines 720–754 route coreEvents to stdout and stderr directly. There's no React rendering, no scroll management, no key handling — just event-driven output.
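
As a rough sketch of this event-driven output path, the function below drains an async iterable and routes each event to one of two writers. It is not the code in nonInteractiveCli.ts: the OutputEvent shape, the Writer interface, the consumeHeadless name, and the exit-code convention are assumptions chosen to keep the example self-contained.

```typescript
// Hypothetical event shape; the real stream carries ServerGeminiStreamEvent values.
type OutputEvent =
  | { type: 'content'; text: string }
  | { type: 'error'; message: string }
  | { type: 'finished' };

// Minimal writer abstraction so the sketch does not depend on Node's process streams.
interface Writer {
  write(chunk: string): void;
}

// Consumes events in order and returns a process-style exit code.
async function consumeHeadless(
  events: AsyncIterable<OutputEvent>,
  stdout: Writer,
  stderr: Writer,
): Promise<number> {
  let sawError = false;
  for await (const event of events) {
    switch (event.type) {
      case 'content':
        stdout.write(event.text); // model text goes straight to stdout
        break;
      case 'error':
        sawError = true;
        stderr.write(event.message + '\n'); // diagnostics go to stderr
        break;
      case 'finished':
        return sawError ? 1 : 0;
    }
  }
  // Assumption: a stream that ends without a finished event counts as a failure.
  return 1;
}
```

In a Node program the two writers would usually be thin wrappers around process.stdout and process.stderr. Injecting them keeps the loop testable without touching the real terminal.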

Without a UI to show confirmation dialogs, non-interactive mode relies entirely on the policy engine and approval mode. In YOLO mode, all tools auto-approve. In DEFAULT mode, tools requiring confirmation are auto-denied (since there's no user to ask). This is controlled by the MessageBus's listener check — if no TOOL_CONFIRMATION_REQUEST listeners are registered, it immediately responds with confirmed: false, requiresUserConfirmation: true.
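
The listener check can be illustrated with a small stand-in bus. This is a sketch under stated assumptions rather than the real MessageBus: the class name SketchMessageBus, its method names, and the correlationId and toolName fields are hypothetical. Only the TOOL_CONFIRMATION_REQUEST type and the confirmed / requiresUserConfirmation response fields come from the description above.

```typescript
// Hypothetical message shapes, modeled on the fields named in the text.
interface ConfirmationRequest {
  type: 'TOOL_CONFIRMATION_REQUEST';
  correlationId: string;
  toolName: string;
}

interface ConfirmationResponse {
  correlationId: string;
  confirmed: boolean;
  requiresUserConfirmation?: boolean;
}

type Respond = (response: ConfirmationResponse) => void;
type ConfirmationListener = (request: ConfirmationRequest, respond: Respond) => void;

class SketchMessageBus {
  private listeners = new Set<ConfirmationListener>();

  // A UI registers here; the returned function removes the listener again.
  onConfirmationRequest(listener: ConfirmationListener): () => void {
    this.listeners.add(listener);
    return () => {
      this.listeners.delete(listener);
    };
  }

  requestConfirmation(request: ConfirmationRequest, onResponse: Respond): void {
    // No UI attached: nobody can answer, so deny immediately.
    if (this.listeners.size === 0) {
      onResponse({
        correlationId: request.correlationId,
        confirmed: false,
        requiresUserConfirmation: true,
      });
      return;
    }
    // Otherwise hand the request to whoever is listening (for example a dialog).
    this.listeners.forEach((listener) => listener(request, onResponse));
  }
}
```

The design choice worth noting is that the denial is derived from the absence of listeners rather than from a separate "headless" flag, so the same scheduler code can run unchanged in both modes.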

The Agent Protocol and SDK

The SDK at packages/sdk/ provides a programmatic API built on the AgentProtocol interface:

export interface AgentProtocol extends Trajectory {
  send(payload: AgentSend): Promise<{ streamId: string | null }>;
  subscribe(callback: (event: AgentEvent) => void): Unsubscribe;
  abort(): Promise<void>;
  readonly events: readonly AgentEvent[];
}

The three methods define the entire contract:

  • send() — Send data to the agent (messages, elicitation responses, config updates, actions). Returns the streamId for correlation.
  • subscribe() — Listen to all agent events. Returns an unsubscribe function.
  • abort() — Cancel the current agent activity.

AgentSession wraps AgentProtocol with a more convenient API, adding sendStream(), which returns an AsyncIterable:

async *sendStream(payload: AgentSend): AsyncIterable<AgentEvent> {
    const result = await this._protocol.send(payload);
    const streamId = result.streamId;
    if (streamId === null) return;
    yield* this.stream({ streamId });
}
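
The stream() call that sendStream() delegates to is not shown here, but the general technique of turning a push-style subscribe() into a pull-style async iterable can be sketched as follows. Everything in this example is an assumption for illustration: the SketchAgentEvent shape (including the idea that every event carries its streamId), the streamEvents name, and the choice to end the iteration on agent_end.

```typescript
// Hypothetical minimal event shape: every event carries the stream it belongs to.
interface SketchAgentEvent {
  streamId: string;
  type: 'content' | 'agent_end';
  text?: string;
}

type Unsubscribe = () => void;
type Subscribe = (callback: (event: SketchAgentEvent) => void) => Unsubscribe;

// Bridges a callback subscription to an async iterable for a single stream.
async function* streamEvents(
  subscribe: Subscribe,
  streamId: string,
): AsyncGenerator<SketchAgentEvent> {
  const queue: SketchAgentEvent[] = [];
  let wake: (() => void) | null = null;

  const unsubscribe = subscribe((event) => {
    if (event.streamId !== streamId) return; // drop events from other streams
    queue.push(event);
    if (wake) {
      wake(); // resume the consumer if it is waiting for data
      wake = null;
    }
  });

  try {
    while (true) {
      if (queue.length === 0) {
        // Park until the subscription callback delivers something.
        await new Promise<void>((resolve) => {
          wake = resolve;
        });
        continue;
      }
      const event = queue.shift()!;
      yield event;
      if (event.type === 'agent_end') return;
    }
  } finally {
    // Runs on normal completion and when the consumer breaks out early.
    unsubscribe();
  }
}
```

One caveat of this sketch: it only subscribes once iteration begins, so events emitted before the first next() call would be missed. A production implementation would need to replay or buffer earlier events, for instance from a recorded event list.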

This lets consumers use for await...of loops to process agent events:

const session = agent.session();
for await (const event of session.sendStream({ message: [{ text: "Fix the bug" }] })) {
    switch (event.type) {
        case 'content': console.log(event.text); break;
        case 'agent_end': console.log('Done'); break;
    }
}

The SDK's entry point, GeminiCliAgent, manages sessions:

const agent = new GeminiCliAgent({ cwd: '/path/to/project' });
const session = agent.session();
// Or resume a previous session:
const resumed = await agent.resumeSession(previousSessionId);

graph TD
    subgraph SDK Package
        GCA[GeminiCliAgent<br/>Session factory]
        GCS[GeminiCliSession<br/>Session lifecycle]
    end
    
    subgraph Core Package
        AP[AgentProtocol<br/>send/subscribe/abort]
        AS[AgentSession<br/>AsyncIterable wrapper]
        ET[Event Translator<br/>ServerGeminiStreamEvent → AgentEvent]
    end
    
    GCA --> GCS
    GCS --> AP
    AP --> AS
    AS --> ET

The Event Translator layer converts between the internal ServerGeminiStreamEvent union (18 types, as covered in Article 2) and the SDK's AgentEvent union, which uses a different taxonomy optimized for external consumers.
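
A translator of this kind is essentially a mapping between two discriminated unions. The sketch below uses small, hypothetical subsets of both: the InternalEvent members and their value fields are invented for illustration, while the external content and agent_end types mirror the consumer example earlier in this section. It should not be read as the actual translation table.

```typescript
// Hypothetical subset of the internal union (the real one has 18 types).
type InternalEvent =
  | { type: 'content'; value: string }
  | { type: 'error'; value: { message: string } }
  | { type: 'finished' };

// Hypothetical subset of the external, consumer-facing union.
type ExternalEvent =
  | { type: 'content'; text: string }
  | { type: 'error'; message: string }
  | { type: 'agent_end' };

function translate(event: InternalEvent): ExternalEvent {
  switch (event.type) {
    case 'content':
      // Same meaning, flatter field name for external consumers.
      return { type: 'content', text: event.value };
    case 'error':
      return { type: 'error', message: event.value.message };
    case 'finished':
      // The internal "finished" marker surfaces as the external end-of-agent event.
      return { type: 'agent_end' };
    default: {
      // Compile-time exhaustiveness check: adding an internal type without
      // a mapping makes this assignment fail to type-check.
      const unreachable: never = event;
      throw new Error('Unhandled event: ' + JSON.stringify(unreachable));
    }
  }
}
```

Keeping the mapping in one function means the internal union can evolve without changing what SDK consumers see, as long as the translator is updated alongside it.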

A2A Server and VS Code Extension

Two additional packages build on the SDK:

packages/a2a-server — An experimental Agent-to-Agent protocol server that exposes Gemini CLI capabilities over HTTP. Other agents can send prompts, receive streaming responses, and invoke tools through a standardized API. This follows Google's A2A protocol specification for inter-agent communication.

packages/vscode-ide-companion — A VS Code extension that provides tight editor integration. It communicates IDE context (open files, cursor position, selected text) to the Gemini CLI backend via the ideContextStore. As we saw in Article 2's GeminiClient.getIdeContextParts(), this context is injected into the conversation as structured JSON, enabling the model to understand what you're looking at in your editor.

Tip: When building on the SDK, prefer AgentSession.sendStream() over manually calling send() + subscribe(). The sendStream() method handles stream tracking, event filtering by streamId, and proper cleanup via the async iterator protocol.

Series Conclusion

Over these six articles, we've traced the complete path through Gemini CLI's codebase:

  1. Architecture — The 7-package monorepo, the Config god object, and the dual event systems
  2. The Agentic Loop — GeminiClient → Turn → GeminiChat with streaming events, compression, and loop detection
  3. Tools and Scheduler — The builder pattern for tool definitions and event-driven orchestration
  4. Security — Policy engine rules, platform sandboxing, and pluggable safety checkers
  5. Extensibility — Hooks, skills, MCP, extensions with HMAC integrity, and model routing strategies
  6. UI and SDK — React/Ink terminal rendering and the programmatic AgentProtocol

The design philosophy throughout is one of layered abstraction with clear contracts. The ServerGeminiStreamEvent union connects the backend to any frontend. The AgentLoopContext interface scopes execution contexts. The ToolInvocation lifecycle standardizes tool execution. And the MessageBus mediates between autonomous agent decisions and user control.

Whether you're contributing to Gemini CLI, building extensions, or embedding it via the SDK, understanding these layers gives you the map you need to navigate confidently.