Inside the Agentic Loop: How Gemini CLI Processes a Prompt
Prerequisites
- Article 1: Architecture and Navigation Guide
- TypeScript async generators and iterators
- Basic understanding of LLM tool calling and streaming
When you type a prompt into Gemini CLI, it enters a loop that can span dozens of turns — sending requests to the model, receiving streaming responses, dispatching tool calls, feeding results back, and repeating until the task is complete or the model signals it's done. This isn't a simple request-response cycle; it's a three-layer architecture with streaming events, retry logic, compression, loop detection, and hook integration at every stage.
This article traces the complete lifecycle of a user prompt through GeminiClient, Turn, and GeminiChat.
GeminiClient: The Outer Loop Orchestrator
The GeminiClient class is the entry point for all LLM interactions. It receives an AgentLoopContext (as we covered in Article 1) and manages the conversation lifecycle including loop detection, chat compression, tool output masking, and hook firing.
The core method is `sendMessageStream()` at line 881. It's an `AsyncGenerator<ServerGeminiStreamEvent, Turn>` — meaning it yields streaming events to callers while internally managing multiple model turns.
```mermaid
sequenceDiagram
    participant UI as UI Layer
    participant GC as GeminiClient
    participant T as Turn
    participant Chat as GeminiChat
    participant API as Gemini API
    UI->>GC: sendMessageStream(request, signal, prompt_id)
    GC->>GC: fireBeforeAgentHook
    GC->>GC: processTurn()
    GC->>T: new Turn(chat, prompt_id)
    T->>Chat: sendMessageStream()
    Chat->>API: generateContentStream()
    API-->>Chat: streaming chunks
    Chat-->>T: StreamEvent yields
    T-->>GC: ServerGeminiStreamEvent yields
    GC-->>UI: ServerGeminiStreamEvent yields
    GC->>GC: fireAfterAgentHook
```
Here's what happens in `sendMessageStream()`:
- Hook state management — It fires `BeforeAgent` hooks and checks whether execution should be stopped or blocked
- Turn processing — It delegates to `processTurn()`, which creates a `Turn` and manages model routing
- Loop detection — Events from the turn are checked against the `LoopDetectionService`
- Post-turn hooks — `AfterAgent` hooks can stop execution, clear context, or trigger continuation turns
- State cleanup — Hook state is cleaned up in a `finally` block
The method is recursive — if an `AfterAgent` hook blocks execution with a reason, `sendMessageStream` calls itself with the block reason as a new prompt and `stopHookActive: true`.
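The recursive shape can be sketched with `yield*`, which forwards a nested generator's events to the original caller. All names here are hypothetical; this illustrates the pattern, not the actual GeminiClient code:

```typescript
type AgentEvent = { type: "content"; text: string };

// A post-turn check that either blocks with a reason or lets the turn stand.
type AfterAgentCheck = (reply: string) => string | undefined;

async function* streamWithContinuation(
  prompt: string,
  afterAgent: AfterAgentCheck,
  depth = 0, // illustrative guard against unbounded recursion
): AsyncGenerator<AgentEvent> {
  const reply = `reply to: ${prompt}`;
  yield { type: "content", text: reply };
  const blockReason = afterAgent(reply);
  if (blockReason !== undefined && depth < 5) {
    // Continuation turn: the block reason becomes the new prompt,
    // and yield* streams the nested turn's events to the caller.
    yield* streamWithContinuation(blockReason, afterAgent, depth + 1);
  }
}
```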
Turn: The AsyncGenerator Heart
Each call to the model within the agentic loop is wrapped in a Turn instance. A Turn is responsible for converting raw API streaming responses into typed ServerGeminiStreamEvent values.
The run() method at line 253 is itself an AsyncGenerator:
```ts
async *run(
  modelConfigKey: ModelConfigKey,
  req: PartListUnion,
  signal: AbortSignal,
  displayContent?: PartListUnion,
  role: LlmRole = LlmRole.MAIN,
): AsyncGenerator<ServerGeminiStreamEvent>
```
It iterates over the GeminiChat.sendMessageStream() response, pattern-matching on stream event types:
- `retry` events → yielded as `GeminiEventType.Retry` so the UI can discard partial content
- `agent_execution_stopped` / `blocked` → yielded as hook-driven control events
- `chunk` events → parsed for thoughts, content text, tool calls, citations, and finish reasons
The Turn accumulates `pendingToolCalls` — function calls from the model that need to be dispatched via the Scheduler. These are extracted from `functionCall` parts in the streaming response.
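The extraction itself is simple filtering over streamed parts. A minimal sketch, where the `Part` shape loosely mirrors the `@google/genai` part union (only the fields used here):

```typescript
// Loose stand-in for the SDK's Part type: a part carries either
// plain text or a function call requested by the model.
interface Part {
  text?: string;
  functionCall?: { name: string; args: Record<string, unknown> };
}

// Walk the streamed parts and keep every function call for later dispatch.
function collectPendingToolCalls(parts: Part[]) {
  const pending: { name: string; args: Record<string, unknown> }[] = [];
  for (const part of parts) {
    if (part.functionCall) pending.push(part.functionCall);
  }
  return pending;
}
```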
```mermaid
stateDiagram-v2
    [*] --> Created: new Turn()
    Created --> Running: run() called
    Running --> YieldingContent: Content parts received
    Running --> YieldingThought: Thought parts received
    Running --> CollectingToolCalls: FunctionCall parts received
    YieldingContent --> Running: continue iteration
    YieldingThought --> Running: continue iteration
    CollectingToolCalls --> Running: continue iteration
    Running --> Finished: FinishReason received
    Running --> Cancelled: signal.aborted
    Running --> Error: API error
    Finished --> [*]
    Cancelled --> [*]
    Error --> [*]
```
Tip: The `pendingToolCalls` on a Turn are critical — after `sendMessageStream` yields all events from a turn, the caller (typically the UI or SDK) checks `turn.pendingToolCalls` to dispatch tool execution through the Scheduler. The agentic loop continues only after tool results are fed back.
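The caller-side loop this tip describes can be sketched as follows. Everything here (`runAgentLoop`, `sendTurn`, `executeTool`) is a hypothetical simplification, collapsing the streaming machinery into plain async calls to show the shape of the loop:

```typescript
interface ToolCall {
  name: string;
  args: Record<string, unknown>;
}
interface TurnResult {
  pendingToolCalls: ToolCall[];
}

// Run turns until the model finishes without requesting any tools.
// Returns the number of turns taken, for illustration.
async function runAgentLoop(
  sendTurn: (input: string) => Promise<TurnResult>,
  executeTool: (call: ToolCall) => Promise<string>,
  firstPrompt: string,
): Promise<number> {
  let input = firstPrompt;
  let turns = 0;
  while (true) {
    turns++;
    const turn = await sendTurn(input);
    if (turn.pendingToolCalls.length === 0) return turns; // model is done
    // Execute the requested tools, then feed the results back as the
    // next request so the model can continue reasoning.
    const results = await Promise.all(turn.pendingToolCalls.map(executeTool));
    input = results.join("\n");
  }
}
```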
The ServerGeminiStreamEvent Union
The `ServerGeminiStreamEvent` is a discriminated union of 18 event types, defined by the `GeminiEventType` enum at lines 52–71:
| Event Type | Purpose |
|---|---|
| `Content` | Streamed text from the model |
| `Thought` | Model's thinking/reasoning text |
| `ToolCallRequest` | Model requests a tool execution |
| `ToolCallResponse` | Result from executed tool |
| `ToolCallConfirmation` | Confirmation details for user approval |
| `ChatCompressed` | Context was compressed to fit window |
| `Finished` | Turn completed with reason and usage metadata |
| `Retry` | Stream retry in progress, discard partial content |
| `Error` | An error occurred |
| `UserCancelled` | User aborted the operation |
| `LoopDetected` | Infinite loop detected |
| `MaxSessionTurns` | Session turn limit reached |
| `Citation` | Citation or grounding metadata from the model |
| `ContextWindowWillOverflow` | Not enough tokens remaining |
| `InvalidStream` | Stream response was invalid |
| `ModelInfo` | Reports which model is being used |
| `AgentExecutionStopped` | Hook stopped execution |
| `AgentExecutionBlocked` | Hook blocked execution |
This union is the contract between the backend and any frontend. The CLI's React components, the non-interactive handler, and the SDK all consume this same event stream, pattern-matching on event.type to drive their respective behaviors.
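The pattern-matching consumers use is a plain exhaustive `switch` over the discriminant. A trimmed sketch with three of the eighteen variants (event names lowercased and simplified here for illustration):

```typescript
// A trimmed stand-in for the real union: the `type` field is the
// discriminant, and each variant carries its own payload.
type StreamEvent =
  | { type: "content"; value: string }
  | { type: "tool_call_request"; name: string }
  | { type: "error"; message: string };

function describe(event: StreamEvent): string {
  switch (event.type) {
    case "content":
      return `text: ${event.value}`; // TypeScript narrows to the content variant
    case "tool_call_request":
      return `tool: ${event.name}`;
    case "error":
      return `error: ${event.message}`;
  }
}
```

Because the union is closed, the compiler flags any `switch` that forgets a variant, which is what makes it a reliable contract across the CLI, the non-interactive handler, and the SDK.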
GeminiChat: Low-Level Session Management
GeminiChat is the lowest layer — a wrapper around the @google/genai SDK that maintains conversation history and handles mid-stream retries. The file itself is a modified fork of the upstream chats.ts, created to work around a bug where function responses weren't treated as valid responses.
The key innovation is mid-stream retry logic, configured at line 89–93:
```ts
const MID_STREAM_RETRY_OPTIONS: MidStreamRetryOptions = {
  maxAttempts: 4, // 1 initial call + 3 retries mid-stream
  initialDelayMs: 1000,
  useExponentialBackoff: true,
};
```
When a stream fails mid-response (network disconnect, invalid content), GeminiChat yields a RETRY event, discards partial results, and retries with exponential backoff. It also supports model fallback — if the primary model fails repeatedly, it can fall back to an alternative model via the handleFallback utility.
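A generic retry wrapper under those options might look like this. This is a hypothetical helper, not the actual GeminiChat code: it emits a retry marker so the consumer can discard partial output, backs off exponentially, and restarts the stream from scratch:

```typescript
// Each emitted chunk is either real data or a marker telling the
// consumer to discard everything received so far in this attempt.
type Chunk<T> = { kind: "retry" } | { kind: "data"; value: T };

async function* withMidStreamRetry<T>(
  makeStream: () => AsyncGenerator<T>,
  maxAttempts = 4,
  initialDelayMs = 1000,
): AsyncGenerator<Chunk<T>> {
  for (let attempt = 1; attempt <= maxAttempts; attempt++) {
    try {
      for await (const value of makeStream()) {
        yield { kind: "data", value };
      }
      return; // stream completed cleanly
    } catch (err) {
      if (attempt === maxAttempts) throw err; // retries exhausted
      yield { kind: "retry" }; // tell the UI to drop partial content
      const delay = initialDelayMs * 2 ** (attempt - 1); // exponential backoff
      await new Promise((resolve) => setTimeout(resolve, delay));
    }
  }
}
```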
```mermaid
sequenceDiagram
    participant GChat as GeminiChat
    participant CG as ContentGenerator
    participant API as Gemini API
    GChat->>CG: generateContentStream(request)
    CG->>API: HTTP stream
    API-->>CG: chunk 1
    CG-->>GChat: yield chunk 1
    API--x CG: connection error
    GChat->>GChat: yield RETRY event
    GChat->>GChat: backoff (1000ms)
    GChat->>CG: generateContentStream(request) [attempt 2]
    CG->>API: HTTP stream
    API-->>CG: full response
    CG-->>GChat: yield chunks
```
The ContentGenerator interface abstracts the actual API calls. Implementations include the standard GoogleGenAI-backed generator, a LoggingContentGenerator for debug tracing, a RecordingContentGenerator for session replay, and a FakeContentGenerator for testing.
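The value of this abstraction is that chat-layer code never touches HTTP directly. A rough sketch with simplified method signatures (the real interface takes structured requests, not strings), showing both a test double in the spirit of `FakeContentGenerator` and a decorator in the spirit of `LoggingContentGenerator`:

```typescript
// One interface, many implementations.
interface ContentGenerator {
  generateContentStream(request: string): AsyncGenerator<string>;
}

// Test double: replays canned chunks instead of calling an API.
class CannedGenerator implements ContentGenerator {
  constructor(private chunks: string[]) {}
  async *generateContentStream(_request: string): AsyncGenerator<string> {
    for (const chunk of this.chunks) yield chunk;
  }
}

// Decorator: wraps another generator and records every chunk it forwards,
// without the wrapped generator or its consumers noticing.
class LoggingGenerator implements ContentGenerator {
  log: string[] = [];
  constructor(private inner: ContentGenerator) {}
  async *generateContentStream(request: string): AsyncGenerator<string> {
    for await (const chunk of this.inner.generateContentStream(request)) {
      this.log.push(chunk);
      yield chunk;
    }
  }
}
```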
Chat Compression and Loop Detection
Two safety mechanisms prevent the agentic loop from running off the rails.
Chat Compression
When conversation history approaches the model's context window limit, GeminiClient.processTurn() triggers compression. The ChatCompressionService summarizes earlier turns to free up token budget. The CompressionStatus enum at line 167–185 tracks outcomes:
- `COMPRESSED` — summary successfully replaced older history
- `COMPRESSION_FAILED_INFLATED_TOKEN_COUNT` — the summary was actually longer
- `CONTENT_TRUNCATED` — previous compression failed, so content was truncated to budget
```mermaid
flowchart TD
    A[processTurn starts] --> B{Context management enabled?}
    B -- Yes --> C[AgentHistoryProvider.manageHistory]
    B -- No --> D[tryCompressChat]
    D --> E{Compression succeeded?}
    E -- Yes --> F[Yield ChatCompressed event]
    E -- No --> G[Track failure for future truncation]
    C --> H[Check remaining token count]
    F --> H
    G --> H
    H --> I{Request fits in window?}
    I -- Yes --> J[Continue with model call]
    I -- No --> K[Yield ContextWindowWillOverflow]
```
A notable detail: if compression fails once, the client sets `hasFailedCompressionAttempt = true` and falls back to content truncation on subsequent attempts rather than trying (and failing) to compress again.
Loop Detection
The `LoopDetectionService` monitors event patterns across turns. At line 688, before each turn starts, it checks for repeating patterns. If a loop count of exactly 1 is detected (an early warning), the client attempts recovery via `_recoverFromLoop()`. If the count exceeds 1, it yields a `LoopDetected` event and aborts.
Additionally, within a turn (line 757), each streaming event is fed to the loop detector for real-time monitoring. This catches loops that emerge mid-turn, such as the model repeatedly requesting the same tool call.
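A toy version of the mid-turn check makes the idea concrete. This is illustrative only (the threshold and the signature scheme are assumptions, not the real service's logic): fingerprint each event and flag when the same fingerprint repeats too often:

```typescript
// Counts occurrences of each event signature and trips once a signature
// repeats past the threshold, e.g. the same tool call requested again and again.
class SimpleLoopDetector {
  private counts = new Map<string, number>();
  constructor(private threshold = 3) {}

  // Feed one event signature; returns true when a loop is suspected.
  addAndCheck(signature: string): boolean {
    const next = (this.counts.get(signature) ?? 0) + 1;
    this.counts.set(signature, next);
    return next >= this.threshold;
  }
}
```

A signature might be the tool name plus its serialized arguments, so `read_file({"path":"a.ts"})` repeated three times trips the detector while three different reads do not.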
Hook Integration Points
As we saw in sendMessageStream(), hooks intercept the agentic loop at two key points:
BeforeAgent — Fired before the first model call in a prompt sequence. Can:
- Stop execution entirely (yield `AgentExecutionStopped`)
- Block execution with a reason (yield `AgentExecutionBlocked`)
- Inject additional context into the request
AfterAgent — Fired after the model responds with no pending tool calls. Can:
- Stop execution and optionally clear context
- Block and trigger a continuation turn with a new prompt
- Let execution proceed normally
The hook state is tracked per `prompt_id` in a Map to handle the recursive nature of `sendMessageStream`. The `activeCalls` counter ensures `BeforeAgent` fires only once per prompt, even if the method is called recursively for continuation turns.
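The bookkeeping can be sketched like this (class and method names are hypothetical; only the `hasFiredBeforeAgent` and `activeCalls` fields echo the real state): an entry per `prompt_id` records whether the hook already fired and how many nested calls are live, so cleanup happens only when the outermost call unwinds:

```typescript
interface HookState {
  hasFiredBeforeAgent: boolean;
  activeCalls: number;
}

class HookTracker {
  private states = new Map<string, HookState>();

  // Returns true only on the first entry for this prompt_id,
  // i.e. when the BeforeAgent hook should actually fire.
  enter(promptId: string): boolean {
    let state = this.states.get(promptId);
    if (!state) {
      state = { hasFiredBeforeAgent: false, activeCalls: 0 };
      this.states.set(promptId, state);
    }
    state.activeCalls++;
    if (state.hasFiredBeforeAgent) return false; // recursive re-entry
    state.hasFiredBeforeAgent = true;
    return true;
  }

  // Mirrors the `finally` cleanup: drop the entry once the
  // outermost call for this prompt has unwound.
  exit(promptId: string): void {
    const state = this.states.get(promptId);
    if (!state) return;
    state.activeCalls--;
    if (state.activeCalls === 0) this.states.delete(promptId);
  }
}
```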
Two additional hook points — BeforeModel and AfterModel — operate at the GeminiChat level, intercepting individual API calls. BeforeModel can modify request config or return a synthetic response. AfterModel can transform the response or trigger additional actions. We'll explore the full hook system in Article 5.
Tip: The `hookStateMap` in GeminiClient (line 143) is key to understanding hook deduplication. If you're debugging why a hook fires only once despite multiple turns, check `hasFiredBeforeAgent` and `activeCalls`.
In the next article, we'll follow what happens when the model requests a tool call — exploring the tool system's builder pattern and the Scheduler's event-driven orchestration that validates, confirms, and executes tool invocations.