Extending Gemini CLI: Hooks, Skills, MCP, and the Extension System

Advanced

Prerequisites

  • Articles 1-4 of this series
  • Understanding of pub/sub and strategy design patterns
  • Familiarity with MCP (Model Context Protocol) concepts

Gemini CLI is designed to be customized. From shell-command hooks that intercept agent turns to MCP servers that add entirely new tool capabilities, the codebase provides multiple extensibility surfaces. This article maps all of them — the five-component hook system, the four-level skill precedence, MCP integration with OAuth, the extension packaging system with HMAC integrity verification, model routing as a strategy pattern, and the sub-agent system.

The Hook System Architecture

The hook system at packages/core/src/hooks/ consists of five components that coordinate to execute user-defined shell commands at key lifecycle points.

flowchart TD
    EVENT[Lifecycle Event] --> EH[HookEventHandler<br/>dispatches events]
    EH --> HP[HookPlanner<br/>determines which hooks fire]
    HP --> HR[HookRunner<br/>executes as shell commands]
    HR --> HA[HookAggregator<br/>combines results]
    HA --> RESULT[Aggregated Result]
    
    REG[HookRegistry<br/>stores hook configs] --> HP

The HookSystem class wires them together:

constructor(config: Config) {
    this.hookRegistry = new HookRegistry(config);
    this.hookRunner = new HookRunner(config);
    this.hookAggregator = new HookAggregator();
    this.hookPlanner = new HookPlanner(this.hookRegistry);
    this.hookEventHandler = new HookEventHandler(
        config, this.hookPlanner, this.hookRunner, this.hookAggregator,
    );
}

HookRegistry stores hook configurations from multiple sources (user settings, workspace config, extensions). Each hook specifies which events it listens to, an optional matcher pattern, and whether it runs sequentially.

HookPlanner determines which registered hooks should fire for a given event. It consults the registry and evaluates matcher expressions.

HookRunner executes hooks as shell commands — hooks aren't JavaScript functions, they're external processes. This provides language-agnostic extensibility and a security boundary. Hook scripts receive context via environment variables and stdin.

HookAggregator combines results from multiple hooks that fire for the same event. Results can include system messages, additional context to inject, and control decisions (stop, block, continue).
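To make the aggregation step concrete, here is a minimal sketch of how several hook results might be merged. The result shape (decision, systemMessage, additionalContext) and the severity order (stop over block over continue) are assumptions made for illustration; the real HookAggregator may use different types and rules.

```typescript
// Hypothetical result shape; the real HookAggregator types may differ.
type Decision = "continue" | "block" | "stop";

interface HookResult {
  decision?: Decision;
  systemMessage?: string;
  additionalContext?: string;
}

interface AggregatedResult {
  decision: Decision;
  systemMessages: string[];
  additionalContext: string[];
}

// Assumed severity order: stop outranks block, which outranks continue.
const SEVERITY: Record<Decision, number> = { continue: 0, block: 1, stop: 2 };

function aggregate(results: HookResult[]): AggregatedResult {
  const out: AggregatedResult = {
    decision: "continue",
    systemMessages: [],
    additionalContext: [],
  };
  for (const r of results) {
    // A hook that returns no decision is treated as "continue".
    const d: Decision = r.decision ?? "continue";
    if (SEVERITY[d] > SEVERITY[out.decision]) out.decision = d;
    if (r.systemMessage) out.systemMessages.push(r.systemMessage);
    if (r.additionalContext) out.additionalContext.push(r.additionalContext);
  }
  return out;
}

const merged = aggregate([
  { additionalContext: "Repo uses pnpm." },
  { decision: "block", systemMessage: "Lint failed." },
  { decision: "continue" },
]);
console.log(merged.decision, merged.systemMessages, merged.additionalContext);
```

In this sketch the most restrictive decision wins, while messages and context from every hook are preserved in firing order.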

HookEventHandler is the dispatcher that ties it all together, providing typed methods like fireSessionStartEvent, fireBeforeAgentEvent, fireAfterAgentEvent, etc.

The eight lifecycle events are:

Event                 When it fires
SessionStart          When a session begins or resumes
SessionEnd            When a session ends
BeforeAgent           Before the first model call for a prompt
AfterAgent            After the model responds with no pending tools
BeforeModel           Before each individual API call
AfterModel            After each API response
BeforeToolSelection   Before tools are sent to the model
PreCompress           Before chat history is compressed

As we covered in Article 2, BeforeAgent and AfterAgent can stop execution, block with reasons, or inject context. BeforeModel can modify the request config or return synthetic responses. BeforeToolSelection can modify which tools the model sees.

Tip: Hooks execute as shell commands, so they work in any language. Write a Python script, a Go binary, or even a bash one-liner. The hook receives JSON context on stdin and returns JSON results on stdout.
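As a rough illustration of the stdin/stdout contract, the sketch below shows the core of a hook written in TypeScript. The JSON field names (hookEventName, prompt, decision, reason) are assumptions chosen for this example, not the documented hook schema, so check the actual payload before relying on them.

```typescript
// Field names here are illustrative assumptions, not the documented schema.
interface HookInput {
  hookEventName: string;
  prompt?: string;
}

interface HookOutput {
  decision: "continue" | "block";
  reason?: string;
}

// Pure function: takes the raw JSON read from stdin, returns the JSON
// string that the hook would write to stdout.
function handleHook(rawStdin: string): string {
  const input = JSON.parse(rawStdin) as HookInput;
  const risky =
    input.hookEventName === "BeforeAgent" && /rm -rf/.test(input.prompt ?? "");
  const output: HookOutput = risky
    ? { decision: "block", reason: "Destructive command mentioned in prompt." }
    : { decision: "continue" };
  return JSON.stringify(output);
}

// In a real hook script, rawStdin would be everything read from
// process.stdin, and the result would be written to process.stdout.
const sample = JSON.stringify({
  hookEventName: "BeforeAgent",
  prompt: "please rm -rf /tmp/build",
});
console.log(handleHook(sample));
```

Keeping the decision logic in a pure function makes the hook easy to unit test without spawning a process.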

Skills: Discovery and Precedence

Skills are specialized prompt/tool configurations that users can activate during a session. The SkillManager discovers and loads skills from four locations with increasing precedence:

flowchart BT
    B[1. Built-in skills<br/>Lowest precedence] --> E[2. Extension skills]
    E --> U[3. User skills<br/>~/.gemini/skills/]
    U --> W[4. Workspace skills<br/>.gemini/skills/<br/>Highest precedence]

The discoverSkills() method at line 47 follows this order:

  1. Built-in skills from the skills/builtin/ directory (marked with isBuiltin = true)
  2. Extension skills from active extensions (extension.skills)
  3. User skills from ~/.gemini/skills/ and ~/.gemini/.agents/skills/
  4. Workspace skills from .gemini/skills/ and .gemini/.agents/skills/ (only if the folder is trusted)

When skills share the same name, higher-precedence locations override lower ones through addSkillsWithPrecedence(). This means a workspace skill can shadow a built-in skill, allowing project-specific customization.
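The override behavior can be sketched with a name-keyed map that is filled from lowest to highest precedence. The types and function names below (SkillSketch, mergeLayer, discoverSkillsSketch) are hypothetical and only mimic the effect described above; they are not the SkillManager's actual implementation.

```typescript
// Illustrative types; the real skill definition may carry more fields.
interface SkillSketch {
  name: string;
  source: "builtin" | "extension" | "user" | "workspace";
}

interface SkillLayers {
  builtin: SkillSketch[];
  extension: SkillSketch[];
  user: SkillSketch[];
  workspace: SkillSketch[];
}

// Later layers overwrite earlier entries that share a name.
function mergeLayer(registry: Map<string, SkillSketch>, layer: SkillSketch[]): void {
  for (const skill of layer) registry.set(skill.name, skill);
}

function discoverSkillsSketch(
  layers: SkillLayers,
  folderTrusted: boolean,
): Map<string, SkillSketch> {
  const registry = new Map<string, SkillSketch>();
  mergeLayer(registry, layers.builtin);
  mergeLayer(registry, layers.extension);
  mergeLayer(registry, layers.user);
  // Workspace skills are skipped entirely for untrusted folders.
  if (folderTrusted) mergeLayer(registry, layers.workspace);
  return registry;
}

const layers: SkillLayers = {
  builtin: [{ name: "review", source: "builtin" }],
  extension: [{ name: "deploy", source: "extension" }],
  user: [{ name: "review", source: "user" }],
  workspace: [{ name: "review", source: "workspace" }],
};

console.log(discoverSkillsSketch(layers, true).get("review")?.source);
console.log(discoverSkillsSketch(layers, false).get("review")?.source);
```

With a trusted folder the workspace definition of "review" wins; with an untrusted folder the user definition is the highest layer that loads.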

A notable security consideration: workspace skills only load if the folder is trusted. Untrusted projects cannot inject skills that might alter agent behavior.

MCP Server Integration

MCP (Model Context Protocol) integration enables Gemini CLI to connect to external tool servers over stdio or HTTP. The integration spans OAuth authentication, server discovery, and dynamic tool registration.

The MCPOAuthProvider handles the OAuth/PKCE flow for MCP servers that require authentication. It manages authorization server metadata discovery, token acquisition, refresh, and storage.

sequenceDiagram
    participant Config as Config.initialize()
    participant MCM as McpClientManager
    participant OAuth as MCPOAuthProvider
    participant MCP as MCP Server
    participant TR as ToolRegistry
    participant PE as PolicyEngine
    
    Config->>MCM: Connect to configured servers
    MCM->>OAuth: Authenticate (if needed)
    OAuth->>MCP: PKCE auth flow
    MCP-->>OAuth: Access token
    MCM->>MCP: listTools()
    MCP-->>MCM: Tool schemas
    MCM->>TR: registerTool(DiscoveredMCPTool)
    Note over TR: mcp_serverName_toolName
    
    Note over PE: Policy rules with<br/>mcp_serverName_* wildcards<br/>control access

MCP tools are wrapped by DiscoveredMCPTool, which extends BaseDeclarativeTool and adds MCP-specific metadata (server name, tool annotations) used for policy matching. The mcp_ prefix convention (defined by MCP_TOOL_PREFIX) ensures no naming collisions with built-in tools and enables wildcard policy rules like mcp_myserver_*.

The policy engine's wildcard matching (which we covered in Article 4) recognizes three MCP patterns:

  • mcp_* — matches any MCP tool from any server
  • mcp_serverName_* — matches all tools from a specific server
  • mcp_serverName_toolName — matches a specific tool
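The three patterns above can be approximated with a trailing-wildcard prefix match. This is a simplified sketch: the helper names (mcpToolName, matchesPattern) are hypothetical, and the real policy engine may handle edge cases, such as server names that themselves contain underscores, differently.

```typescript
// The article states the prefix is defined by MCP_TOOL_PREFIX.
const MCP_TOOL_PREFIX = "mcp_";

// Hypothetical helper that builds the registered name for an MCP tool.
function mcpToolName(server: string, tool: string): string {
  return `${MCP_TOOL_PREFIX}${server}_${tool}`;
}

// Hypothetical matcher: a trailing "*" means "starts with"; otherwise exact.
function matchesPattern(pattern: string, toolName: string): boolean {
  if (!pattern.endsWith("*")) return pattern === toolName;
  return toolName.startsWith(pattern.slice(0, -1));
}

const name = mcpToolName("github", "create_issue");
console.log(name);
console.log(matchesPattern("mcp_*", name));
console.log(matchesPattern("mcp_github_*", name));
console.log(matchesPattern("mcp_gitlab_*", name));
```

Because built-in tools never start with mcp_, a rule such as mcp_* cannot accidentally match them in this scheme.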

The Extension System and Integrity Verification

Extensions bundle multiple extensibility features into installable packages. An extension can provide hooks, skills, MCP server configurations, slash commands, themes, and policy rules.

A critical security feature is the HMAC integrity verification system. The IntegrityKeyManager manages a 256-bit secret key used to sign extension metadata:

class IntegrityKeyManager {
    private readonly fallbackKeyPath: string;
    private readonly keychainService: KeychainService;
    private cachedSecretKey: string | null = null;
    
    async getSecretKey(): Promise<string> {
        if (this.cachedSecretKey) return this.cachedSecretKey;
        
        if (await this.keychainService.isAvailable()) {
            try {
                this.cachedSecretKey = await this.getSecretKeyFromKeychain();
                return this.cachedSecretKey;
            } catch (e) {
                // Fall back to file-based storage
            }
        }
        
        this.cachedSecretKey = await this.getSecretKeyFromFile();
        return this.cachedSecretKey;
    }
}

The key is stored preferentially in the OS keychain (via KeychainService) and falls back to a file with 0o600 permissions. When an extension is installed, its metadata is signed with this key. On load, the signature is verified to detect tampering.
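The sign-then-verify cycle can be sketched with Node's built-in crypto module. The helper names, the choice of SHA-256, and the hex encoding are assumptions for illustration; the article does not specify the hash function or the metadata format that Gemini CLI actually uses.

```typescript
import { createHmac, randomBytes, timingSafeEqual } from "node:crypto";

// Hypothetical helpers; the real signing code and metadata format may differ.
function signMetadata(metadataJson: string, secretKeyHex: string): string {
  return createHmac("sha256", Buffer.from(secretKeyHex, "hex"))
    .update(metadataJson)
    .digest("hex");
}

function verifyMetadata(
  metadataJson: string,
  signatureHex: string,
  secretKeyHex: string,
): boolean {
  const expected = Buffer.from(signMetadata(metadataJson, secretKeyHex), "hex");
  const actual = Buffer.from(signatureHex, "hex");
  // timingSafeEqual throws on length mismatch, so compare lengths first.
  return expected.length === actual.length && timingSafeEqual(expected, actual);
}

const key = randomBytes(32).toString("hex"); // 32 bytes = 256 bits
const metadata = JSON.stringify({ name: "example-extension", version: "1.0.0" });
const signature = signMetadata(metadata, key);

console.log(verifyMetadata(metadata, signature, key));
console.log(verifyMetadata(metadata.replace("1.0.0", "6.6.6"), signature, key));
```

The constant-time comparison avoids leaking how many leading bytes of a forged signature were correct. Note that a different key, for example one regenerated after a migration, fails verification just as tampered metadata does.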

graph TD
    subgraph "Extension Package"
        H[Hooks]
        S[Skills]
        MCP[MCP Servers]
        CMD[Commands]
        TH[Themes]
        POL[Policies]
    end
    
    INST[Extension Install] --> SIGN[HMAC Sign with secret key]
    SIGN --> STORE[Store signed metadata]
    
    LOAD[Extension Load] --> VERIFY[Verify HMAC signature]
    VERIFY --> ACTIVE[Activate extension]
    VERIFY --> REJECT[Reject tampered extension]
    
    subgraph "Key Storage"
        KC[OS Keychain<br/>preferred]
        FILE[File ~/.gemini/<br/>fallback, 0600]
    end

Tip: If extension verification fails mysteriously after a system migration, check whether your keychain was preserved. The fallback key file at ~/.gemini/ may have a different key than what was used to sign the extension on the original system.

Model Routing as an Extension Point

Model routing — deciding which model handles each turn — follows a Composite Strategy pattern. The ModelRouterService chains seven strategies in priority order:

flowchart LR
    REQ[Routing Request] --> F[FallbackStrategy]
    F --> O[OverrideStrategy]
    O --> A[ApprovalModeStrategy]
    A --> GC[GemmaClassifierStrategy]
    GC --> C[ClassifierStrategy]
    C --> N[NumericalClassifierStrategy]
    N --> D[DefaultStrategy<br/>terminal]

Each strategy receives a RoutingContext (conversation history, current request, requested model) and either returns a RoutingDecision or defers to the next strategy:

  • FallbackStrategy — Activates when the primary model is unavailable
  • OverrideStrategy — Handles explicit model overrides (e.g., /model flash)
  • ApprovalModeStrategy — Selects appropriate models for PLAN mode
  • GemmaClassifierStrategy — Uses a local Gemma model for routing classification
  • ClassifierStrategy — Generic LLM-based classification
  • NumericalClassifierStrategy — Uses numerical scoring heuristics
  • DefaultStrategy — Terminal strategy, always returns the configured model

The CompositeStrategy wraps the other strategies and consults them in priority order. Because DefaultStrategy is terminal and always returns a decision, the chain is guaranteed to produce a result. Structurally, this is the classic Chain of Responsibility pattern: each strategy either handles the request or passes it along.
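A stripped-down version of this chain might look like the following. The interfaces, the two sample strategies, and the model names are hypothetical, and routing is shown synchronously for brevity; the real strategies (particularly the classifier-based ones) presumably do asynchronous work.

```typescript
// Illustrative types; the real RoutingContext and RoutingDecision are richer.
interface RoutingContext {
  requestedModel?: string;
  primaryAvailable: boolean;
}

interface RoutingDecision {
  model: string;
  decidedBy: string;
}

interface RoutingStrategy {
  readonly name: string;
  // Returning null means "defer to the next strategy".
  route(ctx: RoutingContext): RoutingDecision | null;
}

const fallback: RoutingStrategy = {
  name: "fallback",
  route: (ctx) =>
    ctx.primaryAvailable ? null : { model: "backup-model", decidedBy: "fallback" },
};

const override: RoutingStrategy = {
  name: "override",
  route: (ctx) =>
    ctx.requestedModel
      ? { model: ctx.requestedModel, decidedBy: "override" }
      : null,
};

// The terminal step always answers, so the function never returns null.
function routeThroughChain(
  chain: RoutingStrategy[],
  defaultModel: string,
  ctx: RoutingContext,
): RoutingDecision {
  for (const strategy of chain) {
    const decision = strategy.route(ctx);
    if (decision !== null) return decision;
  }
  return { model: defaultModel, decidedBy: "default" };
}

const chain = [fallback, override];
console.log(routeThroughChain(chain, "configured-model", { primaryAvailable: true }));
console.log(routeThroughChain(chain, "configured-model", { primaryAvailable: true, requestedModel: "flash" }));
console.log(routeThroughChain(chain, "configured-model", { primaryAvailable: false, requestedModel: "flash" }));
```

Order matters: in this sketch the fallback strategy sits first, so an unavailable primary model takes priority even over an explicit override, mirroring the ordering in the diagram.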

Sub-Agents and Derived MessageBus

The sub-agent system enables Gemini CLI to spawn child agents for specialized tasks (like browser automation). Sub-agents need their own tool registries and message buses to avoid interference with the parent agent.

As we first saw in Article 1, MessageBus.derive() at lines 46–72 creates a scoped child bus:

derive(subagentName: string): MessageBus {
    const bus = new MessageBus(this.policyEngine, this.debug);
    bus.publish = async (message: Message) => {
        if (message.type === MessageBusType.TOOL_CONFIRMATION_REQUEST) {
            return this.publish({
                ...message,
                subagent: message.subagent
                    ? `${subagentName}/${message.subagent}`
                    : subagentName,
            });
        }
        return this.publish(message);
    };
    // Delegate subscriptions to parent
    bus.subscribe = this.subscribe.bind(this);
    // ...
    return bus;
}

The derived bus overrides publish to tag tool confirmation requests with the sub-agent name, prepending it to any existing subagent value so that nested sub-agents form a slash-separated path. All other operations (subscribe, unsubscribe, on, off) are delegated to the parent bus. As a result, a sub-agent's tool confirmations appear in the parent's UI with proper attribution (e.g., "browser/navigate"), while regular event handling passes through unchanged.
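To see how attribution composes across nesting levels, here is a self-contained toy version of the same idea. BusSketch, the Msg shape, and the string message type are stand-ins invented for this example; only the prefixing logic mirrors the excerpt above.

```typescript
// Toy message shape; the real Message type has many more variants.
interface Msg {
  type: string;
  subagent?: string;
}

class BusSketch {
  // Records everything that reaches this bus, for inspection.
  readonly published: Msg[] = [];

  publish: (message: Msg) => void = (message) => {
    this.published.push(message);
  };

  derive(subagentName: string): BusSketch {
    const child = new BusSketch();
    // The arrow function captures the parent bus as `this`.
    child.publish = (message) => {
      if (message.type === "tool-confirmation-request") {
        this.publish({
          ...message,
          subagent: message.subagent
            ? `${subagentName}/${message.subagent}`
            : subagentName,
        });
        return;
      }
      this.publish(message);
    };
    return child;
  }
}

const root = new BusSketch();
const nested = root.derive("browser").derive("inner");
nested.publish({ type: "tool-confirmation-request" });
nested.publish({ type: "status-update" });
console.log(root.published);
```

Each level prepends its own name on the way up, so the root bus receives the full path "browser/inner" for the confirmation request, while the status update arrives untagged.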

The sub-agent system also interacts with the AgentLoopContext. Each sub-agent receives a derived context with its own tool registry and message bus, while sharing the same Config and sandbox manager. This is where the AgentLoopContext interface (from Article 1) pays dividends — components that accept the interface work identically for both parent and sub-agent contexts.

In the final article, we'll explore the two primary interaction surfaces built on top of all these systems — the React/Ink terminal UI and the programmatic SDK.