
Deconstructing the Gemini CLI: A Tale of Two Agents

AI Agents Gemini Routing

If you think the Gemini CLI is just a simple chatbot, look closer at the logs. It is actually a sophisticated Multi-Agent System that acts more like a corporate team than a single AI.

I explained how to inspect any LLM request/response logs on your own machine in a separate blog post; see that post for details.

I analyzed the raw request/response logs from a recent session, and here is exactly how it works—step by step, using the actual data traffic.

The "Traffic Cop" (The Router)

Every time you hit Enter, your message doesn't go straight to the main AI. First, it stops at a tiny, ultra-fast model called Gemini 2.5 Flash-Lite.

This model has one job: Management. It reads your prompt and decides how hard the task is using a strict "Complexity Rubric."


Iteration 1:

Command 1: The Greeting

When you send a simple "Hi", the router analyzes it.

Request 1 payload:

Endpoint: https://generativelanguage.googleapis.com/v1beta/models/gemini-2.5-flash-lite:generateContent
{
    "contents": [
        {
            "parts": [
                {
                    "text": "This is the Gemini CLI. We are setting up the context for our chat.
                    Today's date is Tuesday, January 20, 2026 ........"
                }
            ],
            "role": "user"
        },
        {
            "parts": [
                {
                    "text": "Hi"
                }
            ],
            "role": "user"
        }
    ],
    "systemInstruction": {
        "parts": [
            {
                "text": "
                    You are a specialized Task Routing AI. Your sole function is to analyze the user's request and classify its complexity. Choose between `flash` (SIMPLE) or `pro` (COMPLEX).
                    1.  `flash`: A fast, efficient model for simple, well-defined tasks.
                    2.  `pro`: A powerful, advanced model for complex, open-ended, or multi-step tasks.
                
                    
                    A task is COMPLEX (Choose `pro`) if it meets ONE OR MORE of the following criteria:
                        1.  **High Operational Complexity (Est. 4+ Steps/Tool Calls):** Requires dependent actions, significant planning, or multiple coordinated changes.
                        2.  **Strategic Planning & Conceptual Design:** Asking \"how\" or \"why.\" Requires advice, architecture, or high-level strategy.
                        3.  **High Ambiguity or Large Scope (Extensive Investigation):** Broadly defined requests requiring extensive investigation.
                        4.  **Deep Debugging & Root Cause Analysis:** Diagnosing unknown or complex problems from symptoms.

                    A task is SIMPLE (Choose `flash`) if it is highly specific, bounded, and has Low Operational Complexity (Est. 1-3 tool calls). Operational simplicity overrides strategic phrasing.
                    
                    ....
                "
            }
        ],
        "role": "user"
    },
    "generationConfig": {
        "temperature": 0,
        "topP": 1,
        "maxOutputTokens": 1024,
        "responseMimeType": "application/json",
        "thinkingConfig": {
            "thinkingBudget": 512
        }
    }
}

Request 1 response:

{
    "candidates": [
        {
            "content": {
                "parts": [
                    {
                        "text": "{
                          'reasoning': 'The user\'s input is a simple greeting, requiring no complex operations or analysis. It falls under low operational complexity.',
                          'model_choice': 'flash'  <------ Look at this line: the model is chosen
                        }"
                    }
                ],
                "role": "model"
            },
            "finishReason": "STOP",
            "index": 0
        }
    ],
    "usageMetadata": {
        "promptTokenCount": 1171,
        "candidatesTokenCount": 44,
        "totalTokenCount": 1298,
        "cachedContentTokenCount": 809,
        "promptTokensDetails": [ ... ],
        "cacheTokensDetails": [ ... ],
        "thoughtsTokenCount": 83
    },
    "modelVersion": "gemini-2.5-flash-lite",
    "responseId": "_fZvaZqIGLTevdIPzsa2GQ"
}
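
The usage numbers in this response are internally consistent: prompt, candidate, and thought tokens sum to the reported total. Here is a quick check in Python that uses only the figures from the log above. The helper name `check_total` is mine, and treating the total as exactly this sum is an observation from this one response, not a documented guarantee.

```python
# Usage figures copied from the Request 1 response above.
usage = {
    "promptTokenCount": 1171,
    "candidatesTokenCount": 44,
    "thoughtsTokenCount": 83,
    "totalTokenCount": 1298,
    "cachedContentTokenCount": 809,
}

def check_total(u: dict) -> bool:
    """Return True if prompt + candidates + thoughts equals the reported total."""
    parts = (
        u["promptTokenCount"]
        + u["candidatesTokenCount"]
        + u.get("thoughtsTokenCount", 0)
    )
    return parts == u["totalTokenCount"]

# Share of the prompt that was reported as cached (809 of 1171 tokens).
cached_share = usage["cachedContentTokenCount"] / usage["promptTokenCount"]

print(check_total(usage))      # True
print(round(cached_share, 2))  # 0.69
```

Roughly 69% of the router prompt was reported as cached, which is consistent with the system instruction and context preamble being repeated on every turn.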

Insight: The router sees "Hi", checks its rubric, and selects "flash". This saves the powerful model for harder tasks.

What is next?

Now the Gemini CLI knows it should send the user's message to gemini-3-flash-preview.
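
The hand-off from router to main model can be sketched in a few lines of Python. This is not the CLI's own code: `pick_model`, `MODEL_FOR_CHOICE`, and the fallback to `pro` on a malformed reply are my assumptions for illustration. The sketch also assumes the real response text is valid double-quoted JSON (the request sets `responseMimeType` to `application/json`); the single quotes in the log excerpt above are just how I rendered it. The sample reasoning string is abbreviated.

```python
import json

# Model names as seen in the logged endpoints; the dict itself is hypothetical.
MODEL_FOR_CHOICE = {
    "flash": "gemini-3-flash-preview",
    "pro": "gemini-3-pro-preview",
}

def pick_model(router_response: dict, default: str = "pro") -> str:
    """Read `model_choice` from a router response and map it to a model name.

    Falling back to `default` on a malformed reply is an assumption,
    not behaviour observed in the logs.
    """
    try:
        text = router_response["candidates"][0]["content"]["parts"][0]["text"]
        choice = json.loads(text)["model_choice"]
    except (KeyError, IndexError, TypeError, json.JSONDecodeError):
        choice = default
    return MODEL_FOR_CHOICE.get(choice, MODEL_FOR_CHOICE[default])

# Abbreviated version of the Request 1 response.
sample = {
    "candidates": [
        {
            "content": {
                "parts": [
                    {"text": '{"reasoning": "simple greeting", "model_choice": "flash"}'}
                ],
                "role": "model",
            }
        }
    ]
}

print(pick_model(sample))  # gemini-3-flash-preview
```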

Request 2 payload:

Endpoint: https://generativelanguage.googleapis.com/v1beta/models/gemini-3-flash-preview:streamGenerateContent?alt=sse
{
    "contents": [ ... ],
    "systemInstruction": {
        "parts": [
            {
                "text": "
                    You are an interactive CLI agent specializing in software engineering tasks. Your primary goal is to help users safely and efficiently, adhering strictly to the following instructions and utilizing your available tools.
                    
                    # Core Mandates
                    ...
                    ## Available Sub-Agents
                    Use `delegate_to_agent` for complex tasks requiring specialized analysis.
                "
            }
        ],
        "role": "user"
    },
    "tools": [ ... ],
    "generationConfig": {
        "temperature": 1,
        "topP": 0.95,
        "topK": 64,
        "thinkingConfig": {
            "includeThoughts": true,
            "thinkingLevel": "HIGH"
        }
    }
}

Request 2 response (SSE Payloads):

data: {"candidates": [{"content": {"parts": [{"text": "**Establishing Initial Grounding** I'm now grounded in this context. I remember my name is Saeed, and ...","thought": true}],"role": "model"},"index": 0}],"usageMetadata": {"promptTokenCount": 12318,"totalTokenCount": 12318},"modelVersion": "gemini-3-flash-preview"}

data: {"candidates": [{"content": {"parts": [{"text": "Hello Saeed. How can I help"}],"role": "model"},"index": 0}],"usageMetadata": {"promptTokenCount": 12318,"candidatesTokenCount": 7,"totalTokenCount": 12385},"modelVersion": "gemini-3-flash-preview"}
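
If you want to read a stream like this yourself, the chunks can be split into "thinking" text and the visible answer by checking the `thought` flag on each part. Below is a minimal Python sketch; the function name `split_stream` is hypothetical and the two sample chunks are shortened from the log above (usage metadata removed, thought text truncated).

```python
import json

def split_stream(lines):
    """Split SSE `data:` lines into (thoughts, answer) text."""
    thoughts, answer = [], []
    for line in lines:
        line = line.strip()
        if not line.startswith("data:"):
            continue  # skip blank separators and any non-data lines
        chunk = json.loads(line[len("data:"):])
        for cand in chunk.get("candidates", []):
            for part in cand.get("content", {}).get("parts", []):
                # Parts flagged with "thought": true are reasoning, not the reply.
                target = thoughts if part.get("thought") else answer
                target.append(part.get("text", ""))
    return "".join(thoughts), "".join(answer)

# Two chunks shortened from the Request 2 response.
stream = [
    'data: {"candidates": [{"content": {"parts": [{"text": "**Establishing Initial Grounding** ...", "thought": true}], "role": "model"}, "index": 0}]}',
    "",
    'data: {"candidates": [{"content": {"parts": [{"text": "Hello Saeed. How can I help"}], "role": "model"}, "index": 0}]}',
]

thoughts, answer = split_stream(stream)
print(answer)  # Hello Saeed. How can I help
```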

Iteration 2:

Later, you send a more complex request, such as: "Analyze the current directory and architect a testing strategy..."
Note: I have trimmed the logs here for readability; the request/response structure is the same as in Iteration 1.

Request 3 payload:

Endpoint: https://generativelanguage.googleapis.com/v1beta/models/gemini-2.5-flash-lite:generateContent
{
    "contents": [
        { "parts": [{ "text": "Hi" }], "role": "user" },
        { "parts": [{ "text": "Hello Saeed. How can I help you today?" }], "role": "model" },
        { "parts": [{ "text": "Analyze the current directory and architect a testing strategy for this project." }], "role": "user" }
    ],
    "systemInstruction": { ... }
}

Request 3 response:

{
    "candidates": [
        {
            "content": {
                "parts": [
                    {
                        "text": "{
                          'reasoning': 'The user is asking about architect. This falls under Strategic Planning & Conceptual ...',
                          'model_choice': 'pro'  <------ Look at this line: the model is chosen
                        }"
                    }
                ],
                "role": "model"
            }
        }
    ]
}

Insight: The router's reasoning points to the "Strategic Planning & Conceptual Design" criterion, evidently triggered by the word "architect" in the request. It classified the task as "pro", authorizing the use of the more expensive, more capable model.

What is next?

Now the Gemini CLI knows it should send the user's message to gemini-3-pro-preview.

Request 4 payload:

Endpoint: https://generativelanguage.googleapis.com/v1beta/models/gemini-3-pro-preview:streamGenerateContent?alt=sse

Request 4 response:

data: {"candidates": [{"content": {"parts": [{"text": "The current directory `/Users/saeed/Desktop` appears to be your desktop folder and not"}],"role": "model"},"index": 0}],"usageMetadata": {"promptTokenCount": 26840,"candidatesTokenCount": 19,"totalTokenCount": 27079},"modelVersion": "gemini-3-pro-preview"}

Key Findings: Why This Matters

  1. Hallucination Prevention:
    • A standard AI might have invented a fake testing strategy for a React app you don't have.
    • Because the Pro model used tools (list_directory) to "see" the actual directory contents first, it was grounded in reality and gave an honest refusal.
  2. Massive Token Savings:
    • Router Cost: The "Manager" call was small (about 1,300 tokens in total for Request 1).
    • Pro Cost: The "Pro" call carried a far larger context (about 27,000 tokens in Request 4).
    • Result: The router adds little overhead per turn, and by steering simple requests to Flash instead of Pro, Google saves a great deal of compute and money.
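
To put numbers on the router's overhead, the `totalTokenCount` values from the logged responses can be compared directly. This is plain arithmetic on the figures above and says nothing about price, since per-token rates differ by model and are not in the logs. The router total for Request 3 was trimmed from the logs, so Request 1's figure stands in for it as an approximation.

```python
# totalTokenCount values copied from the logged responses.
router_total = 1298   # Request 1 (gemini-2.5-flash-lite)
flash_total = 12385   # Request 2 (gemini-3-flash-preview)
pro_total = 27079     # Request 4 (gemini-3-pro-preview)

# Router tokens as a fraction of the main call that follows it.
overhead_vs_flash = router_total / flash_total
overhead_vs_pro = router_total / pro_total

print(f"{overhead_vs_flash:.1%}")  # 10.5%
print(f"{overhead_vs_pro:.1%}")    # 4.8%
```

In this session the routing step cost roughly a tenth of the Flash call and about a twentieth of the Pro call, measured in tokens.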

The "Cheat Code" (Manual Override)

Sometimes, the manager gets it wrong, or you just want the smartest model regardless of the cost. You can force the CLI to skip the router and lock onto the Pro model by typing:

/model gemini-3-pro-preview

This bypasses the router's complexity check and sends every interaction straight to the Pro model.
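
Conceptually, the override is a short-circuit placed in front of the router. The Python sketch below shows that idea only; `resolve_model` and `fake_router` are hypothetical names, and nothing here is taken from the CLI's actual source.

```python
from typing import Callable, Optional

def resolve_model(pinned: Optional[str], route: Callable[[], str]) -> str:
    """Return the pinned model if one is set; otherwise ask the router."""
    if pinned:
        return pinned  # the router is never called
    return route()

calls = []

def fake_router() -> str:
    # Stand-in for the Flash-Lite routing request; records each invocation.
    calls.append("routed")
    return "gemini-3-flash-preview"

print(resolve_model(None, fake_router))                    # gemini-3-flash-preview
print(resolve_model("gemini-3-pro-preview", fake_router))  # gemini-3-pro-preview
print(len(calls))                                          # 1
```

The second call never reaches the router, so a pinned session would also skip the extra routing request on each turn, assuming the CLI works along these lines.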