# Context Caching with Gemini 3 Demo

## Step 1: Setup & Knowledge Base

We'll initialize the interpreter with **context caching enabled** and set up our knowledge base. Caching is enabled *by default* for the Gemini backend, but we set `cache_ttl` explicitly here to show the configuration.

As a stand-in for a proprietary document, we use the **WMO State of the Global Climate 2025 Update** (presented at COP30). This report was chosen because its publication date (November 4, 2025) falls *after* the knowledge cutoff for the Gemini 3 model family (January 2025), so the model cannot answer from its training data and has to rely on the provided context.

For details on how `kanoa` handles file uploads to Google, please refer to the [Gemini Backend Documentation](../docs/source/backends/gemini.md).

In [1]:
from pathlib import Path

import kanoa
from kanoa import AnalyticsInterpreter

# 1. Configuration
# Set global verbosity (True = Info/Uploads, 2 = Debug/Payloads)
kanoa.options.verbose = True

# 2. Define Knowledge Base Resources
# We use the WMO State of the Climate 2025 Update PDF
KB_FILENAME = "State of the Climate 2025 Update COP30 (31 oct).pdf"
KB_DIR = Path("knowledge_base_demo")

# The URL contains spaces and parentheses, which kanoa handles automatically
KB_URL = f"https://wmo.int/sites/default/files/2025-11/{KB_FILENAME}"

# 3. Initialize Interpreter with Caching
# cache_ttl=3600 means the cache is valid for 1 hour
interpreter = AnalyticsInterpreter(
    backend="gemini",
    cache_ttl=3600,
)

# 4. Attach Knowledge Base & Add Resource
# We explicitly set the path to keep files in the example folder
interpreter = interpreter.with_kb(kb_path=KB_DIR, kb_type="pdf")

# This downloads the file if missing, or verifies it exists
# We don't need to pass filename; kanoa infers it from the URL
KB_PATH = interpreter.get_kb().add_resource(uri=KB_URL)

# 5. Trigger Load & Verify
# This will trigger the upload to Gemini (if not cached) and print status
interpreter.get_kb().get_context()

print("-" * 40)
print(f"Backend: {interpreter.backend_name}")
print(f"Cache TTL: {interpreter.backend.cache_ttl_seconds}s")
print(f"Knowledge Base: {KB_PATH}")
print(f"File Size: {KB_PATH.stat().st_size / (1024 * 1024):.2f} MB")


<div style="background: rgba(186, 164, 217, 0.12);
            border: 1px solid rgba(186, 164, 217, 0.35);
            border-left: 3px solid rgba(186, 164, 217, 0.75);
            padding: 14px 18px;
            margin: 10px 0;
            border-radius: 6px;
            font-size: 0.9em;
            line-height: 1.5;
            font-family: 'SF Mono', 'Monaco', 'Inconsolata', 'Fira Mono', 'Droid Sans Mono', 'Source Code Pro', monospace;
            box-sizing: border-box;
            max-width: 100%;
            overflow-x: auto;
            word-wrap: break-word;">

<div style="font-weight: 600; margin-bottom: 10px; font-size: 1.1em; opacity: 0.9;">kanoa</div>
<div style="opacity: 0.85;">Authenticating with Google Cloud (Model: gemini-3-pro-preview)...</div>
<div style="opacity: 0.85;">Downloading https://wmo.int/sites/default/files/2025-11/State%20of%20the%20Climate%202025%20Update%20COP30%20%2831%20oct%29.pdf to knowledge_base_demo/State of the Climate 2025 Update COP30 (31 oct).pdf...</div>
<div style="opacity: 0.85;">Found 1 PDFs to process...</div>
<div style="opacity: 0.85;">Processing PDF: State of the Climate 2025 Update COP30 (31 oct).pdf (8.59 MB)</div>
<div style="opacity: 0.85;">Using inline transfer (Vertex AI)...</div>
<div style="opacity: 0.85;">Cache Check: Checking context cache (Hash: ba7ca903782c0509)</div>
<div style="opacity: 0.85;">Checking server for existing cache: kanoa-kb-ba7ca903782c0509...</div>
<div style="opacity: 0.85;">Creating new cache on models/gemini-3-pro-preview...</div>
<div style="opacity: 0.85;">✓ Cache Created: Cache created: projects/830895911586/locations/global/cachedContents/5223241807499886592 (9,068 tokens)</div>
<div style="opacity: 0.85;">Generating content with gemini-3-pro-preview...</div>
<div style="opacity: 0.85;">Cache: Using cached context: projects/830895911586/locations/global/cachedContents/5223241807499886592</div>
<div style="opacity: 0.85;">Usage: 9,088 in / 309 out</div>
<div style="opacity: 0.85;">Cached tokens: 9,068</div>
<div style="opacity: 0.85;">Cache Check: Checking context cache (Hash: ba7ca903782c0509)</div>
<div style="opacity: 0.85;">Cache Hit: Cache hit (Memory)! Refreshing TTL for projects/830895911586/locations/global/cachedContents/5223241807499886592</div>
<div style="opacity: 0.85;">Generating content with gemini-3-pro-preview...</div>
<div style="opacity: 0.85;">Cache: Using cached context: projects/830895911586/locations/global/cachedContents/5223241807499886592</div>
<div style="opacity: 0.85;">Usage: 9,087 in / 187 out</div>
<div style="opacity: 0.85;">Cached tokens: 9,068</div>

</div>


----------------------------------------
Backend: gemini
Cache TTL: 3600s
Knowledge Base: knowledge_base_demo/State of the Climate 2025 Update COP30 (31 oct).pdf
File Size: 8.59 MB
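
The download log above shows the URL with its spaces and parentheses percent-encoded. How `kanoa` does this internally is not shown here; the sketch below only demonstrates that Python's standard `urllib.parse.quote` produces the same encoded form. The `BASE_URL` name is introduced for illustration.

```python
from urllib.parse import quote, unquote

KB_FILENAME = "State of the Climate 2025 Update COP30 (31 oct).pdf"
BASE_URL = "https://wmo.int/sites/default/files/2025-11/"  # illustrative name

# safe="" percent-encodes every reserved character, including "(" and ")"
encoded_name = quote(KB_FILENAME, safe="")
encoded_url = BASE_URL + encoded_name
print(encoded_url)

# Decoding the last path segment recovers the filename used on disk
decoded_name = unquote(encoded_name)
print(decoded_name)
```

The decoded name is what ends up in `knowledge_base_demo/`, which is why the log prints the encoded URL and the plain filename side by side.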


## Step 2: Verify Token Count

Before running our query, let's verify the token count of our PDF knowledge base.
This helps us understand the scale of the context we're caching.

In [2]:
# Use the interpreter's built-in cost checker to verify token count and cost
# This validates against warning/approval thresholds
result = interpreter.check_kb_cost()

if result:
    print("Knowledge Base Token Check:")
    print(f"Status: {result.level.upper()}")
    print(f"Token Count: {result.token_count:,}")
    print(f"Estimated Cost: ${result.estimated_cost:.4f}")
    print(f"Message: {result.message}")
else:
    print("No files uploaded or backend does not support file uploads.")

Knowledge Base Token Check:
Status: WARN
Token Count: 9,520
Estimated Cost: $0.0190
Message: 9,520 tokens, ~$0.0190
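
The estimate above is consistent with a simple per-token calculation. The sketch below assumes a standard input rate of $2.00 per 1M tokens (the illustrative rate used later in this notebook); `estimate_input_cost` is a hypothetical helper, not part of `kanoa`, and `check_kb_cost()` may compute its figure differently.

```python
def estimate_input_cost(token_count: int, price_per_million: float) -> float:
    """Return the input cost in USD for a given token count."""
    return token_count / 1_000_000 * price_per_million


ASSUMED_INPUT_PRICE = 2.00  # USD per 1M input tokens (assumption, check current pricing)

cost = estimate_input_cost(9_520, ASSUMED_INPUT_PRICE)
print(f"Estimated cost: ${cost:.4f}")
```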


### Checking Cache Status

You can check the status of the context cache for your current knowledge base without running a query. This is useful for verifying whether a cache exists and for inspecting its properties (TTL, token count).

In [3]:
import json

# Check cache status
status = interpreter.get_cache_status()

print("Current cache status:")
print(json.dumps(status, indent=2, default=str))

Current cache status:
{
  "exists": false,
  "hash": "ba7ca903782c0509",
  "reason": "Not found"
}
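
In a script, the status dictionary can be turned into a short message before deciding whether to run a query. This is a minimal sketch that relies only on the three keys visible in the output above (`exists`, `hash`, `reason`); the keys returned when a cache does exist are not shown in this notebook, so the helper does not assume any. `describe_cache_status` is a hypothetical name.

```python
from typing import Any, Mapping


def describe_cache_status(status: Mapping[str, Any]) -> str:
    """Summarize a cache-status dict using only the keys shown above."""
    kb_hash = status.get("hash", "unknown")
    if status.get("exists"):
        return f"Cache found for KB hash {kb_hash}."
    reason = status.get("reason", "no reason given")
    return f"No cache for KB hash {kb_hash} ({reason})."


sample = {"exists": False, "hash": "ba7ca903782c0509", "reason": "Not found"}
message = describe_cache_status(sample)
print(message)
```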


## Step 3: First Query (Cache Miss)

The first query uploads the knowledge base and creates the cache.
The KB tokens are billed at the full input rate on this call, so cache savings are reported as $0.0000.

In [4]:
# First query - this creates the cache
result1 = interpreter.interpret(
    custom_prompt="Summarize the key findings regarding global temperature anomalies in 2025 from the WMO report."
)

print("=" * 60)
print("FIRST QUERY RESULTS (Cache Creation)")
print("=" * 60)
print(f"\nResponse:\n{result1.text[:500]}...\n")

if result1.usage:
    print(f"Input tokens:  {result1.usage.input_tokens:,}")
    print(f"Output tokens: {result1.usage.output_tokens:,}")
    print(f"Cached tokens: {result1.usage.cached_tokens or 0:,}")
    print("Cache savings: $0.0000 (Cache Creation)")


<div style="background: rgba(2, 62, 138, 0.08);
            border: 1px solid rgba(2, 62, 138, 0.3);
            border-left: 4px solid rgba(2, 62, 138, 0.8);
            padding: 16px 20px;
            margin: 10px 0;
            border-radius: 8px;
            backdrop-filter: blur(5px);
            box-shadow: 0 2px 8px rgba(0, 0, 0, 0.08);">

<div style="font-weight: 600; margin-bottom: 12px; opacity: 0.9; font-size: 1.1em; font-family: 'SF Mono', 'Monaco', 'Inconsolata', 'Fira Mono', 'Droid Sans Mono', 'Source Code Pro', monospace;">gemini</div>

Based on the "State of the Climate 2025 Update for COP30" report, here is a summary of the key findings regarding global temperature anomalies for 2025:

*   **Temperature Anomaly:** From January to August 2025, the global mean near-surface temperature was **1.42°C (± 0.12°C)** above the pre-industrial average.
*   **Ranking:** The year 2025 is on track to be the **second or third warmest year on record**, ranking behind 2024.
*   **Comparison to 2024:** Temperatures have dropped slightly compared to the record highs of 2024. This cooling is consistent with a shift from El Niño conditions (which boosted temperatures in 2023 and 2024) to neutral conditions at the start of 2025.
*   **Long-term Trend:** The past 11 years (2015–2025) represent the 11 warmest years in the 176-year observational record. Specifically, the last three years (including 2025) are the three warmest on record.
*   **Contributing Factors:** The high global temperatures over the last three years are linked to the transition out of a prolonged La Niña event (2020–early 2023), alongside other factors such as reductions in aerosols.

---
<small>**gemini-3-pro-preview** · 9,088→309 tokens (9,068 cached) · $0.0219 · cache created</small>

</div>


FIRST QUERY RESULTS (Cache Creation)

Response:
Based on the "State of the Climate 2025 Update for COP30" report, here is a summary of the key findings regarding global temperature anomalies for 2025:

*   **Temperature Anomaly:** From January to August 2025, the global mean near-surface temperature was **1.42°C (± 0.12°C)** above the pre-industrial average.
*   **Ranking:** The year 2025 is on track to be the **second or third warmest year on record**, ranking behind 2024.
*   **Comparison to 2024:** Temperatures have dropped slightly compar...

Input tokens:  9,088
Output tokens: 309
Cached tokens: 9,068
Cache savings: $0.0000 (Cache Creation)


## Step 4: Second Query (Cache Hit)

The second query reuses the cached knowledge base.
The **cached tokens** count is the same as in the first query (9,068), but **cache savings** is now non-zero because those tokens are billed at the reduced cached rate.

In [5]:
# Second query - this reuses the cache
result2 = interpreter.interpret(
    custom_prompt="What specific outcomes or decisions from COP30 in Belem are mentioned in relation to biodiversity and environmental protection?"
)

print("=" * 60)
print("SECOND QUERY RESULTS (Cache Hit)")
print("=" * 60)
print(f"\nResponse:\n{result2.text[:500]}...\n")

if result2.usage:
    print(f"Input tokens:  {result2.usage.input_tokens:,}")
    print(f"Output tokens: {result2.usage.output_tokens:,}")
    print(f"Cached tokens: {result2.usage.cached_tokens or 0:,}")
    print(f"Cache savings: ${result2.usage.cache_savings or 0.0:.4f}")


<div style="background: rgba(2, 62, 138, 0.08);
            border: 1px solid rgba(2, 62, 138, 0.3);
            border-left: 4px solid rgba(2, 62, 138, 0.8);
            padding: 16px 20px;
            margin: 10px 0;
            border-radius: 8px;
            backdrop-filter: blur(5px);
            box-shadow: 0 2px 8px rgba(0, 0, 0, 0.08);">

<div style="font-weight: 600; margin-bottom: 12px; opacity: 0.9; font-size: 1.1em; font-family: 'SF Mono', 'Monaco', 'Inconsolata', 'Fira Mono', 'Droid Sans Mono', 'Source Code Pro', monospace;">gemini</div>

Based on the provided document, **there are no specific outcomes or decisions from COP30 regarding biodiversity and environmental protection mentioned.**

The document is a scientific report titled **"State of the Climate: Update for COP30."** Its stated purpose (found on page 3) is to "inform discussions... with authoritative, up-to-date information on the state of the global climate."

Because this report serves as an input **to** the conference rather than a summary **of** the conference, it focuses on physical climate indicators (such as temperature, greenhouse gas concentrations, and sea-level rise) rather than political negotiations or financial agreements.

While the document mentions the **need** for investment in certain areas—specifically noting on page 15 that "continued investment in connectivity, data access and technical skills... is essential" for early warning systems—it does not record any decisions made at COP30 regarding funding.

---
<small>**gemini-3-pro-preview** · 9,087→187 tokens (9,068 cached) · $0.0068 · cached</small>

</div>


SECOND QUERY RESULTS (Cache Hit)

Response:
Based on the provided document, **there are no specific outcomes or decisions from COP30 regarding biodiversity and environmental protection mentioned.**

The document is a scientific report titled **"State of the Climate: Update for COP30."** Its stated purpose (found on page 3) is to "inform discussions... with authoritative, up-to-date information on the state of the global climate."

Because this report serves as an input **to** the conference rather than a summary **of** the conference, it focuses on physical climat...

Input tokens:  9,087
Output tokens: 187
Cached tokens: 9,068
Cache savings: $0.0136
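
The numbers above can be cross-checked by hand. The sketch below uses a hypothetical `Usage` dataclass as a stand-in for the usage object (its real class is not shown here) and assumes the illustrative rates from the cost section of this notebook: $2.00 per 1M standard input tokens and $0.50 per 1M cached input tokens.

```python
from dataclasses import dataclass


@dataclass
class Usage:
    """Hypothetical stand-in holding the fields printed above."""
    input_tokens: int
    output_tokens: int
    cached_tokens: int


def cached_fraction(usage: Usage) -> float:
    """Share of input tokens that were served from the cache."""
    return usage.cached_tokens / usage.input_tokens


def cache_savings(usage: Usage, standard_price: float, cached_price: float) -> float:
    """Savings in USD; prices are per 1M tokens."""
    return usage.cached_tokens / 1_000_000 * (standard_price - cached_price)


second = Usage(input_tokens=9_087, output_tokens=187, cached_tokens=9_068)
uncached = second.input_tokens - second.cached_tokens  # tokens outside the cache
savings = cache_savings(second, standard_price=2.00, cached_price=0.50)

print(f"Cached share:    {cached_fraction(second):.1%}")
print(f"Uncached tokens: {uncached}")
print(f"Cache savings:   ${savings:.4f}")
```

Only the handful of uncached tokens (the part of the request that is not in the cached knowledge base) is billed at the standard rate on a cache hit.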


## Understanding Cost Savings

Context caching reduces input token costs when the same knowledge base is queried repeatedly.

For the most up-to-date pricing and a detailed breakdown of costs (including cache storage), please refer to the [Gemini Backend Pricing Documentation](../docs/source/backends/gemini.md#pricing).

**Savings formula** (prices per 1M tokens): `savings = (cached_tokens / 1M) * (standard_price - cached_price)`

For a 10,000 token knowledge base queried 10 times (using illustrative rates of $2.00/1M for standard input and $0.50/1M for cached input):
- Without caching: `10 × 10,000 × $2.00/1M = $0.20`
- With caching: `10,000 × $2.00/1M + 9 × 10,000 × $0.50/1M = $0.065`
- **~67.5% savings** (excluding cache storage fees)
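
The same arithmetic can be written as a small function to try other knowledge base sizes and query counts. This is a sketch under the illustrative rates above; it ignores cache storage fees and output tokens, and `total_input_cost` is a hypothetical helper.

```python
from typing import Optional


def total_input_cost(
    kb_tokens: int,
    n_queries: int,
    standard_price: float,
    cached_price: Optional[float] = None,
) -> float:
    """Total KB input cost in USD; prices are per 1M tokens.

    With cached_price set, the first query pays the standard rate
    and every later query pays the cached rate.
    """
    if cached_price is None:
        return n_queries * kb_tokens * standard_price / 1_000_000
    first = kb_tokens * standard_price / 1_000_000
    rest = (n_queries - 1) * kb_tokens * cached_price / 1_000_000
    return first + rest


without_cache = total_input_cost(10_000, 10, standard_price=2.00)
with_cache = total_input_cost(10_000, 10, standard_price=2.00, cached_price=0.50)
saving_pct = (1 - with_cache / without_cache) * 100

print(f"Without caching: ${without_cache:.3f}")
print(f"With caching:    ${with_cache:.3f}")
print(f"Savings:         {saving_pct:.1f}%")
```

Because the first query always pays the standard rate, the overall saving approaches the per-token discount only as the number of queries grows.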

## Summary

kanoa's context caching feature:

1. **Automatically** caches your knowledge base content
2. **Reuses** the cache for subsequent queries (same content hash)
3. **Saves ~75%** per cached input token at the illustrative rates above ($0.50/1M vs. $2.00/1M)
4. **Tracks** cached tokens and savings in the `UsageInfo` object
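
Point 2 depends on a content hash: identical knowledge base content maps to the same cache name, so the cache can be found again. The logs earlier show a 16-character hex hash and a cache name of the form `kanoa-kb-<hash>`, but they do not show how the hash is computed. The sketch below assumes a truncated SHA-256 over the file bytes purely for illustration; both function names are hypothetical.

```python
import hashlib
from typing import Iterable


def kb_content_hash(chunks: Iterable[bytes], length: int = 16) -> str:
    """Hash KB content; SHA-256 truncated to `length` hex chars (assumption)."""
    digest = hashlib.sha256()
    for chunk in chunks:
        digest.update(chunk)
    return digest.hexdigest()[:length]


def cache_name(content_hash: str) -> str:
    """Build a cache name in the `kanoa-kb-<hash>` form seen in the logs."""
    return f"kanoa-kb-{content_hash}"


hash_a = kb_content_hash([b"report contents"])
hash_b = kb_content_hash([b"report contents"])
hash_c = kb_content_hash([b"report contents, revised"])

print(cache_name(hash_a))
print("Same content, same cache:", hash_a == hash_b)
print("Changed content, new cache:", hash_a != hash_c)
```

Any change to the knowledge base yields a different hash and therefore a new cache, which is why rapidly changing knowledge bases benefit little from caching.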

### When to Use Context Caching

- ✅ Interactive analysis sessions with multiple queries
- ✅ Batch processing against a stable knowledge base
- ✅ Knowledge bases > 2,048 tokens (minimum for caching benefit)

### When NOT to Use Context Caching

- ❌ Single-shot queries (cache creation overhead)
- ❌ Rapidly changing knowledge bases
- ❌ Very small knowledge bases (< 2,048 tokens)
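
The two checklists above can be condensed into a rough rule of thumb. This is a hedged sketch, not a `kanoa` feature: `caching_worthwhile` is a hypothetical helper, and treating 2,048 tokens as an inclusive minimum and "more than one query" as the reuse threshold are assumptions.

```python
def caching_worthwhile(
    kb_tokens: int,
    expected_queries: int,
    kb_is_stable: bool = True,
    min_tokens: int = 2_048,
) -> bool:
    """Return True when the guidelines above suggest using context caching."""
    large_enough = kb_tokens >= min_tokens  # inclusive minimum is an assumption
    reused = expected_queries > 1           # single-shot queries gain nothing
    return large_enough and reused and kb_is_stable


print(caching_worthwhile(9_068, expected_queries=2))    # this notebook's KB
print(caching_worthwhile(9_068, expected_queries=1))    # single-shot query
print(caching_worthwhile(1_000, expected_queries=10))   # KB too small
print(caching_worthwhile(9_068, 10, kb_is_stable=False))  # KB changes often
```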