# Backends
kanoa supports multiple AI backends, each with different strengths and use cases.
## Gemini (`gemini`)
For detailed documentation, see Gemini Backend Reference.
**Best for:** PDF knowledge bases, large context windows, cost optimization
### Features

- **Native PDF Support**: Upload PDFs directly; Gemini "sees" figures and tables
- **2M Token Context**: Massive context window (Gemini 3 Pro) for large knowledge bases
- **Context Caching**: Reuse cached content to reduce costs
- **Multimodal**: Images, PDFs, text, and more
### Authentication

```bash
# Option 1: API key
export GOOGLE_API_KEY="your-api-key"

# Option 2: Application Default Credentials (ADC)
gcloud auth application-default login
```
### Usage

```python
from kanoa import AnalyticsInterpreter

# With API key
interpreter = AnalyticsInterpreter(
    backend='gemini',
    api_key='your-api-key'
)

# With ADC (Vertex AI)
interpreter = AnalyticsInterpreter(
    backend='gemini',
    project='your-project-id',
    location='us-central1'
)
```
### Pricing

| Token Type | Price (per 1M tokens) | Notes |
|---|---|---|
| Standard Input | $2.00 | For context < 200K tokens |
| Cached Input | $0.50 | 75% savings |
| Cache Storage | $0.20/hour | Per million cached tokens |
| Output | $12.00 | All output tokens |
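To see how these rates combine, here is a back-of-the-envelope session cost sketch. The ~9,500-token KB size comes from the cost study below; the 400-token output per query is an assumed figure for illustration, not a measured value.

```python
# Rough session cost using the rates in the table above.
# Assumptions: ~9,500-token KB (cost study below), 400 output tokens
# per query. Cache storage ($0.20/hr per 1M tokens) is omitted --
# for a KB this small it is a fraction of a cent per hour.
KB_TOKENS = 9_500
OUTPUT_TOKENS = 400
QUERIES = 10

STANDARD_IN = 2.00 / 1e6   # $/token
CACHED_IN = 0.50 / 1e6
OUT = 12.00 / 1e6

first_query = KB_TOKENS * STANDARD_IN + OUTPUT_TOKENS * OUT
cached_query = KB_TOKENS * CACHED_IN + OUTPUT_TOKENS * OUT

with_cache = first_query + (QUERIES - 1) * cached_query
without_cache = QUERIES * first_query

print(f"10-query session with caching:    ${with_cache:.2f}")
print(f"10-query session without caching: ${without_cache:.2f}")
```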
### Real-World Cost Study
Using an 8.5 MB PDF (WMO State of the Climate 2025 Report, ~9,500 tokens):
| Operation | Cost | Notes |
|---|---|---|
| First query (cache creation) | $0.02 | Full token cost |
| Subsequent queries (cached) | < $0.01 | 67% savings |
| 10-query session | ~$0.11 | vs. ~$0.21 without caching |
**Bottom line:** For a typical research session analyzing a scientific paper:

- **Free tier**: Works for simple text analysis, but no caching or PDF support
- **Paid tier**: ~$0.02 to cache a paper, then pennies per question
See the Context Caching Demo for a complete walkthrough.
### Context Caching
Gemini supports explicit context caching for knowledge bases, providing significant cost savings when making multiple queries against the same content.
#### How It Works

1. **First Query**: kanoa uploads your KB and creates a cache (billed at the standard input rate)
2. **Subsequent Queries**: Cached content is reused (billed at $0.50/1M vs. $2.00/1M)
3. **Content Hashing**: kanoa detects KB changes and refreshes the cache automatically (sketched below)
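The change-detection idea is straightforward: hash the KB's file contents in a stable order and compare against the digest recorded when the cache was created. kanoa's internal implementation may differ; this is a minimal sketch of the technique, assuming a directory KB like `./docs`.

```python
import hashlib
from pathlib import Path

def kb_content_hash(kb_path: str) -> str:
    """Hash every file in the KB, in sorted order, so any content
    change produces a different digest."""
    digest = hashlib.sha256()
    for path in sorted(Path(kb_path).rglob('*')):
        if path.is_file():
            digest.update(path.name.encode())
            digest.update(path.read_bytes())
    return digest.hexdigest()

previous_hash = '...'  # digest stored when the cache was created
if kb_content_hash('./docs') != previous_hash:
    ...  # KB changed: re-upload it and create a fresh cache
```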
#### Enabling Context Caching

```python
interpreter = AnalyticsInterpreter(
    backend='gemini',
    kb_path='./docs',
    cache_ttl=3600,  # Cache valid for 1 hour (default)
)
```
#### Usage Tracking

The `UsageInfo` object includes caching metrics:

```python
result = interpreter.interpret(prompt="Analyze this data")
print(f"Cached tokens: {result.usage.cached_tokens}")
print(f"Cache savings: ${result.usage.cache_savings:.4f}")
```
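Over a multi-query session, you can accumulate the same field to see the total return on the cache. A small sketch (the prompts are placeholders):

```python
total_savings = 0.0
for prompt in ["Summarize the key findings",
               "What are the main data sources?",
               "List the stated uncertainties"]:
    result = interpreter.interpret(prompt=prompt)
    total_savings += result.usage.cache_savings

print(f"Total cache savings this session: ${total_savings:.4f}")
```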
#### Minimum Token Requirements

Context caching requires a minimum amount of cached content, which varies by model:
| Model | Minimum Tokens |
|---|---|
| `gemini-2.5-flash` | 1,024 |
| `gemini-3-pro-preview` | 2,048 |
| `gemini-2.5-pro` | 4,096 |
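If you are unsure whether a KB clears the threshold, you can count its tokens up front. This sketch calls the `google-genai` SDK's `count_tokens` endpoint directly; concatenating the KB as plain text is a simplification (kanoa handles KB upload itself).

```python
from pathlib import Path
from google import genai

client = genai.Client()  # reads GOOGLE_API_KEY from the environment

# Simplification: treat the KB as concatenated plain text
kb_text = "\n".join(p.read_text() for p in Path('./docs').rglob('*.md'))

resp = client.models.count_tokens(model='gemini-2.5-flash', contents=kb_text)
if resp.total_tokens < 1024:  # minimum for gemini-2.5-flash (table above)
    print(f"KB is only {resp.total_tokens} tokens; caching won't engage")
```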
#### Cache Management

You can manage caches programmatically or via the CLI:

```python
# Clear the cache manually (e.g., after updating KB files)
interpreter.clear_cache()

# The cache is also cleared automatically when the KB content hash changes
```
For CLI usage:

```bash
python -m kanoa.tools.gemini_cache list
```
#### Best Practices

- ✅ Use for interactive sessions with multiple queries
- ✅ Set `cache_ttl` based on your session length
- ✅ Monitor `cache_savings` to track ROI
- ❌ Avoid for single-shot queries (cache creation overhead)
- ❌ Avoid for KBs < 2,048 tokens (no caching benefit)
## Claude (`claude`)
For detailed documentation, see Claude Backend Reference.
**Best for:** Strong reasoning, text-heavy analysis (Claude Sonnet 4.5)
### Features

- **Vision Support**: Interprets images (but not PDFs directly)
- **Strong Reasoning**: Excellent for complex analytical tasks
- **200K Context**: Large context window for text knowledge bases
### Authentication

```bash
export ANTHROPIC_API_KEY="your-api-key"
```
### Usage

```python
interpreter = AnalyticsInterpreter(
    backend='claude',
    api_key='your-api-key'
)
```
### Pricing

- Input: $3.00 per 1M tokens
- Output: $15.00 per 1M tokens
## vLLM (`vllm`)
For detailed documentation, see vLLM Backend Reference.
**Best for:** Local inference with open-source models (Molmo, Gemma 3)
### Features

- **Fully Open Source**: Run inference locally with no API costs
- **Privacy**: Your data never leaves your machine
- **Vision Support**: Supports multimodal models like Molmo
- **GPU Acceleration**: Optimized for NVIDIA GPUs via vLLM
### Usage

```python
from kanoa import AnalyticsInterpreter

# Local vLLM (Molmo 7B)
interpreter = AnalyticsInterpreter(
    backend='vllm',
    api_base='http://localhost:8000/v1',
    model='allenai/Molmo-7B-D-0924'
)

# Local vLLM (Gemma 3 12B)
interpreter = AnalyticsInterpreter(
    backend='vllm',
    api_base='http://localhost:8000/v1',
    model='google/gemma-3-12b-it'
)
```
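The backend expects an OpenAI-compatible vLLM server already listening at `api_base`. A quick way to verify the server is up before constructing the interpreter (a convenience sketch; assumes the `requests` package is installed):

```python
import requests

# vLLM exposes an OpenAI-compatible API; /v1/models lists loaded models
resp = requests.get('http://localhost:8000/v1/models', timeout=5)
resp.raise_for_status()
print([m['id'] for m in resp.json()['data']])
```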
See the vLLM Getting Started Guide for setup instructions.
## OpenAI (`openai`)
For detailed documentation, see OpenAI Backend Reference.
**Best for:** GPT models, Azure OpenAI
### Features

- **GPT Models**: Access to GPT-4, GPT-5.1, and future models
- **Azure OpenAI**: Enterprise deployment via Azure
- **Vision Support**: Supports image inputs with compatible models
### Usage

```python
# OpenAI (GPT-5.1)
interpreter = AnalyticsInterpreter(
    backend='openai',
    api_key='sk-...'
)

# Azure OpenAI
interpreter = AnalyticsInterpreter(
    backend='openai',
    api_base='https://your-resource.openai.azure.com/...',
    api_key='your-azure-key'
)
```
## Enterprise Considerations
### Current: Google AI Studio (`google-genai`)

kanoa currently uses the `google-genai` SDK, which connects to Google AI Studio. This is the recommended approach for most users:
- ✅ Simple API key authentication
- ✅ Application Default Credentials (ADC) support
- ✅ Low-friction setup
- ✅ Full Gemini feature support (context caching, multimodal)
### Future: Vertex AI Backend
For enterprise users requiring advanced compliance and security features, a dedicated Vertex AI backend is on the roadmap.
| Feature | Google AI (`google-genai`) | Vertex AI (roadmap) |
|---|---|---|
| Auth | API key / ADC | Service account / ADC |
| VPC Service Controls | ❌ | ✅ |
| Audit Logs | ❌ | ✅ (Cloud Logging) |
| CMEK (Customer-Managed Keys) | ❌ | ✅ |
| Private Endpoints | ❌ | ✅ |
| Model Registry | Limited | Full access |
| SLA | Consumer | Enterprise |
Interested in enterprise features? Open an issue on GitHub to discuss your requirements and help prioritize the Vertex AI backend.