# Backends
kanoa supports multiple AI backends, each with different strengths and use cases.
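Every backend is constructed through the same `AnalyticsInterpreter` interface, so switching providers is a one-line change. A quick sketch using the options documented in the sections below:

```python
from kanoa import AnalyticsInterpreter

# The backend is selected by name; the rest of the API is identical.
gemini = AnalyticsInterpreter(backend='gemini', api_key='your-api-key')
claude = AnalyticsInterpreter(backend='claude', api_key='your-api-key')
local = AnalyticsInterpreter(
    backend='vllm',
    api_base='http://localhost:8000/v1',
    model='allenai/Molmo-7B-D-0924',
)
```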
## Gemini (`gemini`)
For detailed documentation, see Gemini Backend Reference.
**Best for:** PDF knowledge bases, large context windows, cost optimization
### Features
- **Native PDF Support**: Upload PDFs directly; Gemini “sees” figures and tables
- **2M Token Context**: Massive context window (Gemini 3 Pro) for large knowledge bases
- **Context Caching**: Reuse cached content to reduce costs
- **Multimodal**: Images, PDFs, text, and more
### Authentication
```bash
# Option 1: API key
export GOOGLE_API_KEY="your-api-key"

# Option 2: Application Default Credentials (ADC)
gcloud auth application-default login
```
### Usage
```python
from kanoa import AnalyticsInterpreter

# With API key
interpreter = AnalyticsInterpreter(
    backend='gemini',
    api_key='your-api-key'
)

# With ADC (Vertex AI)
interpreter = AnalyticsInterpreter(
    backend='gemini',
    project='your-project-id',
    location='us-central1'
)
```
### Pricing
| Token Type | Price (per 1M tokens) | Notes |
|---|---|---|
| Standard Input | $2.00 | For context < 200K tokens |
| Cached Input | $0.50 | 75% savings |
| Cache Storage | $0.20/hour | Per million cached tokens |
| Output | $12.00 | All output tokens |
### Real-World Cost Study
Using an 8.5 MB PDF (WMO State of the Climate 2025 Report, ~9,500 tokens):
| Operation | Cost | Notes |
|---|---|---|
| First query (cache creation) | $0.02 | Full token cost |
| Subsequent queries (cached) | < $0.01 | 67% savings |
| 10-query session | ~$0.11 | vs. $0.21 without caching |
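The session figure follows from the pricing table above. A rough reconstruction (the output length per answer is an assumed value, and cache storage cost is ignored):

```python
# Rates from the pricing table, in dollars per token
STANDARD_IN = 2.00 / 1_000_000
CACHED_IN = 0.50 / 1_000_000
OUTPUT = 12.00 / 1_000_000

KB_TOKENS = 9_500   # the WMO report from the study
OUT_TOKENS = 500    # assumed tokens per answer

first_query = KB_TOKENS * STANDARD_IN + OUT_TOKENS * OUTPUT  # ≈ $0.025
cached_query = KB_TOKENS * CACHED_IN + OUT_TOKENS * OUTPUT   # ≈ $0.011

with_caching = first_query + 9 * cached_query                # ≈ $0.12
without_caching = 10 * first_query                           # ≈ $0.25

print(f"10-query session: ${with_caching:.2f} cached, "
      f"${without_caching:.2f} uncached")
```

With shorter answers, both totals drop toward the study's ~$0.11 vs. ~$0.21; the savings ratio is what matters.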
**Bottom line:** For a typical research session analyzing a scientific paper:

- **Free tier**: Works for simple text analysis, but no caching or PDF support
- **Paid tier**: ~$0.02 to cache a paper, then pennies per question
See the Context Caching Demo for a complete walkthrough.
### Context Caching
Gemini supports explicit context caching for knowledge bases, providing significant cost savings when making multiple queries against the same content.
#### How It Works
1. **First Query**: kanoa uploads your KB and creates a cache (billed at the standard rate)
2. **Subsequent Queries**: Cached content is reused (billed at $0.50/1M vs. $2.00/1M)
3. **Content Hashing**: kanoa detects KB changes and refreshes the cache automatically
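The change-detection step can be pictured as hashing the KB contents and comparing against the digest recorded when the cache was created. A minimal sketch of the idea (illustrative only, not kanoa's internal code):

```python
import hashlib
from pathlib import Path

def kb_content_hash(kb_path: str) -> str:
    """Digest every file in the KB so any edit changes the hash."""
    digest = hashlib.sha256()
    for path in sorted(Path(kb_path).rglob('*')):
        if path.is_file():
            digest.update(path.name.encode())
            digest.update(path.read_bytes())
    return digest.hexdigest()

def cache_is_stale(kb_path: str, recorded_hash: str) -> bool:
    """Compare the current KB digest to the one stored with the cache."""
    return kb_content_hash(kb_path) != recorded_hash
```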
#### Enabling Context Caching
```python
interpreter = AnalyticsInterpreter(
    backend='gemini',
    kb_path='./docs',
    cache_ttl=3600,  # Cache valid for 1 hour (default)
)
```
#### Usage Tracking
The `UsageInfo` object includes caching metrics:
```python
result = interpreter.interpret(
    prompt="Analyze this data",
    stream=False
)

print(f"Cached tokens: {result.usage.cached_tokens}")
print(f"Cache savings: ${result.usage.cache_savings:.4f}")
```
#### Minimum Token Requirements
Context caching requires a minimum number of tokens to be beneficial:
| Model | Minimum Tokens |
|---|---|
| `gemini-2.5-flash` | 1,024 |
| `gemini-3-pro-preview` | 2,048 |
| `gemini-2.5-pro` | 4,096 |
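These thresholds can be checked up front. A hypothetical helper (the constants mirror the table above; `caching_worthwhile` is not part of kanoa's API):

```python
# Per-model minimums from the table above
MIN_CACHE_TOKENS = {
    'gemini-2.5-flash': 1_024,
    'gemini-3-pro-preview': 2_048,
    'gemini-2.5-pro': 4_096,
}

def caching_worthwhile(model: str, kb_tokens: int) -> bool:
    """True when the KB meets the model's minimum cacheable size."""
    return kb_tokens >= MIN_CACHE_TOKENS.get(model, 4_096)

assert caching_worthwhile('gemini-2.5-flash', 9_500)
assert not caching_worthwhile('gemini-2.5-pro', 2_000)
```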
#### Cache Management
You can manage caches programmatically or via the CLI:
```python
# Clear cache manually (e.g., after updating KB files)
interpreter.clear_cache()

# Cache is also cleared automatically when the KB content hash changes
```
For CLI usage:
```bash
python -m kanoa.tools.gemini_cache list
```
#### Best Practices
- ✅ Use for interactive sessions with multiple queries
- ✅ Set `cache_ttl` based on your session length (see the sketch below)
- ✅ Monitor `cache_savings` to track ROI
- ❌ Avoid for single-shot queries (cache creation overhead)
- ❌ Avoid for KBs below your model's minimum token count (no caching benefit)
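For the `cache_ttl` guideline, one reasonable approach is to size the TTL to your expected session plus some headroom (the 1.5× factor here is just a rule of thumb):

```python
SESSION_HOURS = 2  # expected length of the analysis session

interpreter = AnalyticsInterpreter(
    backend='gemini',
    kb_path='./docs',
    cache_ttl=int(SESSION_HOURS * 1.5 * 3600),  # 3-hour TTL, in seconds
)
```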
## Claude (`claude`)
For detailed documentation, see Claude Backend Reference.
**Best for:** Strong reasoning, text-heavy analysis (Claude Sonnet 4.5)
### Features
- **Vision Support**: Interprets images (but not PDFs directly)
- **Strong Reasoning**: Excellent for complex analytical tasks
- **200K Context**: Large context window for text knowledge bases
### Authentication
```bash
export ANTHROPIC_API_KEY="your-api-key"
```
### Usage
```python
interpreter = AnalyticsInterpreter(
    backend='claude',
    api_key='your-api-key'
)
```
### Pricing

- **Input**: $3.00 per 1M tokens
- **Output**: $15.00 per 1M tokens
## vLLM (`vllm`)
For detailed documentation, see vLLM Backend Reference.
**Best for:** Local inference with open-source models (Molmo, Gemma 3)
### Features
- **Fully Open Source**: Run inference locally with no API costs
- **Privacy**: Your data never leaves your machine
- **Vision Support**: Supports multimodal models like Molmo
- **GPU Acceleration**: Optimized for NVIDIA GPUs via vLLM
### Usage
```python
from kanoa import AnalyticsInterpreter

# Local vLLM (Molmo 7B)
interpreter = AnalyticsInterpreter(
    backend='vllm',
    api_base='http://localhost:8000/v1',
    model='allenai/Molmo-7B-D-0924'
)

# Local vLLM (Gemma 3 12B)
interpreter = AnalyticsInterpreter(
    backend='vllm',
    api_base='http://localhost:8000/v1',
    model='google/gemma-3-12b-it'
)
```
See the Local Inference Guide for setup instructions.
## OpenAI (`openai`)
For detailed documentation, see OpenAI Backend Reference.
**Best for:** GPT models, Azure OpenAI
### Features
- **GPT Models**: Access to GPT-4, GPT-5.1, and future models
- **Azure OpenAI**: Enterprise deployment via Azure
- **Vision Support**: Supports image inputs with compatible models
### Usage
```python
# OpenAI (GPT-5.1)
interpreter = AnalyticsInterpreter(
    backend='openai',
    api_key='sk-...'
)

# Azure OpenAI
interpreter = AnalyticsInterpreter(
    backend='openai',
    api_base='https://your-resource.openai.azure.com/...',
    api_key='your-azure-key'
)
```
## GitHub Copilot SDK (`github-copilot`)
**Best for:** GitHub Copilot users, GPT-5 models via CLI
### Features
- **GitHub Copilot Integration**: Use GitHub Copilot models programmatically
- **Streaming Support**: Real-time streaming responses
- **CLI-based**: Communicates with the Copilot CLI via JSON-RPC
- **Multi-turn Conversations**: Session-based chat history
### Prerequisites
1. **GitHub Copilot Subscription**: Individual, Business, or Enterprise
2. **Copilot CLI**: Installed via the GitHub CLI (github.com/cli/cli)

```bash
# Install the Copilot extension for the GitHub CLI
gh extension install github/gh-copilot
```
### Authentication
The GitHub Copilot SDK uses the Copilot CLI, which handles authentication through the GitHub CLI:
```bash
# Authenticate with GitHub
gh auth login

# Verify Copilot access
copilot --version
```
### Installation
```bash
pip install kanoa[github-copilot]
```
### Usage
```python
from kanoa import AnalyticsInterpreter

# Basic usage
interpreter = AnalyticsInterpreter(
    backend='github-copilot',
    model='gpt-5'  # Default model
)

# Stream responses
for chunk in interpreter.interpret(
    fig=my_plot,
    context="Analyzing sales trends",
    focus="Identify seasonal patterns"
):
    if chunk.type == "text":
        print(chunk.content, end="")
```
### Configuration Options
```python
interpreter = AnalyticsInterpreter(
    backend='github-copilot',
    model='gpt-5',
    cli_path='/custom/path/to/copilot',  # Optional: custom CLI path
    cli_url='localhost:8080',            # Optional: connect to an external CLI server
    streaming=True,                      # Enable streaming (default)
    verbose=1                            # Logging level (0=silent, 1=info, 2=debug)
)
```
### Limitations
- **Vision Support**: Currently limited in the Copilot SDK (text analysis preferred)
- **CLI Dependency**: Requires the Copilot CLI installed and in PATH
- **Subscription Required**: Must have an active GitHub Copilot subscription
### Pricing
The GitHub Copilot SDK bills through your existing Copilot subscription rather than per token:
- **Individual**: $10/month
- **Business**: $19/user/month
- **Enterprise**: Custom pricing
See GitHub Copilot pricing for details.
## Enterprise Considerations
### Current: Google AI Studio (`google-genai`)
kanoa currently uses the `google-genai` SDK, which connects to Google AI Studio. This is the recommended approach for most users:
- ✅ Simple API key authentication
- ✅ Application Default Credentials (ADC) support
- ✅ Low-friction setup
- ✅ Full Gemini feature support (context caching, multimodal)
### Future: Vertex AI Backend
For enterprise users requiring advanced compliance and security features, a dedicated Vertex AI backend is on the roadmap.
| Feature | Google AI (`google-genai`) | Vertex AI (roadmap) |
|---|---|---|
| Auth | API key / ADC | Service account / ADC |
| VPC Service Controls | ❌ | ✅ |
| Audit Logs | ❌ | ✅ (Cloud Logging) |
| CMEK (Customer-Managed Keys) | ❌ | ✅ |
| Private Endpoints | ❌ | ✅ |
| Model Registry | Limited | Full access |
| SLA | Consumer | Enterprise |
Interested in enterprise features? Open an issue on GitHub to discuss your requirements and help prioritize the Vertex AI backend.