# Backends
kanoa supports multiple AI backends, each with different strengths and use cases.
## Gemini (`gemini`)
For detailed documentation, see Gemini Backend Reference.
**Best for:** PDF knowledge bases, large context windows, cost optimization
### Features

- **Native PDF Support**: Upload PDFs directly; Gemini "sees" figures and tables
- **2M Token Context**: Massive context window (Gemini 3 Pro) for large knowledge bases
- **Context Caching**: Reuse cached content to reduce costs
- **Multimodal**: Images, PDFs, text, and more
### Authentication

```bash
# Option 1: API key
export GOOGLE_API_KEY="your-api-key"

# Option 2: Application Default Credentials (ADC)
gcloud auth application-default login
```
### Usage

```python
from kanoa import AnalyticsInterpreter

# With API key
interpreter = AnalyticsInterpreter(
    backend='gemini',
    api_key='your-api-key'
)

# With ADC (Vertex AI)
interpreter = AnalyticsInterpreter(
    backend='gemini',
    project='your-project-id',
    location='us-central1'
)
```
### Pricing

| Token Type | Price (per 1M tokens) | Notes |
|---|---|---|
| Standard Input | $2.00 | For context < 200K tokens |
| Cached Input | $0.50 | 75% savings |
| Cache Storage | $0.20/hour | Per million cached tokens |
| Output | $12.00 | All output tokens |
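To see how these rates combine, here is a back-of-the-envelope session cost sketch. The ~9,500-token KB size comes from the cost study below; the 400-token output per query is an assumed figure for illustration, not a measured value.

```python
# Rough session cost using the rates in the table above.
# Assumptions: ~9,500-token KB (cost study below), 400 output tokens
# per query. Cache storage ($0.20/hr per 1M tokens) is omitted --
# for a KB this small it is a fraction of a cent per hour.
KB_TOKENS = 9_500
OUTPUT_TOKENS = 400
QUERIES = 10

STANDARD_IN = 2.00 / 1e6   # $/token
CACHED_IN = 0.50 / 1e6
OUT = 12.00 / 1e6

first_query = KB_TOKENS * STANDARD_IN + OUTPUT_TOKENS * OUT
cached_query = KB_TOKENS * CACHED_IN + OUTPUT_TOKENS * OUT

with_cache = first_query + (QUERIES - 1) * cached_query
without_cache = QUERIES * first_query

print(f"10-query session with caching:    ${with_cache:.2f}")
print(f"10-query session without caching: ${without_cache:.2f}")
```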
### Real-World Cost Study
Using an 8.5 MB PDF (WMO State of the Climate 2025 Report, ~9,500 tokens):
| Operation | Cost | Notes |
|---|---|---|
| First query (cache creation) | $0.02 | Full token cost |
| Subsequent queries (cached) | < $0.01 | 67% savings |
| 10-query session | ~$0.11 | vs. ~$0.21 without caching |
**Bottom line:** For a typical research session analyzing a scientific paper:

- **Free tier**: Works for simple text analysis, but no caching or PDF support
- **Paid tier**: ~$0.02 to cache a paper, then pennies per question
See the Context Caching Demo for a complete walkthrough.
### Context Caching
Gemini supports explicit context caching for knowledge bases, providing significant cost savings when making multiple queries against the same content.
#### How It Works

1. **First Query**: kanoa uploads your KB and creates a cache (billed at the standard input rate)
2. **Subsequent Queries**: Cached content is reused (billed at $0.50/1M vs. $2.00/1M)
3. **Content Hashing**: kanoa detects KB changes and refreshes the cache automatically (sketched below)
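The change-detection idea is straightforward: hash the KB's file contents in a stable order and compare against the digest recorded when the cache was created. kanoa's internal implementation may differ; this is a minimal sketch of the technique, assuming a directory KB like `./docs`.

```python
import hashlib
from pathlib import Path

def kb_content_hash(kb_path: str) -> str:
    """Hash every file in the KB, in sorted order, so any content
    change produces a different digest."""
    digest = hashlib.sha256()
    for path in sorted(Path(kb_path).rglob('*')):
        if path.is_file():
            digest.update(path.name.encode())
            digest.update(path.read_bytes())
    return digest.hexdigest()

previous_hash = '...'  # digest stored when the cache was created
if kb_content_hash('./docs') != previous_hash:
    ...  # KB changed: re-upload it and create a fresh cache
```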
#### Enabling Context Caching

```python
interpreter = AnalyticsInterpreter(
    backend='gemini',
    kb_path='./docs',
    cache_ttl=3600,  # Cache valid for 1 hour (default)
)
```
#### Usage Tracking

The `UsageInfo` object includes caching metrics:

```python
result = interpreter.interpret(prompt="Analyze this data")
print(f"Cached tokens: {result.usage.cached_tokens}")
print(f"Cache savings: ${result.usage.cache_savings:.4f}")
```
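Over a multi-query session, you can accumulate the same field to see the total return on the cache. A small sketch (the prompts are placeholders):

```python
total_savings = 0.0
for prompt in ["Summarize the key findings",
               "What are the main data sources?",
               "List the stated uncertainties"]:
    result = interpreter.interpret(prompt=prompt)
    total_savings += result.usage.cache_savings

print(f"Total cache savings this session: ${total_savings:.4f}")
```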
#### Minimum Token Requirements

Context caching requires a minimum amount of cached content, which varies by model:
| Model | Minimum Tokens |
|---|---|
| `gemini-2.5-flash` | 1,024 |
| `gemini-3-pro-preview` | 2,048 |
| `gemini-2.5-pro` | 4,096 |
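If you are unsure whether a KB clears the threshold, you can count its tokens up front. This sketch calls the `google-genai` SDK's `count_tokens` endpoint directly; concatenating the KB as plain text is a simplification (kanoa handles KB upload itself).

```python
from pathlib import Path
from google import genai

client = genai.Client()  # reads GOOGLE_API_KEY from the environment

# Simplification: treat the KB as concatenated plain text
kb_text = "\n".join(p.read_text() for p in Path('./docs').rglob('*.md'))

resp = client.models.count_tokens(model='gemini-2.5-flash', contents=kb_text)
if resp.total_tokens < 1024:  # minimum for gemini-2.5-flash (table above)
    print(f"KB is only {resp.total_tokens} tokens; caching won't engage")
```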
#### Cache Management

You can manage caches programmatically or via the CLI:

```python
# Clear the cache manually (e.g., after updating KB files)
interpreter.clear_cache()

# The cache is also cleared automatically when the KB content hash changes
```
For CLI usage:

```bash
python -m kanoa.tools.gemini_cache list
```
#### Best Practices

- ✅ Use for interactive sessions with multiple queries
- ✅ Set `cache_ttl` based on your session length
- ✅ Monitor `cache_savings` to track ROI
- ❌ Avoid for single-shot queries (cache creation overhead)
- ❌ Avoid for KBs < 2,048 tokens (no caching benefit)
## Claude (`claude`)
For detailed documentation, see Claude Backend Reference.
**Best for:** Strong reasoning, text-heavy analysis (Claude Sonnet 4.5)
### Features

- **Vision Support**: Interprets images (but not PDFs directly)
- **Strong Reasoning**: Excellent for complex analytical tasks
- **200K Context**: Large context window for text knowledge bases
### Authentication

```bash
export ANTHROPIC_API_KEY="your-api-key"
```
### Usage

```python
interpreter = AnalyticsInterpreter(
    backend='claude',
    api_key='your-api-key'
)
```
### Pricing

- Input: $3.00 per 1M tokens
- Output: $15.00 per 1M tokens
## vLLM (`vllm`)
For detailed documentation, see vLLM Backend Reference.
**Best for:** Local inference with open-source models (Molmo, Gemma 3)
### Features

- **Fully Open Source**: Run inference locally with no API costs
- **Privacy**: Your data never leaves your machine
- **Vision Support**: Supports multimodal models like Molmo
- **GPU Acceleration**: Optimized for NVIDIA GPUs via vLLM
### Usage

```python
from kanoa import AnalyticsInterpreter

# Local vLLM (Molmo 7B)
interpreter = AnalyticsInterpreter(
    backend='vllm',
    api_base='http://localhost:8000/v1',
    model='allenai/Molmo-7B-D-0924'
)

# Local vLLM (Gemma 3 12B)
interpreter = AnalyticsInterpreter(
    backend='vllm',
    api_base='http://localhost:8000/v1',
    model='google/gemma-3-12b-it'
)
```
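The backend expects an OpenAI-compatible vLLM server already listening at `api_base`. A quick way to verify the server is up before constructing the interpreter (a convenience sketch; assumes the `requests` package is installed):

```python
import requests

# vLLM exposes an OpenAI-compatible API; /v1/models lists loaded models
resp = requests.get('http://localhost:8000/v1/models', timeout=5)
resp.raise_for_status()
print([m['id'] for m in resp.json()['data']])
```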
See the vLLM Getting Started Guide for setup instructions.
## OpenAI (`openai`)
For detailed documentation, see OpenAI Backend Reference.
**Best for:** GPT models, Azure OpenAI
### Features

- **GPT Models**: Access to GPT-4, GPT-5.1, and future models
- **Azure OpenAI**: Enterprise deployment via Azure
- **Vision Support**: Supports image inputs with compatible models
### Usage

```python
# OpenAI (GPT-5.1)
interpreter = AnalyticsInterpreter(
    backend='openai',
    api_key='sk-...'
)

# Azure OpenAI
interpreter = AnalyticsInterpreter(
    backend='openai',
    api_base='https://your-resource.openai.azure.com/...',
    api_key='your-azure-key'
)
```
## Enterprise Considerations
### Current: Google AI Studio (`google-genai`)

kanoa currently uses the `google-genai` SDK, which connects to Google AI Studio. This is the recommended approach for most users:
- ✅ Simple API key authentication
- ✅ Application Default Credentials (ADC) support
- ✅ Low-friction setup
- ✅ Full Gemini feature support (context caching, multimodal)
### Future: Vertex AI Backend
For enterprise users requiring advanced compliance and security features, a dedicated Vertex AI backend is on the roadmap.
| Feature | Google AI (`google-genai`) | Vertex AI (roadmap) |
|---|---|---|
| Auth | API key / ADC | Service account / ADC |
| VPC Service Controls | ❌ | ✅ |
| Audit Logs | ❌ | ✅ (Cloud Logging) |
| CMEK (Customer-Managed Keys) | ❌ | ✅ |
| Private Endpoints | ❌ | ✅ |
| Model Registry | Limited | Full access |
| SLA | Consumer | Enterprise |
Interested in enterprise features? Open an issue on GitHub to discuss your requirements and help prioritize the Vertex AI backend.