# Backends

kanoa supports multiple AI backends, each with different strengths and use cases.

## Gemini (`gemini`)

> For detailed documentation, see [Gemini Backend Reference](../backends/gemini.md).

**Best for**: PDF knowledge bases, large context windows, cost optimization

### Features

- **Native PDF Support**: Upload PDFs directly, Gemini "sees" figures and tables
- **2M Token Context**: Massive context window (Gemini 3 Pro) for large knowledge bases
- **Context Caching**: Reuse cached content to reduce costs
- **Multimodal**: Images, PDFs, text, and more

### Authentication

```bash
# Option 1: API Key
export GOOGLE_API_KEY="your-api-key"

# Option 2: Application Default Credentials (ADC)
gcloud auth application-default login
```

### Usage

```python
from kanoa import AnalyticsInterpreter

# With API key
interpreter = AnalyticsInterpreter(
    backend='gemini',
    api_key='your-api-key'
)

# With ADC (Vertex AI)
interpreter = AnalyticsInterpreter(
    backend='gemini',
    project='your-project-id',
    location='us-central1'
)
```

### Pricing

| Token Type | Price (per 1M tokens) | Notes |
| --- | --- | --- |
| Standard Input | $2.00 | For context <200K tokens |
| Cached Input | $0.50 | 75% savings |
| Cache Storage | $0.20/hour | Per million cached tokens |
| Output | $12.00 | All output tokens |

#### Real-World Cost Study

Using an **8.5 MB PDF** (WMO State of the Climate 2025 Report, ~9,500 tokens):

| Operation | Cost | Notes |
| --- | --- | --- |
| First query (cache creation) | $0.02 | Full token cost |
| Subsequent queries (cached) | < $0.01 | 67% savings |
| 10-query session | ~$0.11 | vs. $0.21 without caching |

**Bottom line**: For a typical research session analyzing a scientific paper:

- **Free tier**: Works for simple text analysis, but no caching or PDF support
- **Paid tier**: ~$0.02 to cache a paper, then pennies per question

See the [Context Caching Demo](../../../examples/gemini_context_caching_demo.ipynb) for a complete walkthrough.

### Context Caching

Gemini supports **explicit context caching** for knowledge bases, providing significant
cost savings when making multiple queries against the same content.

#### How It Works

1. **First Query**: kanoa uploads your KB and creates a cache (billed at standard rate)
2. **Subsequent Queries**: Cached content is reused (billed at $0.50/1M vs $2.00/1M)
3. **Content Hashing**: kanoa detects KB changes and refreshes the cache automatically

#### Enabling Context Caching

```python
interpreter = AnalyticsInterpreter(
    backend='gemini',
    kb_path='./docs',
    cache_ttl=3600,  # Cache valid for 1 hour (default)
)
```

#### Usage Tracking

The `UsageInfo` object includes caching metrics:

```python
result = interpreter.interpret(prompt="Analyze this data")

print(f"Cached tokens: {result.usage.cached_tokens}")
print(f"Cache savings: ${result.usage.cache_savings:.4f}")
```

#### Minimum Token Requirements

Context caching requires a minimum number of tokens to be beneficial:

| Model | Minimum Tokens |
| --- | --- |
| gemini-2.5-flash | 1,024 |
| gemini-3-pro-preview | 2,048 |
| gemini-2.5-pro | 4,096 |

#### Cache Management

You can manage caches programmatically or via the CLI:

```python
# Clear cache manually (e.g., after updating KB files)
interpreter.clear_cache()

# Cache is also cleared automatically when KB content hash changes
```

For CLI usage:

```bash
python -m kanoa.tools.gemini_cache list
```

#### Best Practices

- ✅ Use for interactive sessions with multiple queries
- ✅ Set `cache_ttl` based on your session length
- ✅ Monitor `cache_savings` to track ROI
- ❌ Avoid for single-shot queries (cache creation overhead)
- ❌ Avoid for KBs < 2,048 tokens (no caching benefit)

## Claude (`claude`)

> For detailed documentation, see [Claude Backend Reference](../backends/claude.md).

**Best for**: Strong reasoning, text-heavy analysis (Claude Sonnet 4.5)

### Features

- **Vision Support**: Interprets images (but not PDFs directly)
- **Strong Reasoning**: Excellent for complex analytical tasks
- **200K Context**: Large context window for text knowledge bases

### Authentication

```bash
export ANTHROPIC_API_KEY="your-api-key"
```

### Usage

```python
interpreter = AnalyticsInterpreter(
    backend='claude',
    api_key='your-api-key'
)
```

### Pricing

- Input: $3.00 per 1M tokens
- Output: $15.00 per 1M tokens

## vLLM (`vllm`)

> For detailed documentation, see [vLLM Backend Reference](../backends/vllm.md).

**Best for**: Local inference with open-source models (Molmo, Gemma 3)

### Features

- **Fully Open Source**: Run inference locally with no API costs
- **Privacy**: Your data never leaves your machine
- **Vision Support**: Supports multimodal models like Molmo
- **GPU Acceleration**: Optimized for NVIDIA GPUs via vLLM

### Usage

```python
from kanoa import AnalyticsInterpreter

# Local vLLM (Molmo 7B)
interpreter = AnalyticsInterpreter(
    backend='vllm',
    api_base='http://localhost:8000/v1',
    model='allenai/Molmo-7B-D-0924'
)

# Local vLLM (Gemma 3 12B)
interpreter = AnalyticsInterpreter(
    backend='vllm',
    api_base='http://localhost:8000/v1',
    model='google/gemma-3-12b-it'
)
```

See the [vLLM Getting Started Guide](getting_started_vllm.md) for setup instructions.

## OpenAI (`openai`)

> For detailed documentation, see [OpenAI Backend Reference](../backends/openai.md).

**Best for**: GPT models, Azure OpenAI

### Features

- **GPT Models**: Access to GPT-4, GPT-5.1, and future models
- **Azure OpenAI**: Enterprise deployment via Azure
- **Vision Support**: Supports image inputs with compatible models

### Usage

#### OpenAI (GPT-5.1)

```python
interpreter = AnalyticsInterpreter(
    backend='openai',
    api_key='sk-...'
)
```

#### Azure OpenAI

```python
interpreter = AnalyticsInterpreter(
    backend='openai',
    api_base='https://your-resource.openai.azure.com/...',
    api_key='your-azure-key'
)
```

---

## Enterprise Considerations

### Current: Google AI Studio (`google-genai`)

kanoa currently uses the `google-genai` SDK, which connects to Google AI Studio.
This is the recommended approach for most users:

- ✅ Simple API key authentication
- ✅ Application Default Credentials (ADC) support
- ✅ Low friction setup
- ✅ Full Gemini feature support (context caching, multimodal)

### Future: Vertex AI Backend

For enterprise users requiring advanced compliance and security features,
a dedicated Vertex AI backend is on the roadmap.

| Feature | Google AI (`google-genai`) | Vertex AI (roadmap) |
| --------- | --------------------------- | --------------------- |
| Auth | API key / ADC | Service account / ADC |
| VPC Service Controls | ❌ | ✅ |
| Audit Logs | ❌ | ✅ (Cloud Logging) |
| CMEK (Customer-Managed Keys) | ❌ | ✅ |
| Private Endpoints | ❌ | ✅ |
| Model Registry | Limited | Full access |
| SLA | Consumer | Enterprise |

**Interested in enterprise features?**
Open an issue on [GitHub](https://github.com/lhzn-io/kanoa/issues)
to discuss your requirements and help prioritize the Vertex AI backend.