Cost Management

When working with LLM APIs, costs can accumulate quickly, especially during development and experimentation. kanoa provides the TokenGuard system to help you monitor and control API costs before requests are sent.

Overview

TokenGuard provides pre-flight token counting with configurable guardrails:

  • Warnings: Log when requests exceed a threshold (default: 10K tokens)

  • Approval prompts: Interactive confirmation for large requests (default: 50K tokens)

  • Hard limits: Block requests that exceed safety limits (default: 200K tokens)
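The three tiers can be pictured as a simple comparison chain. The sketch below is illustrative only: `classify_level` is a hypothetical helper, not part of kanoa. It assumes each threshold is exclusive (a request must exceed it to trigger the tier) and reuses the level names shown for `result.level` in the Quick Start.

```python
def classify_level(
    token_count: int,
    warn_threshold: int = 10_000,
    approval_threshold: int = 50_000,
    reject_threshold: int = 200_000,
) -> str:
    """Map a token count to one of: ok, warn, approval, reject."""
    # Check the strictest tier first so large requests never fall through.
    if token_count > reject_threshold:
        return "reject"
    if token_count > approval_threshold:
        return "approval"
    if token_count > warn_threshold:
        return "warn"
    return "ok"


for n in (2_000, 12_000, 75_000, 250_000):
    print(f"{n:>9,} tokens -> {classify_level(n)}")
```

Whether kanoa treats a count exactly equal to a threshold as inside or outside the tier is not stated here, so treat the boundary behavior as an assumption.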

Quick Start

from google import genai
from kanoa.backends.gemini import GeminiTokenCounter
from kanoa.core.token_guard import TokenGuard

# Initialize your client
client = genai.Client()

# Create a token counter for your backend
counter = GeminiTokenCounter(client, model="gemini-3-pro-preview")

# Wrap it with a guard
guard = TokenGuard(counter)

# Check before making an API call
content = "Your prompt here..."
result = guard.check(content)

print(f"Tokens: {result.token_count:,}")
print(f"Estimated cost: ${result.estimated_cost:.4f}")
print(f"Level: {result.level}")  # ok, warn, approval, reject

if result.approved:
    # Proceed with API call
    response = client.models.generate_content(...)

Backend-Specific Counters

Each backend has its own token counter that uses the provider’s native API:

Gemini

from kanoa.backends.gemini import GeminiTokenCounter

counter = GeminiTokenCounter(client, model="gemini-3-pro-preview")

Claude

from kanoa.backends.claude import ClaudeTokenCounter

counter = ClaudeTokenCounter(
    client,
    model="claude-sonnet-4-5",
    system="Optional system prompt"  # Counted separately by Claude
)

Fallback (Estimation Only)

For backends without native token counting, use the estimation-based fallback:

from kanoa.core.token_guard import FallbackTokenCounter

counter = FallbackTokenCounter(backend_name="custom", model="my-model")
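For intuition, character-based estimation can be sketched in a few lines. The API reference on this page describes the fallback as roughly 4 characters per token; the function below is a hypothetical stand-in that applies that ratio, and the round-up behavior is an assumption rather than kanoa's documented rounding.

```python
def estimate_tokens(text: str, chars_per_token: int = 4) -> int:
    """Rough token estimate from character count (assumed ~4 chars per token)."""
    if not text:
        return 0
    # Ceiling division so a short, non-empty string counts as at least one token.
    return -(-len(text) // chars_per_token)


sample = "Summarize the quarterly sales figures by region."
print(f"{len(sample)} characters -> about {estimate_tokens(sample)} tokens")
```

Estimates like this can differ noticeably from a provider's real tokenizer, particularly for code or non-English text, so they are best used for coarse guardrails.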

Threshold Configuration

Via Constructor

guard = TokenGuard(
    counter,
    warn_threshold=5_000,       # Warn above 5K tokens
    approval_threshold=25_000,  # Require approval above 25K
    reject_threshold=100_000,   # Hard block above 100K
    auto_approve=False,         # Require interactive confirmation
)

Via Environment Variables

For automation and CI/CD:

export KANOA_TOKEN_WARN_THRESHOLD=5000
export KANOA_TOKEN_APPROVAL_THRESHOLD=25000
export KANOA_TOKEN_REJECT_THRESHOLD=100000
export KANOA_AUTO_APPROVE=1  # Skip interactive prompts
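To check what a pipeline would see, the variables can be read back with a small helper. This is a sketch, not kanoa's own parsing code: `read_threshold` and `read_auto_approve` are hypothetical names, the defaults are the ones listed in the Overview, and treating "true" and "yes" as truthy alongside "1" is an assumption.

```python
import os

# Defaults taken from the Overview section of this page.
DEFAULTS = {
    "KANOA_TOKEN_WARN_THRESHOLD": 10_000,
    "KANOA_TOKEN_APPROVAL_THRESHOLD": 50_000,
    "KANOA_TOKEN_REJECT_THRESHOLD": 200_000,
}


def read_threshold(name, environ=None):
    """Return the integer threshold from the environment, or its default."""
    env = os.environ if environ is None else environ
    raw = env.get(name)
    if raw is None or not raw.strip():
        return DEFAULTS[name]
    return int(raw)


def read_auto_approve(environ=None):
    """Return True when KANOA_AUTO_APPROVE is set to a truthy value (assumed set)."""
    env = os.environ if environ is None else environ
    return env.get("KANOA_AUTO_APPROVE", "").strip().lower() in {"1", "true", "yes"}


# A plain dict stands in for the real environment so the example is repeatable.
fake_env = {"KANOA_TOKEN_WARN_THRESHOLD": "5000", "KANOA_AUTO_APPROVE": "1"}
for key in DEFAULTS:
    print(key, "=", read_threshold(key, fake_env))
print("auto_approve =", read_auto_approve(fake_env))
```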

Usage Patterns

Basic Check

result = guard.check(content)

if result.level == "reject":
    print(f"Request too large: {result.message}")
elif result.requires_approval and not result.approved:
    print("User declined large request")
else:
    # Safe to proceed
    pass

Guard with Exception

The guard() method combines checking with automatic exception raising:

from kanoa.core.token_guard import TokenLimitExceeded

try:
    result = guard.guard(content)
    # Request approved, proceed
    response = client.models.generate_content(...)
except TokenLimitExceeded as e:
    print(f"Blocked: {e.token_count:,} tokens exceeds {e.limit:,}")
    print(f"Would have cost: ${e.estimated_cost:.4f}")
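To exercise this error-handling path in unit tests without calling a provider, a local stand-in can be useful. The sketch below does not import kanoa: the exception class only mirrors the three documented attributes (`token_count`, `limit`, `estimated_cost`), and `guard_or_raise` with its `cost_per_million` parameter is a hypothetical helper.

```python
class TokenLimitExceeded(Exception):
    """Local stand-in carrying the documented token_count, limit, estimated_cost."""

    def __init__(self, token_count: int, limit: int, estimated_cost: float) -> None:
        super().__init__(f"{token_count:,} tokens exceeds limit of {limit:,}")
        self.token_count = token_count
        self.limit = limit
        self.estimated_cost = estimated_cost


def guard_or_raise(token_count: int, limit: int, cost_per_million: float) -> int:
    """Return the token count if it is within the limit, otherwise raise."""
    if token_count > limit:
        cost = token_count / 1_000_000 * cost_per_million
        raise TokenLimitExceeded(token_count, limit, cost)
    return token_count


try:
    guard_or_raise(250_000, limit=200_000, cost_per_million=2.00)
except TokenLimitExceeded as exc:
    print(f"Blocked: {exc.token_count:,} tokens exceeds {exc.limit:,}")
    print(f"Would have cost: ${exc.estimated_cost:.4f}")
```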

Custom Pricing

Override default pricing for accurate cost estimates:

# Gemini 3.0 Pro pricing (USD)
pricing = {
    "input_short": 2.00,   # USD per 1M input tokens, <=200K context
    "input_long": 4.00,    # USD per 1M input tokens, >200K context
}

result = guard.check(content, pricing=pricing)
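The arithmetic behind a two-tier pricing dict can be worked through directly. The function below is a sketch that borrows the documented `estimate_cost` signature but is not kanoa's implementation; it assumes the whole request is billed at a single rate chosen by its size, and that a count exactly at the context threshold uses the short-context rate.

```python
from typing import Dict


def estimate_cost(
    token_count: int,
    pricing: Dict[str, float],
    context_threshold: int = 200_000,
) -> float:
    """Estimate input cost in dollars from per-1M-token rates."""
    # Assumption: one tier applies to the entire request.
    key = "input_short" if token_count <= context_threshold else "input_long"
    return token_count / 1_000_000 * pricing[key]


pricing = {"input_short": 2.00, "input_long": 4.00}
print(f"${estimate_cost(75_000, pricing):.4f}")   # short-context tier
print(f"${estimate_cost(250_000, pricing):.4f}")  # long-context tier
```

With these rates, 75,000 tokens works out to 75,000 / 1,000,000 × 2.00 = $0.15, and 250,000 tokens to 250,000 / 1,000,000 × 4.00 = $1.00.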

Interactive Approval

When a request exceeds the approval threshold (and auto_approve=False), TokenGuard displays an interactive prompt:

============================================================
⚠️  LARGE TOKEN REQUEST - APPROVAL REQUIRED
============================================================
   Token count:    75,000
   Estimated cost: $0.1500
   Approval limit: 50,000 tokens
============================================================
Proceed with this request? [y/N]:

This works in both terminals and Jupyter notebooks.
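A prompt of this shape can be approximated as follows. This is a hypothetical re-creation for illustration, not TokenGuard's code: `prompt_for_approval` and its injectable `input_fn` parameter are assumptions, added so the confirmation logic can be tested without a real user.

```python
from typing import Callable


def prompt_for_approval(
    token_count: int,
    estimated_cost: float,
    approval_limit: int,
    input_fn: Callable[[str], str] = input,
) -> bool:
    """Print a summary banner and return True only on an explicit yes."""
    bar = "=" * 60
    print(bar)
    print("LARGE TOKEN REQUEST - APPROVAL REQUIRED")
    print(bar)
    print(f"   Token count:    {token_count:,}")
    print(f"   Estimated cost: ${estimated_cost:.4f}")
    print(f"   Approval limit: {approval_limit:,} tokens")
    print(bar)
    answer = input_fn("Proceed with this request? [y/N]: ")
    # Anything other than y/yes is a decline, matching the [y/N] default.
    return answer.strip().lower() in {"y", "yes"}


# Scripted answers stand in for a real user so the example runs unattended.
print(prompt_for_approval(75_000, 0.15, 50_000, input_fn=lambda _: "y"))
print(prompt_for_approval(75_000, 0.15, 50_000, input_fn=lambda _: ""))
```

Defaulting to "no" on empty input is the conservative choice: an accidental Enter key never sends an expensive request.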

Integration with Backends

TokenGuard is designed to integrate with kanoa backends. Here’s an example of adding token checking to a custom workflow:

from kanoa import AnalyticsInterpreter
from kanoa.backends.gemini import GeminiBackend, GeminiTokenCounter
from kanoa.core.token_guard import TokenGuard

# Set up guard with the backend's client
backend = GeminiBackend(model="gemini-3-pro-preview")
counter = GeminiTokenCounter(backend.client, model=backend.model)
guard = TokenGuard(counter, warn_threshold=5000)

# Check before expensive operations
kb_content = load_large_knowledge_base()
result = guard.check(kb_content)

if result.level in ("reject", "approval"):
    print(f"⚠️ Large KB detected: {result.token_count:,} tokens")
    print("Consider summarizing or chunking the knowledge base.")
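If chunking is the chosen remedy, one simple approach is to split on paragraph boundaries and keep each chunk under a token budget. The sketch below is an assumption-laden illustration: `chunk_text` is a hypothetical helper, it sizes chunks with the rough 4-characters-per-token heuristic rather than a real tokenizer, and it cuts oversized paragraphs at fixed character offsets.

```python
from typing import List


def chunk_text(text: str, max_tokens: int, chars_per_token: int = 4) -> List[str]:
    """Split text into chunks whose estimated size stays within max_tokens."""
    if max_tokens <= 0:
        raise ValueError("max_tokens must be positive")
    max_chars = max_tokens * chars_per_token
    chunks: List[str] = []
    current = ""
    for paragraph in text.split("\n\n"):
        candidate = paragraph if not current else current + "\n\n" + paragraph
        if len(candidate) <= max_chars:
            current = candidate  # Paragraph still fits in the open chunk.
            continue
        if current:
            chunks.append(current)
        # A single oversized paragraph is cut at fixed character offsets.
        while len(paragraph) > max_chars:
            chunks.append(paragraph[:max_chars])
            paragraph = paragraph[max_chars:]
        current = paragraph
    if current:
        chunks.append(current)
    return chunks


kb = ("alpha " * 10).strip() + "\n\n" + ("beta " * 10).strip()
for i, chunk in enumerate(chunk_text(kb, max_tokens=20)):
    print(i, len(chunk), "chars")
```

Each chunk can then be checked individually with the guard before it is sent.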

Best Practices

  1. Set conservative defaults during development: Use lower thresholds to catch runaway costs early.

  2. Use auto_approve=True in CI/CD: Set it via the KANOA_AUTO_APPROVE=1 environment variable so pipelines never block on an interactive prompt.

  3. Monitor cumulative costs: TokenGuard checks individual requests. For session-wide tracking, see the integration test cost tracker pattern.

  4. Estimate before loading large KBs: Check token counts before passing large knowledge bases to caching APIs.
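For the cumulative-cost point, a small accumulator is often enough. The class below is a hypothetical sketch and not the integration test cost tracker mentioned in the list; it simply sums the `token_count` and `estimated_cost` values that a check returns and compares the running total with a budget chosen by the caller.

```python
from dataclasses import dataclass


@dataclass
class SessionCostTracker:
    """Accumulate per-request estimates against a session budget (in dollars)."""

    budget_usd: float
    total_tokens: int = 0
    total_cost_usd: float = 0.0
    requests: int = 0

    def record(self, token_count: int, estimated_cost: float) -> None:
        """Add one request's token count and estimated cost to the totals."""
        self.total_tokens += token_count
        self.total_cost_usd += estimated_cost
        self.requests += 1

    @property
    def over_budget(self) -> bool:
        return self.total_cost_usd > self.budget_usd


tracker = SessionCostTracker(budget_usd=0.50)
# Illustrative (token_count, estimated_cost) pairs, as a check might report them.
for tokens, cost in [(75_000, 0.15), (120_000, 0.24), (90_000, 0.18)]:
    tracker.record(tokens, cost)

print(f"{tracker.requests} requests, {tracker.total_tokens:,} tokens")
print(f"Spent ${tracker.total_cost_usd:.2f} of ${tracker.budget_usd:.2f}")
print("Over budget:", tracker.over_budget)
```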

API Reference

class kanoa.core.token_guard.TokenGuard(counter, warn_threshold=None, approval_threshold=None, reject_threshold=None, auto_approve=False)

Pre-flight token counting and cost guardrails.

Provides configurable thresholds for:

  • Warnings: Log a warning but proceed

  • Approval: Prompt user for confirmation (Jupyter-friendly)

  • Rejection: Hard limit that blocks the request

All thresholds can be overridden via environment variables.

__init__(counter, warn_threshold=None, approval_threshold=None, reject_threshold=None, auto_approve=False)

Initialize TokenGuard.

Parameters:
  • counter (TokenCounter) – TokenCounter instance (backend-specific)

  • warn_threshold (Optional[int]) – Token count to trigger warning (default: 10K)

  • approval_threshold (Optional[int]) – Token count to require approval (default: 50K)

  • reject_threshold (Optional[int]) – Token count to reject request (default: 200K)

  • auto_approve (bool) – Skip interactive prompts (for automation)

property backend_name: str

Return the backend name from the counter.

property model: str

Return the model name from the counter.

count_tokens(contents)

Count tokens using the configured counter.

Parameters:

contents (Any) – Content to count (format varies by backend)

Return type:

int

Returns:

Token count

estimate_cost(token_count, pricing, context_threshold=200000)

Estimate cost for input tokens based on pricing.

Parameters:
  • token_count (int) – Number of input tokens

  • pricing (Dict[str, float]) – Pricing dict with ‘input_short’, ‘input_long’ keys

  • context_threshold (int) – Threshold for short vs long context pricing

Return type:

float

Returns:

Estimated cost in dollars

check(contents, pricing=None)

Check token count and determine if request should proceed.

Parameters:
  • contents (Any) – Content to check

  • pricing (Optional[Dict[str, float]]) – Optional pricing dict for cost estimation

Return type:

TokenCheckResult

Returns:

TokenCheckResult with approval status and message

guard(contents, pricing=None)

Check tokens and raise exception if rejected.

Convenience method that combines check() with automatic rejection.

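Parameters:
  • contents (Any) – Content to check

  • pricing (Optional[Dict[str, float]]) – Optional pricing dict for cost estimation
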
Return type:

TokenCheckResult

Returns:

TokenCheckResult if approved

Raises:

TokenLimitExceeded – If request exceeds reject threshold or user denies

class kanoa.core.token_guard.TokenCheckResult(token_count, estimated_cost, level, approved, message, requires_approval=False)

Result of a token count check.

token_count: int
estimated_cost: float
level: str
approved: bool
message: str
requires_approval: bool = False
__init__(token_count, estimated_cost, level, approved, message, requires_approval=False)

class kanoa.core.token_guard.TokenLimitExceeded(token_count, limit, estimated_cost)

Raised when the token count exceeds the reject threshold or the user declines an approval prompt.

__init__(token_count, limit, estimated_cost)

class kanoa.core.token_guard.TokenCounter(*args, **kwargs)

Protocol for backend-agnostic token counting.

property backend_name: str

Return the backend name (e.g., ‘gemini’, ‘claude’).

property model: str

Return the model name.

count_tokens(contents)

Count tokens for the given contents.

Parameters:

contents (Any) – Content to count (format varies by backend)

Return type:

int

Returns:

Token count

estimate_tokens(contents)

Fallback estimation when API counting fails.

Parameters:

contents (Any) – Content to estimate

Return type:

int

Returns:

Estimated token count

__init__(*args, **kwargs)

class kanoa.core.token_guard.BaseTokenCounter

Base class for token counters with shared functionality.

abstract property backend_name: str

Return the backend name.

abstract property model: str

Return the model name.

abstractmethod count_tokens(contents)

Count tokens using the backend API.

Return type:

int

estimate_tokens(contents)

Fallback token estimation based on content size (~4 chars per token).

Return type:

int
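A custom counter needs only the four members described above. The sketch below avoids importing kanoa so it runs anywhere: `TokenCounterLike` is a local stand-in that mirrors the documented protocol, and `WhitespaceTokenCounter` is a hypothetical implementation whose word-splitting rule is purely illustrative.

```python
from typing import Any, Protocol, runtime_checkable


@runtime_checkable
class TokenCounterLike(Protocol):
    """Local stand-in mirroring the documented TokenCounter members."""

    @property
    def backend_name(self) -> str: ...

    @property
    def model(self) -> str: ...

    def count_tokens(self, contents: Any) -> int: ...

    def estimate_tokens(self, contents: Any) -> int: ...


class WhitespaceTokenCounter:
    """Hypothetical counter that treats whitespace-separated words as tokens."""

    def __init__(self, model: str) -> None:
        self._model = model

    @property
    def backend_name(self) -> str:
        return "custom"

    @property
    def model(self) -> str:
        return self._model

    def count_tokens(self, contents: Any) -> int:
        return len(str(contents).split())

    def estimate_tokens(self, contents: Any) -> int:
        # Size-based fallback using the ~4 chars per token heuristic.
        return len(str(contents)) // 4


counter = WhitespaceTokenCounter(model="my-model")
print(isinstance(counter, TokenCounterLike))  # structural check of member names
print(counter.count_tokens("three short words"))
```

Because the check is structural, the class does not need to inherit from anything; whether the real TokenGuard accepts such a duck-typed object without subclassing BaseTokenCounter is an assumption worth verifying against the library.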