Cost Management
When working with LLM APIs, costs can accumulate quickly, especially during development
and experimentation. kanoa provides the TokenGuard system to help you monitor and
control API costs before requests are sent.
Overview
TokenGuard provides pre-flight token counting with configurable guardrails:
Warnings: Log when requests exceed a threshold (default: 10K tokens)
Approval prompts: Interactive confirmation for large requests (default: 50K tokens)
Hard limits: Block requests that exceed safety limits (default: 200K tokens)
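The three tiers can be read as a simple comparison chain. The following is a minimal sketch of that idea, not kanoa's implementation: the function name `classify_level` is hypothetical, the defaults mirror the values listed above, and treating "exceed" as a strict greater-than comparison is an assumption.

```python
def classify_level(
    token_count: int,
    warn_threshold: int = 10_000,
    approval_threshold: int = 50_000,
    reject_threshold: int = 200_000,
) -> str:
    """Map a token count to one of four levels: ok, warn, approval, reject."""
    # Check the strictest tier first so a very large request is never
    # downgraded to a mere warning.
    if token_count > reject_threshold:
        return "reject"
    if token_count > approval_threshold:
        return "approval"
    if token_count > warn_threshold:
        return "warn"
    return "ok"


for count in (2_000, 12_000, 75_000, 250_000):
    print(f"{count:>7,} tokens -> {classify_level(count)}")
```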
Quick Start
from google import genai
from kanoa.backends.gemini import GeminiTokenCounter
from kanoa.core.token_guard import TokenGuard
# Initialize your client
client = genai.Client()
# Create a token counter for your backend
counter = GeminiTokenCounter(client, model="gemini-3-pro-preview")
# Wrap it with a guard
guard = TokenGuard(counter)
# Check before making an API call
content = "Your prompt here..."
result = guard.check(content)
print(f"Tokens: {result.token_count:,}")
print(f"Estimated cost: ${result.estimated_cost:.4f}")
print(f"Level: {result.level}") # ok, warn, approval, reject
if result.approved:
    # Proceed with API call
    response = client.models.generate_content(...)
Backend-Specific Counters
Each backend has its own token counter that uses the provider’s native API:
Gemini
from kanoa.backends.gemini import GeminiTokenCounter
counter = GeminiTokenCounter(client, model="gemini-3-pro-preview")
Claude
from kanoa.backends.claude import ClaudeTokenCounter
counter = ClaudeTokenCounter(
    client,
    model="claude-sonnet-4-5",
    system="Optional system prompt",  # Counted separately by Claude
)
Fallback (Estimation Only)
For backends without native token counting, use the estimation-based fallback:
from kanoa.core.token_guard import FallbackTokenCounter
counter = FallbackTokenCounter(backend_name="custom", model="my-model")
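This section does not say which heuristic FallbackTokenCounter uses. As an illustration of what estimation-only counting typically looks like, the sketch below applies the common rule of thumb of roughly four characters per token for English text. The function name `estimate_tokens` and the ratio are assumptions, not kanoa's API.

```python
import math


def estimate_tokens(text: str, chars_per_token: float = 4.0) -> int:
    """Rough token estimate from character count (heuristic, not exact)."""
    if not text:
        return 0
    # Round up so short, non-empty strings never estimate to zero tokens.
    return math.ceil(len(text) / chars_per_token)


prompt = "Summarize the quarterly sales figures."
print(estimate_tokens(prompt))
```

Character-based estimates can drift noticeably for code or non-English text, so conservative thresholds are a reasonable choice whenever an estimating counter is in use.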
Threshold Configuration
Via Constructor
guard = TokenGuard(
    counter,
    warn_threshold=5_000,        # Warn above 5K tokens
    approval_threshold=25_000,   # Require approval above 25K
    reject_threshold=100_000,    # Hard block above 100K
    auto_approve=False,          # Require interactive confirmation
)
Via Environment Variables
For automation and CI/CD:
export KANOA_TOKEN_WARN_THRESHOLD=5000
export KANOA_TOKEN_APPROVAL_THRESHOLD=25000
export KANOA_TOKEN_REJECT_THRESHOLD=100000
export KANOA_AUTO_APPROVE=1 # Skip interactive prompts
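How kanoa parses these variables internally is not shown here. The sketch below illustrates one plausible resolution order (explicit argument, then environment variable, then built-in default) using only the standard library. The helper names `resolve_threshold` and `env_flag` are hypothetical, and both the precedence and the set of values treated as "true" are assumptions.

```python
import os
from typing import Optional


def resolve_threshold(explicit: Optional[int], env_var: str, default: int) -> int:
    """Pick a threshold: explicit argument, then environment variable, then default."""
    if explicit is not None:
        return explicit
    raw = os.environ.get(env_var)
    if raw is None or not raw.strip():
        return default
    # int() raises ValueError on non-numeric input, which surfaces typos early.
    return int(raw)


def env_flag(env_var: str) -> bool:
    """Treat "1", "true", or "yes" (any case) as enabled."""
    return os.environ.get(env_var, "").strip().lower() in {"1", "true", "yes"}


# Simulate the exports shown above for a self-contained demonstration.
os.environ["KANOA_TOKEN_WARN_THRESHOLD"] = "5000"
os.environ["KANOA_AUTO_APPROVE"] = "1"

print(resolve_threshold(None, "KANOA_TOKEN_WARN_THRESHOLD", 10_000))
print(env_flag("KANOA_AUTO_APPROVE"))
```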
Usage Patterns
Basic Check
result = guard.check(content)
if result.level == "reject":
    print(f"Request too large: {result.message}")
elif result.requires_approval and not result.approved:
    print("User declined large request")
else:
    # Safe to proceed
    pass
Guard with Exception
The guard() method combines checking with automatic exception raising:
from kanoa.core.token_guard import TokenLimitExceeded
try:
    result = guard.guard(content)
    # Request approved, proceed
    response = client.models.generate_content(...)
except TokenLimitExceeded as e:
    print(f"Blocked: {e.token_count:,} tokens exceeds {e.limit:,}")
    print(f"Would have cost: ${e.estimated_cost:.4f}")
Custom Pricing
Override default pricing for accurate cost estimates:
# Gemini 3.0 Pro pricing
pricing = {
    "input_short": 2.00,  # Per 1M tokens, <=200K context
    "input_long": 4.00,   # Per 1M tokens, >200K context
}
result = guard.check(content, pricing=pricing)
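The arithmetic behind such an estimate is straightforward. The sketch below is an assumption about how a tiered per-million-token price could be applied. It reuses the `input_short` and `input_long` keys from the example and the `context_threshold=200000` default from the `estimate_cost` signature. The standalone function `estimate_input_cost` is hypothetical, and it assumes the long-context rate applies to the whole request once the threshold is crossed.

```python
def estimate_input_cost(token_count, pricing, context_threshold=200_000):
    """Estimate input cost in USD from per-1M-token prices."""
    # Requests at or below the threshold use the short-context rate.
    if token_count <= context_threshold:
        rate = pricing["input_short"]
    else:
        rate = pricing["input_long"]
    return token_count / 1_000_000 * rate


pricing = {"input_short": 2.00, "input_long": 4.00}
print(f"${estimate_input_cost(75_000, pricing):.4f}")   # $0.1500
print(f"${estimate_input_cost(250_000, pricing):.4f}")  # $1.0000
```

At the short-context rate, a 75,000-token request comes to $0.15, the same figure shown in the approval prompt under Interactive Approval.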
Interactive Approval
When a request exceeds the approval threshold (and auto_approve=False),
TokenGuard displays an interactive prompt:
============================================================
⚠️ LARGE TOKEN REQUEST - APPROVAL REQUIRED
============================================================
Token count: 75,000
Estimated cost: $0.1500
Approval limit: 50,000 tokens
============================================================
Proceed with this request? [y/N]:
This works in both terminals and Jupyter notebooks.
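A prompt of this kind can be sketched with the standard library alone. The function below is illustrative and is not kanoa's implementation: `confirm_large_request` and its `ask` parameter are hypothetical names, and the rule that only an explicit "y" or "yes" approves follows the usual reading of the `[y/N]` convention.

```python
def confirm_large_request(token_count, estimated_cost, approval_limit, ask=input):
    """Show a summary banner and return True only on an explicit yes."""
    rule = "=" * 60
    print(rule)
    print("LARGE TOKEN REQUEST - APPROVAL REQUIRED")
    print(rule)
    print(f"Token count: {token_count:,}")
    print(f"Estimated cost: ${estimated_cost:.4f}")
    print(f"Approval limit: {approval_limit:,} tokens")
    print(rule)
    # Anything other than y/yes (including an empty reply) counts as "no".
    answer = ask("Proceed with this request? [y/N]: ")
    return answer.strip().lower() in {"y", "yes"}


# Passing a stand-in for input() keeps the example non-interactive.
approved = confirm_large_request(75_000, 0.15, 50_000, ask=lambda _prompt: "y")
print("approved" if approved else "declined")
```

Making the input function injectable keeps the confirmation logic testable without a terminal.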
Integration with Backends
TokenGuard is designed to integrate with kanoa backends. Here’s an example of adding token checking to a custom workflow:
from kanoa import AnalyticsInterpreter
from kanoa.backends.gemini import GeminiBackend, GeminiTokenCounter
from kanoa.core.token_guard import TokenGuard
# Set up guard with the backend's client
backend = GeminiBackend(model="gemini-3-pro-preview")
counter = GeminiTokenCounter(backend.client, model=backend.model)
guard = TokenGuard(counter, warn_threshold=5000)
# Check before expensive operations
kb_content = load_large_knowledge_base()
result = guard.check(kb_content)
if result.level in ("reject", "approval"):
    print(f"⚠️ Large KB detected: {result.token_count:,} tokens")
    print("Consider summarizing or chunking the knowledge base.")
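If the knowledge base turns out to be too large, one option is to split it into pieces that each fit a token budget. The sketch below is a hypothetical helper, not part of kanoa: `chunk_by_budget` splits on blank lines and accepts any token-counting callable, and the character-based `rough_count` used in the demonstration is only a stand-in for a real counter.

```python
from typing import Callable, List


def chunk_by_budget(
    text: str, budget: int, count_tokens: Callable[[str], int]
) -> List[str]:
    """Group paragraphs into chunks, closing a chunk before it would exceed budget.

    A single paragraph larger than the budget is kept whole as its own chunk.
    """
    chunks: List[str] = []
    current: List[str] = []
    for paragraph in text.split("\n\n"):
        candidate = "\n\n".join(current + [paragraph])
        if current and count_tokens(candidate) > budget:
            # Adding this paragraph would overflow, so close the current chunk.
            chunks.append("\n\n".join(current))
            current = [paragraph]
        else:
            current.append(paragraph)
    if current:
        chunks.append("\n\n".join(current))
    return chunks


def rough_count(s: str) -> int:
    """Stand-in counter: about four characters per token (an assumption)."""
    return len(s) // 4 + 1


kb = "\n\n".join(f"Section {i}: " + "data " * 50 for i in range(6))
for i, chunk in enumerate(chunk_by_budget(kb, budget=150, count_tokens=rough_count)):
    print(i, rough_count(chunk))
```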
Best Practices
Set conservative defaults during development: Use lower thresholds to catch runaway costs early.
Use auto_approve=True in CI/CD: Set via the KANOA_AUTO_APPROVE=1 environment variable.
Monitor cumulative costs: TokenGuard checks individual requests. For session-wide tracking, see the integration test cost tracker pattern.
Estimate before loading large KBs: Check token counts before passing large knowledge bases to caching APIs.
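Since TokenGuard evaluates one request at a time, a session-wide budget needs a small accumulator on top. The cost tracker pattern referred to above is not reproduced in this section, so the following is an independent sketch. The `SessionCostTracker` class and its members are hypothetical, and it assumes only that each check yields a token count and an estimated cost, as `TokenCheckResult` does.

```python
from dataclasses import dataclass


@dataclass
class SessionCostTracker:
    """Accumulate token counts and estimated costs across requests."""

    budget_usd: float
    total_tokens: int = 0
    total_cost: float = 0.0
    requests: int = 0

    def record(self, token_count: int, estimated_cost: float) -> None:
        """Add one request's figures to the running totals."""
        self.total_tokens += token_count
        self.total_cost += estimated_cost
        self.requests += 1

    @property
    def over_budget(self) -> bool:
        """True once the accumulated estimate exceeds the session budget."""
        return self.total_cost > self.budget_usd


tracker = SessionCostTracker(budget_usd=0.25)
for tokens, cost in [(12_000, 0.024), (75_000, 0.15), (60_000, 0.12)]:
    tracker.record(tokens, cost)
    print(
        f"{tracker.requests} requests, ${tracker.total_cost:.4f}, "
        f"over budget: {tracker.over_budget}"
    )
```

In practice, `record` would be called with `result.token_count` and `result.estimated_cost` after each approved check.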
API Reference
- class kanoa.core.token_guard.TokenGuard(counter, warn_threshold=None, approval_threshold=None, reject_threshold=None, auto_approve=False)
Pre-flight token counting and cost guardrails.
Provides configurable thresholds for:
Warnings: Log a warning but proceed
Approval: Prompt user for confirmation (Jupyter-friendly)
Rejection: Hard limit that blocks the request
All thresholds can be overridden via environment variables.
- __init__(counter, warn_threshold=None, approval_threshold=None, reject_threshold=None, auto_approve=False)
Initialize TokenGuard.
- Parameters:
counter (TokenCounter) – TokenCounter instance (backend-specific)
warn_threshold (Optional[int]) – Token count to trigger warning (default: 10K)
approval_threshold (Optional[int]) – Token count to require approval (default: 50K)
reject_threshold (Optional[int]) – Token count to reject request (default: 200K)
auto_approve (bool) – Skip interactive prompts (for automation)
- estimate_cost(token_count, pricing, context_threshold=200000)
Estimate cost for input tokens based on pricing.
- guard(contents, pricing=None)
Check tokens and raise exception if rejected.
Convenience method that combines check() with automatic rejection.
- Parameters:
- Return type:
TokenCheckResult
- Returns:
TokenCheckResult if approved
- Raises:
TokenLimitExceeded – If request exceeds reject threshold or user denies
- class kanoa.core.token_guard.TokenCheckResult(token_count, estimated_cost, level, approved, message, requires_approval=False)
Result of a token count check.
- __init__(token_count, estimated_cost, level, approved, message, requires_approval=False)
- class kanoa.core.token_guard.TokenLimitExceeded(token_count, limit, estimated_cost)
Raised when token count exceeds the reject threshold.
- class kanoa.core.token_guard.TokenCounter(*args, **kwargs)
Protocol for backend-agnostic token counting.
- __init__(*args, **kwargs)