# Cost Management

When working with LLM APIs, costs can accumulate quickly, especially during development and experimentation. kanoa provides the `TokenGuard` system to help you monitor and control API costs before requests are sent.

## Overview

`TokenGuard` provides pre-flight token counting with configurable guardrails:

- **Warnings**: Log when requests exceed a threshold (default: 10K tokens)
- **Approval prompts**: Interactive confirmation for large requests (default: 50K tokens)
- **Hard limits**: Block requests that exceed safety limits (default: 200K tokens)

## Quick Start

```python
from google import genai

from kanoa.backends.gemini import GeminiTokenCounter
from kanoa.core.token_guard import TokenGuard

# Initialize your client
client = genai.Client()

# Create a token counter for your backend
counter = GeminiTokenCounter(client, model="gemini-3-pro-preview")

# Wrap it with a guard
guard = TokenGuard(counter)

# Check before making an API call
content = "Your prompt here..."
result = guard.check(content)

print(f"Tokens: {result.token_count:,}")
print(f"Estimated cost: ${result.estimated_cost:.4f}")
print(f"Level: {result.level}")  # ok, warn, approval, reject

if result.approved:
    # Proceed with API call
    response = client.models.generate_content(...)
```

## Backend-Specific Counters

Each backend has its own token counter that uses the provider's native API:

### Gemini

```python
from kanoa.backends.gemini import GeminiTokenCounter

counter = GeminiTokenCounter(client, model="gemini-3-pro-preview")
```

### Claude

```python
from kanoa.backends.claude import ClaudeTokenCounter

counter = ClaudeTokenCounter(
    client,
    model="claude-sonnet-4-5",
    system="Optional system prompt"  # Counted separately by Claude
)
```

### Fallback (Estimation Only)

For backends without native token counting, use the estimation-based fallback:

```python
from kanoa.core.token_guard import FallbackTokenCounter

counter = FallbackTokenCounter(backend_name="custom", model="my-model")
```

## Threshold Configuration

### Via Constructor

```python
guard = TokenGuard(
    counter,
    warn_threshold=5_000,        # Warn above 5K tokens
    approval_threshold=25_000,   # Require approval above 25K
    reject_threshold=100_000,    # Hard block above 100K
    auto_approve=False,          # Require interactive confirmation
)
```

### Via Environment Variables

For automation and CI/CD:

```bash
export KANOA_TOKEN_WARN_THRESHOLD=5000
export KANOA_TOKEN_APPROVAL_THRESHOLD=25000
export KANOA_TOKEN_REJECT_THRESHOLD=100000
export KANOA_AUTO_APPROVE=1  # Skip interactive prompts
```

## Usage Patterns

### Basic Check

```python
result = guard.check(content)

if result.level == "reject":
    print(f"Request too large: {result.message}")
elif result.requires_approval and not result.approved:
    print("User declined large request")
else:
    # Safe to proceed
    pass
```

### Guard with Exception

The `guard()` method combines checking with automatic exception raising:

```python
from kanoa.core.token_guard import TokenLimitExceeded

try:
    result = guard.guard(content)
    # Request approved, proceed
    response = client.models.generate_content(...)
except TokenLimitExceeded as e:
    print(f"Blocked: {e.token_count:,} tokens exceeds {e.limit:,}")
    print(f"Would have cost: ${e.estimated_cost:.4f}")
```

### Custom Pricing

Override default pricing for accurate cost estimates:

```python
# Gemini 3.0 Pro pricing
pricing = {
    "input_short": 2.00,  # Per 1M tokens, <=200K context
    "input_long": 4.00,   # Per 1M tokens, >200K context
}

result = guard.check(content, pricing=pricing)
```

## Interactive Approval

When a request exceeds the approval threshold (and `auto_approve=False`), TokenGuard displays an interactive prompt:

```text
============================================================
⚠️ LARGE TOKEN REQUEST - APPROVAL REQUIRED
============================================================
Token count: 75,000
Estimated cost: $0.1500
Approval limit: 50,000 tokens
============================================================
Proceed with this request? [y/N]:
```

This works in both terminals and Jupyter notebooks.

## Integration with Backends

TokenGuard is designed to integrate with kanoa backends. Here's an example of adding token checking to a custom workflow:

```python
from kanoa import AnalyticsInterpreter
from kanoa.backends.gemini import GeminiBackend, GeminiTokenCounter
from kanoa.core.token_guard import TokenGuard

# Set up guard with the backend's client
backend = GeminiBackend(model="gemini-3-pro-preview")
counter = GeminiTokenCounter(backend.client, model=backend.model)
guard = TokenGuard(counter, warn_threshold=5000)

# Check before expensive operations
kb_content = load_large_knowledge_base()
result = guard.check(kb_content)

if result.level in ("reject", "approval"):
    print(f"⚠️ Large KB detected: {result.token_count:,} tokens")
    print("Consider summarizing or chunking the knowledge base.")
```

## Best Practices

1. **Set conservative defaults during development**: Use lower thresholds to catch runaway costs early.
2. **Use `auto_approve=True` in CI/CD**: Set via `KANOA_AUTO_APPROVE=1` environment variable.
3. **Monitor cumulative costs**: TokenGuard checks individual requests. For session-wide tracking, see the integration test cost tracker pattern.
4. **Estimate before loading large KBs**: Check token counts before passing large knowledge bases to caching APIs.

## API Reference

```{eval-rst}
.. autoclass:: kanoa.core.token_guard.TokenGuard
   :members:
   :undoc-members:

.. autoclass:: kanoa.core.token_guard.TokenCheckResult
   :members:

.. autoclass:: kanoa.core.token_guard.TokenLimitExceeded
   :members:

.. autoclass:: kanoa.core.token_guard.TokenCounter
   :members:

.. autoclass:: kanoa.core.token_guard.BaseTokenCounter
   :members:
```
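
## Appendix: Threshold Logic Sketch

To make the guardrail levels and cost arithmetic from this page concrete, here is a minimal standalone sketch. It does not import kanoa and is not the library's implementation. The names `classify`, `estimate_cost`, `sketch_check`, and `SketchResult` are hypothetical. It assumes a request moves to the next level only when its token count is strictly greater than the threshold, and that a request above 200K tokens is priced entirely at the `input_long` rate. The threshold defaults and per-1M-token prices are the ones quoted above.

```python
from dataclasses import dataclass

# Default thresholds quoted in the Overview (in tokens).
WARN_THRESHOLD = 10_000
APPROVAL_THRESHOLD = 50_000
REJECT_THRESHOLD = 200_000

# Per-1M-token input prices from the Custom Pricing example.
PRICING = {"input_short": 2.00, "input_long": 4.00}
LONG_CONTEXT_CUTOFF = 200_000  # the <=200K / >200K pricing boundary


@dataclass
class SketchResult:
    """Hypothetical stand-in for a check result; not kanoa's class."""

    token_count: int
    level: str
    estimated_cost: float


def classify(
    token_count: int,
    warn: int = WARN_THRESHOLD,
    approval: int = APPROVAL_THRESHOLD,
    reject: int = REJECT_THRESHOLD,
) -> str:
    """Return ok, warn, approval, or reject (assumes strict '>' comparisons)."""
    if token_count > reject:
        return "reject"
    if token_count > approval:
        return "approval"
    if token_count > warn:
        return "warn"
    return "ok"


def estimate_cost(token_count: int, pricing: dict = PRICING) -> float:
    """Estimate input cost in dollars from per-1M-token prices."""
    if token_count <= LONG_CONTEXT_CUTOFF:
        rate = pricing["input_short"]
    else:
        rate = pricing["input_long"]
    return token_count * rate / 1_000_000


def sketch_check(token_count: int) -> SketchResult:
    """Combine classification and cost estimation for one request."""
    return SketchResult(token_count, classify(token_count), estimate_cost(token_count))


if __name__ == "__main__":
    r = sketch_check(75_000)
    # Prints: approval: 75,000 tokens, ~$0.1500
    print(f"{r.level}: {r.token_count:,} tokens, ~${r.estimated_cost:.4f}")
```

The 75,000-token case reproduces the figures in the interactive prompt shown earlier: it exceeds the 50,000-token approval limit, and at $2.00 per 1M tokens it costs about $0.15.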