Knowledge Bases
kanoa can ground its interpretations in your project’s documentation and literature.
Quick Start
Simply point kanoa at a directory containing your documentation:
interpreter = AnalyticsInterpreter(
backend='gemini',
kb_path='./docs'
)
kanoa will automatically:
Scan the directory for all file types
Detect PDFs, markdown files, text files, and code
Use the optimal encoding strategy for your backend
Supported File Types
Text Files
Formats: Markdown (.md), text (.txt), reStructuredText (.rst)
All backends support text files — they’re concatenated and included in the prompt.
PDF Files
Format: PDF files (.pdf)
Backend Support:
Gemini: Native PDF support via File API (best quality, sees figures/tables)
Claude: Coming soon (native PDF support planned)
OpenAI/vLLM: Coming soon (PDF-to-image conversion planned)
Currently, non-Gemini backends will show a warning and use text files only.
Code Files
Formats: Python (.py), JavaScript (.js), and other code files
Use case: Include implementation details and examples
Direct Content
For small, dynamic knowledge bases, pass content directly:
kb_content = """
# Project Context
This analysis uses the Smith et al. 2023 methodology.
Key parameters: alpha=0.05, n=100
"""
interpreter = AnalyticsInterpreter(
kb_content=kb_content
)
Examples
Mixed Content Directory
# Directory structure:
# ./docs/
# ├── README.md
# ├── api_reference.md
# ├── paper.pdf
# └── example.py
interpreter = AnalyticsInterpreter(
backend='gemini',
kb_path='./docs'
)
# kanoa automatically:
# - Reads text from .md files
# - Uploads paper.pdf via Gemini File API
# - Includes example.py code
Academic Papers (PDF)
interpreter = AnalyticsInterpreter(
backend='gemini',
kb_path='./docs/literature' # Contains PDFs
)
# Gemini "sees" the entire PDF:
# - Text content
# - Figures and tables
# - Equations and formatting
Project Documentation (Text)
interpreter = AnalyticsInterpreter(
backend='claude', # Works with any backend
kb_path='./docs/project' # Contains .md files
)
Best Practices
For Text Files
Use clear markdown headers
Keep files focused and modular
Include code snippets and examples
Total size: aim for <100K tokens
For PDF Files
Use high-quality PDFs (not scanned images)
Limit to 10-20 key papers
Gemini caches PDFs, so reuse is cheap
Total size: aim for <500K tokens
Reloading
If your knowledge base files change during a session:
interpreter.reload_knowledge_base()
This will re-scan the directory and update the content.
Migration from v0.1.x
Breaking Change in v0.2.0: The kb_type parameter has been removed.
Before (v0.1.x):
interpreter = AnalyticsInterpreter(
backend='gemini',
kb_path='./docs',
kb_type='auto' # ❌ No longer needed
)
After (v0.2.0+):
interpreter = AnalyticsInterpreter(
backend='gemini',
kb_path='./docs' # ✓ Automatic detection
)
kanoa now automatically detects and optimally encodes all file types.