# Knowledge Bases

kanoa can ground its interpretations in your project's documentation and literature.

## Quick Start

Simply point kanoa at a directory containing your documentation:

```python
interpreter = AnalyticsInterpreter(
    backend='gemini',
    kb_path='./docs'
)
```

kanoa will automatically:

1. Scan the directory for supported file types
2. Detect PDFs, markdown files, text files, and code
3. Use the best encoding strategy available for your backend

## Supported File Types

### Text Files

**Formats**: Markdown (`.md`), plain text (`.txt`), reStructuredText (`.rst`)

**All backends support text files.** They are concatenated and included in the prompt.

### PDF Files

**Format**: PDF files (`.pdf`)

**Backend Support**:

- **Gemini**: Native PDF support via the File API (best quality; the model sees figures and tables)
- **Claude**: Coming soon (native PDF support planned)
- **OpenAI/vLLM**: Coming soon (PDF-to-image conversion planned)

Currently, non-Gemini backends show a warning and use text files only.

### Code Files

**Formats**: Python (`.py`), JavaScript (`.js`), and other code files

**Use case**: Include implementation details and examples.

## Direct Content

For small, dynamic knowledge bases, pass the content directly:

```python
kb_content = """
# Project Context
This analysis uses the Smith et al. 2023 methodology.
Key parameters: alpha=0.05, n=100
"""

interpreter = AnalyticsInterpreter(
    kb_content=kb_content
)
```

## Examples

### Mixed Content Directory

```python
# Directory structure:
# ./docs/
# ├── README.md
# ├── api_reference.md
# ├── paper.pdf
# └── example.py

interpreter = AnalyticsInterpreter(
    backend='gemini',
    kb_path='./docs'
)

# kanoa automatically:
# - Reads text from the .md files
# - Uploads paper.pdf via the Gemini File API
# - Includes the example.py code
```

### Academic Papers (PDF)

```python
interpreter = AnalyticsInterpreter(
    backend='gemini',
    kb_path='./docs/literature'  # Contains PDFs
)

# Gemini "sees" the entire PDF:
# - Text content
# - Figures and tables
# - Equations and formatting
```

### Project Documentation (Text)

```python
interpreter = AnalyticsInterpreter(
    backend='claude',         # Works with any backend
    kb_path='./docs/project'  # Contains .md files
)
```

## Best Practices

### For Text Files

- Use clear markdown headings
- Keep files focused and modular
- Include code snippets and examples
- Total size: aim for <100K tokens

### For PDF Files

- Use text-based, high-quality PDFs (not scanned images)
- Limit the collection to 10 to 20 key papers
- Gemini caches PDFs, so reusing them is cheap
- Total size: aim for <500K tokens

## Reloading

If your knowledge base files change during a session, reload them:

```python
interpreter.reload_knowledge_base()
```

This re-scans the directory and updates the content.

## Migration from v0.1.x

**Breaking Change in v0.2.0**: The `kb_type` parameter has been removed.

Before (v0.1.x):

```python
interpreter = AnalyticsInterpreter(
    backend='gemini',
    kb_path='./docs',
    kb_type='auto'  # ❌ No longer needed
)
```

After (v0.2.0+):

```python
interpreter = AnalyticsInterpreter(
    backend='gemini',
    kb_path='./docs'  # ✓ Automatic detection
)
```

kanoa now automatically detects all supported file types and encodes each one in the best way available for the selected backend.
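To make the automatic detection concrete, here is a minimal, hypothetical sketch of how a knowledge base directory could be sorted by file type and backend. It is not kanoa's implementation: the functions `classify_paths`, `scan_kb`, and `build_text_context`, the extension sets, and the `## <filename>` section format are all illustrative assumptions. The rules follow this page: text and code files go to every backend, PDFs go natively only to Gemini, and other backends skip PDFs with a warning.

```python
import warnings
from pathlib import Path

# Hypothetical extension groups, taken from the file types listed on this page.
TEXT_EXTS = {".md", ".txt", ".rst"}
CODE_EXTS = {".py", ".js"}
PDF_EXTS = {".pdf"}


def classify_paths(paths, backend):
    """Sort file paths into encoding groups for the given backend.

    Assumption (illustrative only): only 'gemini' receives PDFs natively;
    other backends skip PDFs with a warning and keep text and code files.
    Unrecognized extensions are skipped silently.
    """
    groups = {"text": [], "code": [], "pdf": [], "skipped": []}
    for path in sorted(paths):
        ext = path.suffix.lower()
        if ext in TEXT_EXTS:
            groups["text"].append(path)
        elif ext in CODE_EXTS:
            groups["code"].append(path)
        elif ext in PDF_EXTS and backend == "gemini":
            groups["pdf"].append(path)  # would be uploaded, not inlined
        else:
            if ext in PDF_EXTS:
                warnings.warn(f"{backend} backend: skipping PDF {path.name}")
            groups["skipped"].append(path)
    return groups


def scan_kb(kb_path, backend):
    """Recursively collect regular files under kb_path and classify them."""
    files = [p for p in Path(kb_path).rglob("*") if p.is_file()]
    return classify_paths(files, backend)


def build_text_context(groups):
    """Concatenate text and code files into one prompt section.

    The '## <filename>' heading per file is a hypothetical format.
    """
    parts = [
        f"## {path.name}\n\n{path.read_text(encoding='utf-8')}"
        for path in groups["text"] + groups["code"]
    ]
    return "\n\n".join(parts)
```

Keeping `classify_paths` separate from the directory walk lets the per-backend rule be checked without touching the filesystem. A real loader would also need to handle file encodings, size limits, and the actual PDF upload, none of which this sketch attempts.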