Knowledge Base Management
Enhance your AI agents with custom knowledge through document uploads, text input, and website scraping capabilities.
Overview
Trogent’s Knowledge Base uses vector search to supply your agents with contextually relevant information. This lets agents answer questions about your specific products, services, or domain expertise accurately.
Key Features
- Document Processing: Upload PDF, DOCX, and TXT files
- Text Input: Direct text content addition
- Website Scraping: Automatic content extraction from URLs
- Vector Search: Semantic similarity matching using OpenAI embeddings
- Real-time Updates: Instant knowledge base modifications
- Storage Management: Efficient storage with usage tracking
Getting Started
Access Knowledge Base
- Navigate to your agent in the dashboard
- Click on the agent name to view details
- Select the Knowledge Base tab
- Begin adding your first knowledge sources
Storage Limits
- Default Limit: 400KB per agent
- File Size: Individual files up to 1MB
- Supported Formats: PDF, DOCX, TXT, plain text, URLs
Adding Knowledge Sources
Document Upload
Supported File Types
- PDF Files: Automatically extracts text content
- Word Documents (.docx): Full text extraction with formatting
- Text Files (.txt): Direct text import
- Maximum Size: 1MB per file
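The format and size rules above can be checked locally before uploading. The following is a minimal sketch, not part of the Trogent product: `check_upload`, `MAX_FILE_BYTES`, and `SUPPORTED_EXTENSIONS` are hypothetical names, and it assumes that 1MB means 1,048,576 bytes (the platform may count 1,000,000).

```python
from pathlib import Path

# Assumption: 1MB is read as 1,048,576 bytes; the platform may count 1,000,000.
MAX_FILE_BYTES = 1024 * 1024
SUPPORTED_EXTENSIONS = {".pdf", ".docx", ".txt"}


def check_upload(filename: str, size_bytes: int) -> list[str]:
    """Return a list of problems; an empty list means the file looks uploadable."""
    problems = []
    extension = Path(filename).suffix.lower()
    if extension not in SUPPORTED_EXTENSIONS:
        problems.append(f"unsupported format: {extension or 'no extension'}")
    if size_bytes > MAX_FILE_BYTES:
        problems.append(f"file too large: {size_bytes} bytes")
    return problems
```

For example, `check_upload("old.doc", 10)` reports an unsupported format, which suggests converting the file to DOCX first.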
Upload Process
- Click “Add Knowledge Source”
- Select “Upload File”
- Choose your file from your computer
- Add a descriptive title (pre-filled from the filename)
- Click “Upload” to process
Processing Status
- Uploading: File transfer in progress
- Processing: Text extraction and chunking
- Embedding: Vector generation for search
- Completed: Ready for agent use
- Failed: Processing error (check file format/size)
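One way to reason about these statuses is as a small state machine. The sketch below assumes that a source moves through the stages in the order listed and that any in-progress stage can end in Failed. These transition rules, and the names `SourceStatus` and `can_transition`, are illustrative assumptions rather than a description of how Trogent implements processing.

```python
from enum import Enum


class SourceStatus(Enum):
    UPLOADING = "Uploading"
    PROCESSING = "Processing"
    EMBEDDING = "Embedding"
    COMPLETED = "Completed"
    FAILED = "Failed"


# Assumed happy-path order, taken from the order of the status list.
_HAPPY_PATH = [
    SourceStatus.UPLOADING,
    SourceStatus.PROCESSING,
    SourceStatus.EMBEDDING,
    SourceStatus.COMPLETED,
]


def can_transition(current: SourceStatus, new: SourceStatus) -> bool:
    """Allow the next happy-path step, or a move to FAILED from any active stage."""
    if current in (SourceStatus.COMPLETED, SourceStatus.FAILED):
        return False  # treated as terminal states in this sketch
    if new is SourceStatus.FAILED:
        return True
    return _HAPPY_PATH.index(new) == _HAPPY_PATH.index(current) + 1
```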
Text Input
Direct Text Addition
- Click “Add Knowledge Source”
- Select “Add Text”
- Enter a descriptive title
- Paste or type your content in the text area
- Click “Save” to process
Best Practices for Text Input
- Use clear, descriptive titles
- Format content with headings and bullet points
- Include relevant keywords naturally
- Keep individual entries focused on specific topics
- Update content regularly to maintain accuracy
Website Scraping
URL Content Extraction
- Click “Add Knowledge Source”
- Select “Scrape Website”
- Enter the full URL (including https://)
- Add a descriptive title (pre-filled from the page title)
- Click “Scrape” to extract content
Scraping Capabilities
- HTML Text Extraction: Removes formatting, extracts readable content
- Title Detection: Automatically uses page title
- Content Filtering: Focuses on main content areas
- Link Following: Not supported; only the page at the given URL is scraped
Limitations
- Static Content Only: JavaScript-rendered content may not be captured
- Single Page: Does not follow links to other pages
- Rate Limits: Respects each website’s robots.txt and rate limits
- Content Type: Text-based content only (no images or media)
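To illustrate why only static, text-based content is captured, here is a rough sketch of single-page text extraction using Python's standard `html.parser` module. It is not the scraper Trogent uses. `PageTextExtractor` and `extract_page` are hypothetical names, and the sketch does not attempt main-content filtering or robots.txt handling.

```python
from html.parser import HTMLParser


class PageTextExtractor(HTMLParser):
    """Collect the <title> and visible text, skipping script and style blocks."""

    SKIPPED_TAGS = {"script", "style", "noscript"}

    def __init__(self) -> None:
        super().__init__()
        self.title = ""
        self.text_parts: list[str] = []
        self._skip_depth = 0
        self._in_title = False

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIPPED_TAGS:
            self._skip_depth += 1
        elif tag == "title":
            self._in_title = True

    def handle_endtag(self, tag):
        if tag in self.SKIPPED_TAGS and self._skip_depth > 0:
            self._skip_depth -= 1
        elif tag == "title":
            self._in_title = False

    def handle_data(self, data):
        text = data.strip()
        if not text or self._skip_depth:
            return
        if self._in_title:
            self.title += text
        else:
            self.text_parts.append(text)


def extract_page(html: str) -> tuple[str, str]:
    """Return (title, readable_text) for a static HTML document."""
    parser = PageTextExtractor()
    parser.feed(html)
    parser.close()
    return parser.title, " ".join(parser.text_parts)
```

Because the parser only sees the markup as delivered, anything a page builds later with JavaScript never reaches `handle_data`, which matches the "Static Content Only" limitation.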
Knowledge Management
Viewing Sources
- List View: All knowledge sources with titles and types
- Status Indicators: Processing status and health
- Usage Statistics: Storage used per source
- Search Functionality: Find specific sources quickly
Editing Sources
- Edit Title: Modify source titles for better organization
- Update Content: Refresh text-based sources with new information
- Reprocess Files: Re-upload and process modified documents
- Status Monitoring: Track processing and embedding status
Deleting Sources
- Select the source you want to remove
- Click the “Delete” button (trash icon)
- Confirm deletion in the popup dialog
- Storage space is immediately reclaimed
Bulk Operations
- Select Multiple: Use checkboxes to select multiple sources
- Bulk Delete: Remove multiple sources simultaneously
- Export Sources: Download source information (coming soon)
Search and Testing
Knowledge Search
Test your knowledge base to ensure proper content indexing:
- Navigate to the Knowledge Base tab
- Use the “Search Knowledge” feature
- Enter test queries related to your content
- Review search results and relevance scores
- Refine content based on search performance
Search Results
- Relevance Score: Similarity percentage (0-100%)
- Source Information: Which document/source provided the result
- Content Preview: Matched text snippet
- Context Window: Surrounding content for context
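The Technical Details section names cosine similarity as the search metric. As a hedged illustration of how a raw similarity could become the 0-100% relevance score shown in results, the sketch below computes cosine similarity and maps it to a percentage. The mapping in `relevance_percent` (clamp to the range 0 to 1, then scale) is an assumption; the exact formula Trogent uses is not documented here.

```python
import math


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine of the angle between two equal-length vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # convention for this sketch: a zero vector matches nothing
    return dot / (norm_a * norm_b)


def relevance_percent(similarity: float) -> float:
    """Hypothetical mapping: clamp to [0, 1], then scale to a percentage."""
    return round(max(0.0, min(1.0, similarity)) * 100, 1)
```

Identical directions score 100%, while unrelated (orthogonal) or opposing vectors score 0% under this mapping.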
Testing Agent Responses
- Use the agent chat interface
- Ask questions about your uploaded content
- Verify accuracy and relevance of responses
- Update knowledge base based on performance
Technical Details
Vector Processing
- Model: OpenAI text-embedding-3-small (1536 dimensions)
- Chunking: 1000 characters with 200 character overlap
- Storage: PostgreSQL with pgvector extension
- Search: Cosine similarity with HNSW indexing
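The chunking parameters above (1000 characters with a 200 character overlap) can be pictured as a sliding window that advances 800 characters at a time. The function below is a minimal sketch of that idea. `chunk_text` is a hypothetical name, and the real pipeline may also respect sentence or paragraph boundaries, which this sketch does not.

```python
def chunk_text(text: str, size: int = 1000, overlap: int = 200) -> list[str]:
    """Split text into fixed-size character windows that overlap by `overlap`."""
    if size <= 0 or not 0 <= overlap < size:
        raise ValueError("need size > 0 and 0 <= overlap < size")
    step = size - overlap
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + size])
        if start + size >= len(text):
            break  # the last window already reaches the end of the text
    return chunks
```

With the default settings, a 2,500-character document yields three chunks of 1000, 1000, and 900 characters, and each chunk begins with the last 200 characters of the previous one. The overlap keeps a sentence that straddles a boundary intact in at least one chunk.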
Processing Pipeline
- Content Extraction: Text extraction from various formats
- Text Chunking: Split content into searchable segments
- Embedding Generation: Convert text to vector representations
- Index Storage: Store vectors with optimized indexing
- Real-time Availability: Immediate search capability
Performance Optimization
- Batch Processing: Efficient handling of multiple sources
- Caching: Smart caching for frequently accessed content
- Indexing: Optimized vector search performance
- Rate Limiting: Balanced processing to prevent overload
Best Practices
Content Organization
- Descriptive Titles: Use clear, searchable titles
- Topic Grouping: Organize related content together
- Regular Updates: Keep information current and accurate
- Quality Control: Review content for accuracy and relevance
File Preparation
- Clean Documents: Remove unnecessary formatting and content
- Structured Content: Use headings, lists, and clear organization
- Relevant Information: Focus on content that supports agent responses
- File Naming: Use descriptive filenames for easier management
Search Optimization
- Keyword Usage: Include relevant keywords naturally
- Content Depth: Provide comprehensive information on topics
- Cross-References: Link related concepts within content
- Regular Testing: Verify search results match expectations
Maintenance
- Regular Reviews: Periodically review and update content
- Performance Monitoring: Track which sources are most useful
- User Feedback: Incorporate feedback to improve content quality
- Storage Management: Monitor usage and optimize storage
Troubleshooting
Common Issues
File Upload Problems
- “File too large”: Reduce file size or split into smaller documents
- “Unsupported format”: Convert to PDF, DOCX, or TXT
- “Processing failed”: Check the file for corruption or format issues
- “Upload timeout”: Try smaller files or check internet connection
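For the “File too large” case, a plain-text document can be split into smaller parts before uploading. The sketch below groups paragraphs so that each part stays within a byte budget. `split_by_size` is a hypothetical helper, the 1,048,576-byte default is an assumption about how 1MB is counted, and a single paragraph larger than the budget is left oversized.

```python
def split_by_size(text: str, max_bytes: int = 1024 * 1024) -> list[str]:
    """Group paragraphs into parts whose UTF-8 size stays within max_bytes."""
    if not text:
        return []
    parts: list[str] = []
    current: list[str] = []
    current_bytes = 0
    for paragraph in text.split("\n\n"):
        size = len(paragraph.encode("utf-8")) + 2  # +2 for the joining blank line
        if current and current_bytes + size > max_bytes:
            parts.append("\n\n".join(current))
            current, current_bytes = [], 0
        current.append(paragraph)
        current_bytes += size
    if current:
        parts.append("\n\n".join(current))
    return parts
```

Each returned part can then be saved as its own TXT file and uploaded as a separate knowledge source.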
Website Scraping Issues
- “Failed to scrape”: Check URL accessibility and format
- “No content found”: Website may use JavaScript rendering
- “Access denied”: Website blocks automated access
- “Timeout error”: Website may be slow to respond
Search Performance
- “Poor search results”: Review content quality and keyword usage
- “No results found”: Check if content was properly processed
- “Irrelevant results”: Consider content organization and clarity
- “Slow search”: A large knowledge base may slow down search
Support Resources
- Processing Status: Check real-time processing status
- Error Logs: Detailed error information for troubleshooting
- Documentation: Comprehensive guides and examples
- Community Support: User community for questions and tips
Advanced Features
Custom Chunking (Coming Soon)
- Chunk Size Configuration: Customize text segment sizes
- Overlap Settings: Adjust overlap between chunks
- Content Type Optimization: Different settings for different content types
Integration APIs (Coming Soon)
- REST API: Programmatic knowledge base management
- Webhook Support: Real-time processing notifications
- Bulk Import: Large-scale content import capabilities
Analytics (Coming Soon)
- Usage Statistics: Track which knowledge is used most
- Search Analytics: Monitor search patterns and performance
- Content Effectiveness: Measure content impact on agent responses
- Optimization Recommendations: AI-powered content improvement suggestions