Knowledge Base Management

Enhance your AI agents with custom knowledge through document uploads, text input, and website scraping capabilities.

Overview

Trogent’s Knowledge Base uses vector search to give your agents contextually relevant information. This lets agents answer questions about your specific products, services, or domain expertise accurately.

Key Features

Document Processing: Upload PDF, DOCX, and TXT files
Text Input: Direct text content addition
Website Scraping: Automatic content extraction from URLs
Vector Search: Semantic similarity matching using OpenAI embeddings
Real-time Updates: Instant knowledge base modifications
Storage Management: Efficient storage with usage tracking

Getting Started

Access Knowledge Base

Navigate to your agent in the dashboard
Click on the agent name to view details
Select the Knowledge Base tab
Begin adding your first knowledge sources

Storage Limits

Default Limit: 400KB per agent
File Size: Individual files up to 1MB
Supported Formats: PDF, DOCX, TXT, plain text, URLs

Adding Knowledge Sources

Document Upload

Supported File Types

PDF Files: Automatically extracts text content
Word Documents (.docx): Full text extraction with formatting
Text Files (.txt): Direct text import
Maximum Size: 1MB per file
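
The format and size rules above can be checked locally before uploading. The following is a minimal sketch, not part of the Trogent product: the function name check_upload is hypothetical, and 1MB is assumed to mean 1024 × 1024 bytes.

```python
from pathlib import Path

# Limits quoted in this guide. The 1024-based interpretation of "1MB"
# is an assumption, not a documented detail.
MAX_FILE_BYTES = 1 * 1024 * 1024
SUPPORTED_EXTENSIONS = {".pdf", ".docx", ".txt"}


def check_upload(filename: str, size_bytes: int) -> list[str]:
    """Return a list of problems; an empty list means the file looks acceptable."""
    problems = []
    if Path(filename).suffix.lower() not in SUPPORTED_EXTENSIONS:
        problems.append("unsupported format")
    if size_bytes > MAX_FILE_BYTES:
        problems.append("file too large")
    return problems


if __name__ == "__main__":
    print(check_upload("product-guide.pdf", 500_000))
    print(check_upload("legacy-notes.doc", 20_000))
```

Returning a list rather than raising on the first problem lets a caller report every issue with a file at once.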

Upload Process

Click “Add Knowledge Source”
Select “Upload File”
Choose your file from your computer
Add a descriptive title (auto-generated from filename)
Click “Upload” to process

Processing Status

Uploading: File transfer in progress
Processing: Text extraction and chunking
Embedding: Vector generation for search
Completed: Ready for agent use
Failed: Processing error (check file format/size)
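
One way to reason about these statuses is as a small state machine. The sketch below is illustrative only: the status names come from the list above, but the transition table (each in-progress stage either advances or fails) is an assumption, and SourceStatus, can_transition, and is_terminal are hypothetical names.

```python
from enum import Enum


class SourceStatus(Enum):
    UPLOADING = "Uploading"
    PROCESSING = "Processing"
    EMBEDDING = "Embedding"
    COMPLETED = "Completed"
    FAILED = "Failed"


# Assumed transitions: every in-progress stage moves to the next stage or fails.
NEXT_STATES = {
    SourceStatus.UPLOADING: {SourceStatus.PROCESSING, SourceStatus.FAILED},
    SourceStatus.PROCESSING: {SourceStatus.EMBEDDING, SourceStatus.FAILED},
    SourceStatus.EMBEDDING: {SourceStatus.COMPLETED, SourceStatus.FAILED},
    SourceStatus.COMPLETED: set(),
    SourceStatus.FAILED: set(),
}


def can_transition(current: SourceStatus, new: SourceStatus) -> bool:
    """True if moving from `current` to `new` is allowed by the assumed table."""
    return new in NEXT_STATES[current]


def is_terminal(status: SourceStatus) -> bool:
    """Completed and Failed have no outgoing transitions."""
    return not NEXT_STATES[status]
```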

Text Input

Direct Text Addition

Click “Add Knowledge Source”
Select “Add Text”
Enter a descriptive title
Paste or type your content in the text area
Click “Save” to process

Best Practices for Text Input

Use clear, descriptive titles
Format content with headings and bullet points
Include relevant keywords naturally
Keep individual entries focused on specific topics
Update content regularly to maintain accuracy

Website Scraping

URL Content Extraction

Click “Add Knowledge Source”
Select “Scrape Website”
Enter the full URL (including https://)
Add a descriptive title (auto-generated from page title)
Click “Scrape” to extract content
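
Because the scraper expects a full URL, it can help to validate addresses before pasting them in, for example when preparing a list of pages. This is a minimal sketch using Python's standard library. The helper name is_full_url is hypothetical, and accepting http:// alongside https:// is an assumption.

```python
from urllib.parse import urlparse


def is_full_url(url: str) -> bool:
    """True if the URL has an http(s) scheme and a host part."""
    parsed = urlparse(url.strip())
    return parsed.scheme in {"http", "https"} and bool(parsed.netloc)


if __name__ == "__main__":
    for candidate in ["https://example.com/pricing", "example.com/pricing"]:
        print(candidate, is_full_url(candidate))
```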

Scraping Capabilities

HTML Text Extraction: Removes formatting, extracts readable content
Title Detection: Automatically uses page title
Content Filtering: Focuses on main content areas
Link Following: Not supported; only the page at the URL you enter is scraped

Limitations

Static Content Only: JavaScript-rendered content may not be captured
Single Page: Does not follow links to other pages
Rate Limits: Respects each website’s robots.txt and rate limits
Content Type: Text-based content only (no images or media)
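
To illustrate why JavaScript-rendered content may be missed, here is a minimal sketch of static text extraction using Python's built-in html.parser. It is not Trogent's scraper: the class and function names are hypothetical, and the real content filtering is likely more sophisticated. The sketch only sees text present in the HTML as delivered, so anything a script would insert later never appears.

```python
from html.parser import HTMLParser


class TextExtractor(HTMLParser):
    """Collect the page title and visible text, skipping <script> and <style>."""

    SKIPPED = {"script", "style"}

    def __init__(self):
        super().__init__()
        self.title = ""
        self.parts = []
        self._skip_depth = 0
        self._in_title = False

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIPPED:
            self._skip_depth += 1
        elif tag == "title":
            self._in_title = True

    def handle_endtag(self, tag):
        if tag in self.SKIPPED and self._skip_depth > 0:
            self._skip_depth -= 1
        elif tag == "title":
            self._in_title = False

    def handle_data(self, data):
        text = data.strip()
        if not text or self._skip_depth:
            return
        if self._in_title:
            self.title = text
        else:
            self.parts.append(text)


def extract_text(html: str) -> tuple[str, str]:
    """Return (title, body_text) for a static HTML string."""
    parser = TextExtractor()
    parser.feed(html)
    parser.close()
    return parser.title, " ".join(parser.parts)
```

If a page's main content lives in an empty container that a script fills in later, extract_text returns little or nothing for it, which matches the "No content found" symptom.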

Knowledge Management

Viewing Sources

List View: All knowledge sources with titles and types
Status Indicators: Processing status and health
Usage Statistics: Storage used per source
Search Functionality: Find specific sources quickly

Editing Sources

Edit Title: Modify source titles for better organization
Update Content: Refresh text-based sources with new information
Reprocess Files: Re-upload and process modified documents
Status Monitoring: Track processing and embedding status

Deleting Sources

Select the source you want to remove
Click the “Delete” button (trash icon)
Confirm deletion in the popup dialog
Storage space is immediately reclaimed

Bulk Operations

Select Multiple: Use checkboxes to select multiple sources
Bulk Delete: Remove multiple sources simultaneously
Export Sources: Download source information (coming soon)

Search and Testing

Knowledge Search

Test your knowledge base to ensure proper content indexing:

Navigate to the Knowledge Base tab
Use the “Search Knowledge” feature
Enter test queries related to your content
Review search results and relevance scores
Refine content based on search performance

Search Results

Relevance Score: Similarity percentage (0-100%)
Source Information: Which document/source provided the result
Content Preview: Matched text snippet
Context Window: Surrounding content for context
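
For intuition about the relevance score, the sketch below computes cosine similarity between two vectors and maps it to a percentage. The mapping (clamp to the 0 to 1 range, then multiply by 100) is an assumption made for illustration; this guide does not state how Trogent derives the displayed percentage. The function names are hypothetical.

```python
import math


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # a zero vector has no direction to compare
    return dot / (norm_a * norm_b)


def relevance_percent(a: list[float], b: list[float]) -> float:
    """Assumed mapping: clamp similarity to [0, 1] and scale to 0-100."""
    similarity = cosine_similarity(a, b)
    return round(max(0.0, min(1.0, similarity)) * 100, 1)
```

Vectors pointing the same way score 100, unrelated (orthogonal) vectors score 0, and the real embeddings would have 1536 components rather than the two or three used in a toy example.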

Testing Agent Responses

Use the agent chat interface
Ask questions about your uploaded content
Verify accuracy and relevance of responses
Update knowledge base based on performance

Technical Details

Vector Processing

Model: OpenAI text-embedding-3-small (1536 dimensions)
Chunking: 1000 characters with 200 character overlap
Storage: PostgreSQL with pgvector extension
Search: Cosine similarity with HNSW indexing
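
The chunking parameters above can be illustrated with a simple sliding window. This is a minimal sketch, assuming a fixed-size character window; the actual splitter may also respect sentence or paragraph boundaries, and chunk_text is a hypothetical name.

```python
def chunk_text(text: str, chunk_size: int = 1000, overlap: int = 200) -> list[str]:
    """Split text into chunk_size-character windows that share `overlap` characters."""
    if chunk_size <= 0 or not 0 <= overlap < chunk_size:
        raise ValueError("need chunk_size > 0 and 0 <= overlap < chunk_size")
    step = chunk_size - overlap  # each window starts 800 characters after the last
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # this window already reached the end of the text
    return chunks


if __name__ == "__main__":
    sample = "".join(str(i % 10) for i in range(2500))
    print([len(c) for c in chunk_text(sample)])
```

The overlap means a sentence that straddles a chunk boundary still appears whole in at least one chunk, at the cost of storing some text twice.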

Processing Pipeline

Content Extraction: Text extraction from various formats
Text Chunking: Split content into searchable segments
Embedding Generation: Convert text to vector representations
Index Storage: Store vectors with optimized indexing
Real-time Availability: Immediate search capability
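
The stages above compose naturally into a single ingest function. The sketch below shows only the shape of that flow: every name in it (IndexedChunk, ingest, and the stage parameters) is hypothetical, the stages are passed in as plain functions, and an in-memory list stands in for the vector store.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class IndexedChunk:
    source_title: str
    text: str
    vector: list[float]


def ingest(
    title: str,
    raw: str,
    extract: Callable[[str], str],
    chunk: Callable[[str], list[str]],
    embed: Callable[[str], list[float]],
    index: list[IndexedChunk],
) -> int:
    """Run extract -> chunk -> embed -> store and return the number of chunks stored."""
    text = extract(raw)
    pieces = chunk(text)
    for piece in pieces:
        index.append(IndexedChunk(title, piece, embed(piece)))
    return len(pieces)


if __name__ == "__main__":
    # Toy stand-ins for the real stages, for illustration only.
    store: list[IndexedChunk] = []
    count = ingest(
        "FAQ",
        "  Refunds take 5 days. Shipping is free.  ",
        extract=str.strip,
        chunk=lambda t: [s for s in t.split(". ") if s],
        embed=lambda s: [float(len(s))],
        index=store,
    )
    print(count, [c.text for c in store])
```

Passing the stages in as functions keeps the flow testable without a network call; a real embedding step would call the embedding model instead of the toy length-based stand-in.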

Performance Optimization

Batch Processing: Efficient handling of multiple sources
Caching: Smart caching for frequently accessed content
Indexing: Optimized vector search performance
Rate Limiting: Balanced processing to prevent overload

Best Practices

Content Organization

Descriptive Titles: Use clear, searchable titles
Topic Grouping: Organize related content together
Regular Updates: Keep information current and accurate
Quality Control: Review content for accuracy and relevance

File Preparation

Clean Documents: Remove unnecessary formatting and content
Structured Content: Use headings, lists, and clear organization
Relevant Information: Focus on content that supports agent responses
File Naming: Use descriptive filenames for easier management

Search Optimization

Keyword Usage: Include relevant keywords naturally
Content Depth: Provide comprehensive information on topics
Cross-References: Link related concepts within content
Regular Testing: Verify search results match expectations

Maintenance

Regular Reviews: Periodically review and update content
Performance Monitoring: Track which sources are most useful
User Feedback: Incorporate feedback to improve content quality
Storage Management: Monitor usage and optimize storage

Troubleshooting

Common Issues

File Upload Problems

“File too large”: Reduce file size or split into smaller documents
“Unsupported format”: Convert to PDF, DOCX, or TXT
“Processing failed”: Check the file for corruption or format issues
“Upload timeout”: Try smaller files or check internet connection
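
For the “File too large” case, an oversized text file can be split at paragraph boundaries into parts that each fit under the limit. This is a minimal sketch, assuming UTF-8 text with blank lines between paragraphs; split_text is a hypothetical helper, and a single paragraph larger than the limit is left as one oversized part.

```python
def split_text(text: str, max_bytes: int = 1024 * 1024) -> list[str]:
    """Group paragraphs into parts whose UTF-8 size stays within max_bytes."""
    parts, current, current_size = [], [], 0
    for paragraph in text.split("\n\n"):
        # +2 accounts for the blank-line separator between paragraphs.
        size = len(paragraph.encode("utf-8")) + 2
        if current and current_size + size > max_bytes:
            parts.append("\n\n".join(current))
            current, current_size = [], 0
        current.append(paragraph)
        current_size += size
    if current:
        parts.append("\n\n".join(current))
    return parts
```

Splitting on paragraph boundaries keeps each part readable on its own, which also fits the advice to keep individual entries focused on specific topics.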

Website Scraping Issues

“Failed to scrape”: Check URL accessibility and format
“No content found”: Website may use JavaScript rendering
“Access denied”: Website blocks automated access
“Timeout error”: Website may be slow to respond

Search Performance

“Poor search results”: Review content quality and keyword usage
“No results found”: Check if content was properly processed
“Irrelevant results”: Consider content organization and clarity
“Slow search”: Large knowledge base may impact performance

Support Resources

Processing Status: Check real-time processing status
Error Logs: Detailed error information for troubleshooting
Documentation: Comprehensive guides and examples
Community Support: User community for questions and tips

Advanced Features

Custom Chunking (Coming Soon)

Chunk Size Configuration: Customize text segment sizes
Overlap Settings: Adjust overlap between chunks
Content Type Optimization: Different settings for different content types

Integration APIs (Coming Soon)

REST API: Programmatic knowledge base management
Webhook Support: Real-time processing notifications
Bulk Import: Large-scale content import capabilities

Analytics (Coming Soon)

Usage Statistics: Track which knowledge is used most
Search Analytics: Monitor search patterns and performance
Content Effectiveness: Measure content impact on agent responses
Optimization Recommendations: AI-powered content improvement suggestions