
Knowledge Base Management

Enhance your AI agents with custom knowledge through document uploads, text input, and website scraping capabilities.

Overview

Trogent’s Knowledge Base system uses vector search to retrieve contextually relevant passages for your agents. This lets agents answer questions about your specific products, services, or domain expertise accurately.

Key Features

  • Document Processing: Upload PDF, DOCX, and TXT files
  • Text Input: Direct text content addition
  • Website Scraping: Automatic content extraction from URLs
  • Vector Search: Semantic similarity matching using OpenAI embeddings
  • Real-time Updates: Changes become searchable as soon as processing completes
  • Storage Management: Efficient storage with usage tracking

Getting Started

Access Knowledge Base

  1. Navigate to your agent in the dashboard
  2. Click on the agent name to view details
  3. Select the Knowledge Base tab
  4. Begin adding your first knowledge sources

Storage Limits

  • Default Limit: 400KB per agent
  • File Size: Individual files up to 1MB
  • Supported Formats: PDF, DOCX, and TXT files; pasted text; website URLs
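As a rough pre-flight check before uploading, the two documented limits can be expressed as below. This is a sketch only: it assumes binary units (1KB = 1,024 bytes), and because this guide does not say whether the 400KB agent quota counts the raw file or its extracted text, the caller passes that amount separately. `upload_problems` is a hypothetical helper, not part of the Trogent platform.

```python
from typing import List

# ASSUMPTION: binary units (1KB = 1,024 bytes, 1MB = 1,024KB).
KB = 1024
MB = 1024 * KB

AGENT_LIMIT = 400 * KB  # default storage limit per agent
FILE_LIMIT = 1 * MB     # maximum size of a single uploaded file


def upload_problems(file_bytes: int, stored_bytes: int, used_bytes: int) -> List[str]:
    """Return the limits a new source would break (an empty list means it fits).

    file_bytes   -- size of the file being uploaded
    stored_bytes -- amount the source adds to the agent's quota; whether this is
                    the raw file or its extracted text is an assumption left to
                    the caller
    used_bytes   -- storage the agent already uses
    """
    problems = []
    if file_bytes > FILE_LIMIT:
        problems.append("File too large")
    overflow = used_bytes + stored_bytes - AGENT_LIMIT
    if overflow > 0:
        problems.append(f"Agent storage limit exceeded by {overflow} bytes")
    return problems


# Hypothetical example: a 900KB PDF whose extracted text is 120KB,
# added to an agent that already uses 250KB.
print(upload_problems(file_bytes=900 * KB, stored_bytes=120 * KB, used_bytes=250 * KB))  # prints []
```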

Adding Knowledge Sources

Document Upload

Supported File Types

  • PDF Files: Automatically extracts text content
  • Word Documents (.docx): Full text extraction with formatting
  • Text Files (.txt): Direct text import
  • Maximum Size: 1MB per file
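A quick client-side check of the file extension can catch "Unsupported format" errors before you upload. The helper below is hypothetical and based only on the file types listed above; it looks at the extension alone, whereas the platform may also inspect the file contents.

```python
from pathlib import Path
from typing import Optional

# Extensions from the Supported File Types list above.
SUPPORTED_TYPES = {
    ".pdf": "PDF",
    ".docx": "Word document",
    ".txt": "Text file",
}


def detect_type(filename: str) -> Optional[str]:
    """Return the source type for a filename, or None if the format is unsupported."""
    # suffix is case-sensitive, so normalize it before the lookup
    return SUPPORTED_TYPES.get(Path(filename).suffix.lower())


for name in ["Pricing.PDF", "handbook.docx", "notes.doc"]:
    print(name, "->", detect_type(name) or "Unsupported format")
```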

Upload Process

  1. Click “Add Knowledge Source”
  2. Select “Upload File”
  3. Choose your file from your computer
  4. Add a descriptive title (auto-generated from filename)
  5. Click “Upload” to process

Processing Status

  • Uploading: File transfer in progress
  • Processing: Text extraction and chunking
  • Embedding: Vector generation for search
  • Completed: Ready for agent use
  • Failed: Processing error (check file format/size)
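The processing statuses above form a simple lifecycle. The sketch below models them as a state machine, which can be handy if you track a source's status from a script. The transition table (stages run in order, and any in-progress stage can end in Failed) is an assumption drawn from the list, not a documented Trogent API; `Status`, `advance`, and `is_terminal` are hypothetical names. Re-uploading a failed file is treated here as starting a new lifecycle.

```python
from enum import Enum


class Status(Enum):
    UPLOADING = "Uploading"
    PROCESSING = "Processing"
    EMBEDDING = "Embedding"
    COMPLETED = "Completed"
    FAILED = "Failed"


# ASSUMED transitions: stages run in order; any in-progress stage may fail.
NEXT = {
    Status.UPLOADING: {Status.PROCESSING, Status.FAILED},
    Status.PROCESSING: {Status.EMBEDDING, Status.FAILED},
    Status.EMBEDDING: {Status.COMPLETED, Status.FAILED},
    Status.COMPLETED: set(),
    Status.FAILED: set(),
}


def advance(current: Status, new: Status) -> Status:
    """Move to a new status, rejecting transitions the lifecycle does not allow."""
    if new not in NEXT[current]:
        raise ValueError(f"cannot go from {current.value} to {new.value}")
    return new


def is_terminal(status: Status) -> bool:
    """Completed and Failed have no further transitions."""
    return not NEXT[status]
```

A polling script would stop checking a source once `is_terminal` returns True for its status.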

Text Input

Direct Text Addition

  1. Click “Add Knowledge Source”
  2. Select “Add Text”
  3. Enter a descriptive title
  4. Paste or type your content in the text area
  5. Click “Save” to process

Best Practices for Text Input

  • Use clear, descriptive titles
  • Format content with headings and bullet points
  • Include relevant keywords naturally
  • Keep individual entries focused on specific topics
  • Update content regularly to maintain accuracy

Website Scraping

URL Content Extraction

  1. Click “Add Knowledge Source”
  2. Select “Scrape Website”
  3. Enter the full URL (including https://)
  4. Add a descriptive title (auto-generated from page title)
  5. Click “Scrape” to extract content

Scraping Capabilities

  • HTML Text Extraction: Removes formatting, extracts readable content
  • Title Detection: Automatically uses page title
  • Content Filtering: Focuses on main content areas
  • Link Following: Single-page scraping only
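To illustrate what HTML text extraction, title detection, and content filtering mean in practice, here is a minimal extractor built on Python's standard `html.parser`. It is not Trogent's scraper: skipping `nav`, `header`, and `footer` elements is an assumed stand-in for "main content" filtering. Because it only parses the HTML the server returns, it also shows why JavaScript-rendered content is missed.

```python
from html.parser import HTMLParser
from typing import Tuple


class PageText(HTMLParser):
    """Collect the <title> and the visible body text of a static HTML page."""

    # ASSUMPTION: treat these elements as non-content and skip their text.
    SKIP = {"script", "style", "noscript", "nav", "header", "footer"}

    def __init__(self) -> None:
        super().__init__()
        self.title = ""
        self.chunks = []
        self._stack = []  # currently open <title> or skipped elements

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIP or tag == "title":
            self._stack.append(tag)

    def handle_endtag(self, tag):
        if self._stack and self._stack[-1] == tag:
            self._stack.pop()

    def handle_data(self, data):
        text = data.strip()
        if not text:
            return
        if self._stack and self._stack[-1] == "title":
            self.title += text           # title detection
        elif not self._stack:
            self.chunks.append(text)     # readable content outside skipped areas


def extract(html: str) -> Tuple[str, str]:
    """Return (page title, readable text) with markup removed."""
    parser = PageText()
    parser.feed(html)
    parser.close()
    return parser.title, " ".join(parser.chunks)
```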

Limitations

  • Static Content Only: JavaScript-rendered content may not be captured
  • Single Page: Does not follow links to other pages
  • Rate Limits: Respects website robots.txt and rate limiting
  • Content Type: Text-based content only (no images or media)
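The robots.txt and rate-limit point can be illustrated with Python's standard `urllib.robotparser`, which answers whether a given user agent may fetch a URL and what crawl delay a site requests. The user-agent string `TrogentBot` is a hypothetical placeholder; this guide does not say which user agent Trogent's scraper sends or exactly how it applies these rules.

```python
from typing import Optional, Tuple
from urllib.robotparser import RobotFileParser


def robots_policy(robots_txt: str, url: str, agent: str = "TrogentBot") -> Tuple[bool, Optional[int]]:
    """Return (may_fetch, crawl_delay_seconds) for `url` under a robots.txt body.

    `agent` is a hypothetical user-agent name used only for illustration.
    """
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())  # parse an already-downloaded robots.txt
    return parser.can_fetch(agent, url), parser.crawl_delay(agent)


robots = "User-agent: *\nDisallow: /private/\nCrawl-delay: 5\n"
print(robots_policy(robots, "https://example.com/docs/intro"))
print(robots_policy(robots, "https://example.com/private/report"))
```

A site that disallows the path, or that blocks automated clients outright, is one likely source of the "Access denied" error listed under Troubleshooting.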

Knowledge Management

Viewing Sources

  • List View: All knowledge sources with titles and types
  • Status Indicators: Processing status and health
  • Usage Statistics: Storage used per source
  • Search Functionality: Find specific sources quickly

Editing Sources

  • Edit Title: Modify source titles for better organization
  • Update Content: Refresh text-based sources with new information
  • Reprocess Files: Re-upload and process modified documents
  • Status Monitoring: Track processing and embedding status

Deleting Sources

  1. Select the source you want to remove
  2. Click the “Delete” button (trash icon)
  3. Confirm deletion in the popup dialog
  4. Storage space is immediately reclaimed

Bulk Operations

  • Select Multiple: Use checkboxes to select multiple sources
  • Bulk Delete: Remove multiple sources simultaneously
  • Export Sources: Download source information (coming soon)

Search and Testing

Test your knowledge base to confirm that your content was indexed correctly:

  1. Navigate to the Knowledge Base tab
  2. Use the “Search Knowledge” feature
  3. Enter test queries related to your content
  4. Review search results and relevance scores
  5. Refine content based on search performance

Search Results

  • Relevance Score: Similarity percentage (0-100%)
  • Source Information: Which document/source provided the result
  • Content Preview: Matched text snippet
  • Context Window: Text surrounding the match in the original source
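Relevance scores of this kind are commonly derived from cosine distance. pgvector's `<=>` operator returns cosine distance, which equals 1 minus cosine similarity. The sketch below shows one plausible way to turn that distance into a 0-100% score; the exact formula Trogent uses is not documented here, so the clamping of negative similarity to 0% is an assumption.

```python
import math
from typing import Sequence


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two vectors (1.0 = same direction)."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))


def relevance_percent(cosine_distance: float) -> float:
    """Map a pgvector cosine distance (the <=> operator) to a 0-100% score.

    ASSUMPTION: score = similarity * 100, with negative similarity shown as 0%.
    """
    similarity = 1.0 - cosine_distance
    return round(max(0.0, min(1.0, similarity)) * 100, 1)


# Distance between two toy vectors, then converted to a percentage.
distance = 1.0 - cosine_similarity([1.0, 1.0], [1.0, 0.0])
print(relevance_percent(distance))
```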

Testing Agent Responses

  1. Use the agent chat interface
  2. Ask questions about your uploaded content
  3. Verify accuracy and relevance of responses
  4. Update knowledge base based on performance

Technical Details

Vector Processing

  • Model: OpenAI text-embedding-3-small (1536 dimensions)
  • Chunking: 1000 characters with 200 character overlap
  • Storage: PostgreSQL with pgvector extension
  • Search: Cosine similarity with HNSW indexing
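The chunking parameters above describe a sliding window: each chunk starts 800 characters (1000 - 200) after the previous one, so the last 200 characters of one chunk repeat at the start of the next. The sketch splits on raw character offsets. That is an assumption, since this guide does not say whether Trogent shifts chunk boundaries to word or sentence breaks.

```python
from typing import List

CHUNK_SIZE = 1000  # characters per chunk
OVERLAP = 200      # characters shared by consecutive chunks


def chunk_text(text: str, size: int = CHUNK_SIZE, overlap: int = OVERLAP) -> List[str]:
    """Split text into fixed-size character windows that overlap by `overlap`."""
    if not 0 <= overlap < size:
        raise ValueError("overlap must be non-negative and smaller than size")
    step = size - overlap  # 800 with the defaults
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + size])
        if start + size >= len(text):
            break  # this window already reaches the end of the text
    return chunks


doc = "".join(str(i % 10) for i in range(2500))
print([len(c) for c in chunk_text(doc)])  # prints [1000, 1000, 900]
```

With the defaults, a 2,500-character document yields three chunks starting at offsets 0, 800, and 1600, so any sentence near a boundary appears whole in at least one chunk as long as it is shorter than 200 characters.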

Processing Pipeline

  1. Content Extraction: Text extraction from various formats
  2. Text Chunking: Split content into searchable segments
  3. Embedding Generation: Convert text to vector representations
  4. Index Storage: Store vectors with optimized indexing
  5. Real-time Availability: Immediate search capability
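Steps 3 to 5 of the pipeline can be illustrated with a small in-memory index. To keep the example self-contained, `toy_embed` is a hypothetical stand-in for `text-embedding-3-small` (a hashed bag of words that matches shared words, not meaning), and a Python list replaces the PostgreSQL/pgvector table and its HNSW index. Because the search below scans every row, it returns exact nearest neighbors, whereas HNSW trades a little accuracy for speed.

```python
import hashlib
import math
import re
from typing import Callable, List, Tuple

Vector = List[float]


def toy_embed(text: str, dims: int = 512) -> Vector:
    """Hypothetical stand-in for text-embedding-3-small: a hashed bag of words,
    scaled to unit length."""
    vec = [0.0] * dims
    for word in re.findall(r"[a-z0-9]+", text.lower()):
        bucket = int(hashlib.sha256(word.encode("utf-8")).hexdigest(), 16) % dims
        vec[bucket] += 1.0
    norm = math.sqrt(sum(v * v for v in vec)) or 1.0  # avoid dividing by zero
    return [v / norm for v in vec]


class InMemoryIndex:
    """Embed chunks (step 3), store them (step 4), and search them (step 5)."""

    def __init__(self, embed: Callable[[str], Vector] = toy_embed):
        self.embed = embed
        self.rows: List[Tuple[str, str, Vector]] = []  # (source title, chunk, vector)

    def add(self, title: str, chunks: List[str]) -> None:
        for chunk in chunks:
            self.rows.append((title, chunk, self.embed(chunk)))

    def search(self, query: str, k: int = 3) -> List[Tuple[float, str, str]]:
        q = self.embed(query)
        # Unit-length vectors: the dot product equals cosine similarity.
        scored = [
            (sum(a * b for a, b in zip(q, v)), title, chunk)
            for title, chunk, v in self.rows
        ]
        scored.sort(key=lambda row: row[0], reverse=True)
        return scored[:k]


index = InMemoryIndex()
index.add("Pricing", ["The Pro plan costs 20 dollars per month"])
index.add("Support", ["Support is available by email on weekdays"])
# New chunks are searchable as soon as add() returns.
print(index.search("pro plan costs per month", k=1)[0][1])
```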

Performance Optimization

  • Batch Processing: Efficient handling of multiple sources
  • Caching: Smart caching for frequently accessed content
  • Indexing: Optimized vector search performance
  • Rate Limiting: Balanced processing to prevent overload

Best Practices

Content Organization

  1. Descriptive Titles: Use clear, searchable titles
  2. Topic Grouping: Organize related content together
  3. Regular Updates: Keep information current and accurate
  4. Quality Control: Review content for accuracy and relevance

File Preparation

  1. Clean Documents: Remove unnecessary formatting and content
  2. Structured Content: Use headings, lists, and clear organization
  3. Relevant Information: Focus on content that supports agent responses
  4. File Naming: Use descriptive filenames for easier management

Search Optimization

  1. Keyword Usage: Include relevant keywords naturally
  2. Content Depth: Provide comprehensive information on topics
  3. Cross-References: Link related concepts within content
  4. Regular Testing: Verify search results match expectations

Maintenance

  1. Regular Reviews: Periodically review and update content
  2. Performance Monitoring: Track which sources are most useful
  3. User Feedback: Incorporate feedback to improve content quality
  4. Storage Management: Monitor usage and optimize storage

Troubleshooting

Common Issues

File Upload Problems

  • “File too large”: Reduce file size or split into smaller documents
  • “Unsupported format”: Convert to PDF, DOCX, or TXT
  • “Processing failed”: Check file corruption or format issues
  • “Upload timeout”: Try smaller files or check internet connection
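For the "File too large" case, a long plain-text document can be split into smaller files before upload. This sketch breaks between paragraphs (blank lines) where it can and cuts an oversized paragraph at the byte limit without splitting a multi-byte UTF-8 character. It applies to plain text only (PDF and DOCX files need a format-aware tool), and reading the 1MB limit as 1,048,576 bytes is an assumption.

```python
from typing import List

MAX_BYTES = 1024 * 1024  # ASSUMPTION: the 1MB limit means 1,048,576 bytes


def split_text(text: str, max_bytes: int = MAX_BYTES) -> List[str]:
    """Split text into parts of at most max_bytes (UTF-8), preferring to break
    between paragraphs (blank lines)."""
    if max_bytes < 4:
        raise ValueError("max_bytes must fit at least one UTF-8 character")
    parts: List[str] = []
    current = ""
    for para in text.split("\n\n"):
        candidate = f"{current}\n\n{para}" if current else para
        if len(candidate.encode("utf-8")) <= max_bytes:
            current = candidate  # paragraph still fits in the current part
            continue
        if current:
            parts.append(current)
        # An oversized paragraph is cut at the byte limit; errors="ignore"
        # drops a multi-byte character that the cut would split in half.
        while len(para.encode("utf-8")) > max_bytes:
            piece = para.encode("utf-8")[:max_bytes].decode("utf-8", errors="ignore")
            parts.append(piece)
            para = para[len(piece):]
        current = para
    if current:
        parts.append(current)
    return parts
```

Each returned part can then be saved as its own TXT file and uploaded with a title such as "Handbook (part 2)".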

Website Scraping Issues

  • “Failed to scrape”: Check URL accessibility and format
  • “No content found”: Website may use JavaScript rendering
  • “Access denied”: Website blocks automated access
  • “Timeout error”: Website may be slow to respond

Search Performance

  • “Poor search results”: Review content quality and keyword usage
  • “No results found”: Check if content was properly processed
  • “Irrelevant results”: Consider content organization and clarity
  • “Slow search”: Large knowledge base may impact performance

Support Resources

  • Processing Status: Check real-time processing status
  • Error Logs: Detailed error information for troubleshooting
  • Documentation: Comprehensive guides and examples
  • Community Support: User community for questions and tips

Advanced Features

Custom Chunking (Coming Soon)

  • Chunk Size Configuration: Customize text segment sizes
  • Overlap Settings: Adjust overlap between chunks
  • Content Type Optimization: Different settings for different content types

Integration APIs (Coming Soon)

  • REST API: Programmatic knowledge base management
  • Webhook Support: Real-time processing notifications
  • Bulk Import: Large-scale content import capabilities

Analytics (Coming Soon)

  • Usage Statistics: Track which knowledge is used most
  • Search Analytics: Monitor search patterns and performance
  • Content Effectiveness: Measure content impact on agent responses
  • Optimization Recommendations: AI-powered content improvement suggestions