Platform Rate Limits

Telegram Bot API

Message Limits

  • Per User: 10 messages/minute per individual user
  • Across Chats: 30 messages/second total when sending to different chats
  • Daily Limit: no daily cap; the per-minute and per-second limits above still apply
  • Group Chats: Responds to @mentions only to prevent spam
  • Bot Restrictions: Cannot initiate conversations with users
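
The per-user limit above can be enforced with a small sliding-window counter. A minimal sketch (the class name and the injectable clock are illustrative, not part of the platform API):

```python
import time
from collections import defaultdict, deque

class PerUserLimiter:
    """Sliding window: at most `limit` messages per `window` seconds, per user."""

    def __init__(self, limit=10, window=60.0, clock=time.monotonic):
        self.limit = limit
        self.window = window
        self.clock = clock  # injectable so the limiter is testable
        self.history = defaultdict(deque)  # user_id -> recent send timestamps

    def allow(self, user_id):
        now = self.clock()
        q = self.history[user_id]
        # Drop timestamps that have fallen out of the window.
        while q and now - q[0] >= self.window:
            q.popleft()
        if len(q) < self.limit:
            q.append(now)
            return True
        return False
```

Messages rejected by `allow()` would be queued rather than dropped, per the error-handling notes below.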

Connection Limits

  • Polling Rate: Continuous long polling supported
  • Webhook: Alternative to polling (coming soon)
  • Concurrent Connections: Multiple bot instances supported
  • Auto Recovery: Automatic reconnection on failures

Error Handling

  • Rate Limit Detection: Automatic detection of rate limiting
  • Exponential Backoff: Progressive delays on repeated failures
  • Queue Management: Message queuing during rate limit periods
  • User Notifications: Alert users when rate limited
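
Rate-limit detection plus exponential backoff can be sketched as a retry wrapper. `RateLimitedError` and `send_with_backoff` are hypothetical names for illustration, not platform APIs:

```python
import random
import time

class RateLimitedError(Exception):
    """Raised when the API answers HTTP 429."""

def send_with_backoff(send, message, max_retries=5, base_delay=1.0,
                      sleep=time.sleep):
    """Retry `send(message)` with exponentially growing delays plus jitter."""
    for attempt in range(max_retries):
        try:
            return send(message)
        except RateLimitedError:
            if attempt == max_retries - 1:
                raise  # give up after the final attempt
            # Delay doubles each attempt (1s, 2s, 4s, ...) with random jitter.
            sleep(base_delay * (2 ** attempt) + random.uniform(0, 0.5))
```

The jitter spreads retries out so many queued messages do not all retry at the same instant.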

Website Widget

Connection Limits

  • Concurrent Users: No hard limit (resource dependent)
  • WebSocket Connections: One per active user session
  • Message Frequency: No artificial rate limiting
  • Session Duration: Unlimited session length

Performance Considerations

  • Response Time: Dependent on AI model processing
  • Queue Management: Handles high traffic gracefully
  • Resource Scaling: Automatic scaling based on demand
  • Error Recovery: Automatic reconnection on disconnection

OpenAI API Limits

Model-Specific Limits

  • GPT-4o-mini:
    • Requests per minute: 500 (default tier)
    • Tokens per minute: 200,000
    • Context window: 128,000 tokens

Token Management

  • Input Tokens: Conversation history + system prompt
  • Output Tokens: Generated response length
  • Total Context: Must stay within model limits
  • Automatic Truncation: Old messages removed when limit approached
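
Automatic truncation can be sketched as dropping the oldest non-system messages until the conversation fits the budget. The 4-characters-per-token heuristic below is a rough stand-in for a real tokenizer such as tiktoken:

```python
def truncate_history(messages, max_tokens,
                     count_tokens=lambda m: len(m["content"]) // 4):
    """Drop the oldest non-system messages until the total fits `max_tokens`.

    The system prompt is always preserved; only conversation history is trimmed.
    """
    system = [m for m in messages if m["role"] == "system"]
    rest = [m for m in messages if m["role"] != "system"]
    total = sum(count_tokens(m) for m in system + rest)
    while rest and total > max_tokens:
        total -= count_tokens(rest.pop(0))  # remove the oldest message first
    return system + rest
```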

Knowledge Base Processing

Upload Limits

  • File Size: Maximum 1MB per file
  • Storage per Agent: 400KB default allocation
  • Processing Rate: 10 chunks per batch
  • Embedding Generation: Rate limited to prevent overload
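
The 10-chunks-per-batch processing rate can be sketched as a simple batching loop; `embed_batch` below is a hypothetical callable standing in for one embedding API call:

```python
def embed_in_batches(chunks, embed_batch, batch_size=10):
    """Embed document chunks in fixed-size batches to stay under rate limits.

    `embed_batch` takes a list of chunks, makes one API call, and returns
    one vector per chunk.
    """
    vectors = []
    for i in range(0, len(chunks), batch_size):
        vectors.extend(embed_batch(chunks[i:i + batch_size]))
    return vectors
```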

Search Performance

  • Query Rate: No hard limit
  • Response Time: typically under 100ms for vector search
  • Concurrent Searches: Multiple searches supported
  • Cache Duration: 5 minutes for repeated queries
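
The 5-minute query cache described above can be sketched as a small TTL map (the class name and injectable clock are illustrative):

```python
import time

class TTLCache:
    """Cache search results for `ttl` seconds (5 minutes by default)."""

    def __init__(self, ttl=300.0, clock=time.monotonic):
        self.ttl = ttl
        self.clock = clock
        self._store = {}  # query -> (expires_at, result)

    def get(self, query):
        entry = self._store.get(query)
        if entry and entry[0] > self.clock():
            return entry[1]
        self._store.pop(query, None)  # expired or never cached
        return None

    def put(self, query, result):
        self._store[query] = (self.clock() + self.ttl, result)
```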

Best Practices for Rate Limit Management

Telegram Bots

  1. Message Buffering: Add 1-3 second delays between responses
  2. Queue Implementation: Buffer messages during high traffic
  3. User Feedback: Inform users about rate limiting
  4. Group Optimization: Use @mention triggers in groups
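
Steps 1 and 2 above can be sketched as a queue drained with a fixed pause between sends; the function name and injectable `sleep` are illustrative:

```python
import time

def drain_queue(queue, send, delay=2.0, sleep=time.sleep):
    """Send queued messages one at a time with a pause between sends.

    A 1-3 second `delay` keeps the bot comfortably under Telegram's pacing
    limits; `sleep` is injectable so the pacing logic is testable.
    """
    sent = []
    while queue:
        msg = queue.pop(0)
        send(msg)
        sent.append(msg)
        if queue:  # no need to wait after the last message
            sleep(delay)
    return sent
```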

Website Widgets

  1. Debouncing: Implement input debouncing
  2. Caching: Cache frequent responses
  3. Progressive Loading: Load resources as needed
  4. Connection Management: Reuse WebSocket connections
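
Input debouncing (step 1) can be sketched as follows. The widget runs in the browser, so this Python version only models the logic; in production it would map onto `setTimeout`/`clearTimeout` on the client:

```python
class Debouncer:
    """Coalesce rapid input: fire only after `wait` seconds of silence.

    Time is passed in explicitly so the logic stays deterministic and testable.
    """

    def __init__(self, wait=0.3):
        self.wait = wait
        self._last = None      # timestamp of the most recent input
        self._pending = None   # latest value not yet fired

    def input(self, value, now):
        self._last = now
        self._pending = value

    def poll(self, now):
        """Return the pending value once the quiet period has elapsed."""
        if self._pending is not None and now - self._last >= self.wait:
            value, self._pending = self._pending, None
            return value
        return None
```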

General Guidelines

  1. Monitor Usage: Track API usage regularly
  2. Implement Retries: Use exponential backoff
  3. User Communication: Clear error messages
  4. Graceful Degradation: Maintain basic functionality

Rate Limit Response Strategies

When Rate Limited

  • Telegram: Automatically queue messages and retry
  • Website: Show typing indicator and wait
  • Knowledge Base: Use cached results when available
  • OpenAI: Switch to simpler prompts or cached responses

Prevention Strategies

  • Request Batching: Combine multiple operations
  • Intelligent Caching: Cache common queries
  • Load Distribution: Spread requests over time
  • Priority Queuing: Prioritize important messages
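
Priority queuing can be sketched with a heap plus a monotonic counter so that equal-priority messages stay in FIFO order (the class name and priority scheme are illustrative):

```python
import heapq
import itertools

class PriorityMessageQueue:
    """Dispatch high-priority messages first; FIFO within the same priority."""

    def __init__(self):
        self._heap = []
        self._counter = itertools.count()  # tie-breaker preserves FIFO order

    def push(self, message, priority=1):
        # Lower number = higher priority (0 = urgent, 1 = normal, 2 = bulk).
        heapq.heappush(self._heap, (priority, next(self._counter), message))

    def pop(self):
        return heapq.heappop(self._heap)[2] if self._heap else None
```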

Monitoring and Alerts

Metrics to Track

  • Request Rate: Messages per minute/hour
  • Error Rate: Failed requests percentage
  • Response Time: Average processing time
  • Queue Length: Pending messages count

Alert Thresholds

  • High Error Rate: Greater than 5% failed requests
  • Long Queue: More than 10 messages pending
  • Slow Response: Greater than 5 seconds average
  • Rate Limit Hit: Any 429 errors
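
The thresholds above translate directly into a small check function; the metric key names here are assumptions for illustration, not a platform schema:

```python
def check_alerts(metrics):
    """Compare current metrics against the alert thresholds and return alerts.

    Expected keys: error_rate (0-1 fraction), queue_length,
    avg_response_s (seconds), http_429_count.
    """
    alerts = []
    if metrics.get("error_rate", 0) > 0.05:       # > 5% failed requests
        alerts.append("high_error_rate")
    if metrics.get("queue_length", 0) > 10:       # > 10 messages pending
        alerts.append("long_queue")
    if metrics.get("avg_response_s", 0) > 5.0:    # > 5 seconds average
        alerts.append("slow_response")
    if metrics.get("http_429_count", 0) > 0:      # any 429 at all
        alerts.append("rate_limit_hit")
    return alerts
```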

Cost Optimization

Reduce API Costs

  • Efficient Prompts: Shorter, clearer prompts
  • Context Management: Remove unnecessary history
  • Caching Strategy: Cache frequent responses
  • Model Selection: Use appropriate model for task

Resource Management

  • Connection Pooling: Reuse connections
  • Batch Processing: Process multiple items together
  • Lazy Loading: Load resources only when needed
  • Cleanup Policies: Remove old data regularly