Platform Rate Limits
Telegram Bot API
Message Limits
- Per User: 10 messages per minute to any individual user
- Bulk Sending: 30 messages/second overall when sending to different chats
- Daily Limit: none; only the per-minute and per-second limits above apply
- Group Chats: Responds only to @mentions, to prevent spam
- Bot Restrictions: Cannot initiate conversations; users must message the bot first (a Telegram platform rule)
Connection Limits
- Polling Rate: Continuous long polling supported
- Webhook: Alternative to polling (coming soon)
- Concurrent Connections: Multiple bot instances supported
- Auto Recovery: Automatic reconnection on failures
Error Handling
- Rate Limit Detection: Automatic detection of rate limiting
- Exponential Backoff: Progressive delays on repeated failures (sketched after this list)
- Queue Management: Message queuing during rate limit periods
- User Notifications: Alert users when rate limited
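A minimal sketch of the backoff behavior described above, written directly against Telegram's real sendMessage endpoint. Telegram's 429 responses include a retry_after hint under parameters, which the loop honors before falling back to doubling delays; the retry count and base delay here are illustrative assumptions, not platform defaults.

```python
import time
import requests

API_URL = "https://api.telegram.org/bot{token}/{method}"

def send_with_backoff(token: str, chat_id: int, text: str, max_retries: int = 5) -> dict:
    """Send a message, backing off exponentially when Telegram returns 429."""
    delay = 1.0  # initial backoff in seconds (illustrative)
    for attempt in range(max_retries):
        resp = requests.post(
            API_URL.format(token=token, method="sendMessage"),
            json={"chat_id": chat_id, "text": text},
            timeout=10,
        )
        if resp.status_code != 429:
            resp.raise_for_status()
            return resp.json()
        # Telegram suggests a wait time via parameters.retry_after on 429s
        retry_after = resp.json().get("parameters", {}).get("retry_after", delay)
        time.sleep(max(retry_after, delay))
        delay *= 2  # progressive delay on repeated failures
    raise RuntimeError("rate limited: retries exhausted")
```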
Website Widget
Connection Limits
- Concurrent Users: No hard limit (resource dependent)
- WebSocket Connections: One per active user session
- Message Frequency: No artificial rate limiting
- Session Duration: Unlimited session length
Performance Considerations
- Response Time: Dependent on AI model processing
- Queue Management: Handles high traffic gracefully
- Resource Scaling: Automatic scaling based on demand
- Error Recovery: Dropped connections are re-established automatically (sketched below)
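The reconnect loop can be as simple as the following sketch. The endpoint URL is hypothetical, and a real widget would run this logic in browser JavaScript; the same pattern is shown here in Python with the websockets library: one connection per session, reconnecting on drop with a capped exponential delay.

```python
import asyncio
import websockets  # pip install websockets

WIDGET_WS_URL = "wss://example.com/widget/socket"  # hypothetical endpoint

async def run_session() -> None:
    """Keep one WebSocket per session alive, reconnecting with capped backoff."""
    backoff = 1.0
    while True:
        try:
            async with websockets.connect(WIDGET_WS_URL) as ws:
                backoff = 1.0  # reset the delay after a successful connect
                async for message in ws:
                    print("received:", message)
        except (websockets.ConnectionClosed, OSError):
            await asyncio.sleep(backoff)        # wait before reconnecting
            backoff = min(backoff * 2, 30.0)    # cap the retry delay

asyncio.run(run_session())
```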
OpenAI API Limits
Model-Specific Limits
- GPT-4o-mini:
  - Requests per minute: 500 (default tier)
  - Tokens per minute: 200,000
  - Context window: 128,000 tokens
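To stay under the 500 requests/minute default tier, a client-side limiter can gate every OpenAI call. This is a generic sliding-window sketch, not part of any SDK; the window size and request cap come from the tier figures above.

```python
import threading
import time
from collections import deque

class RateLimiter:
    """Sliding-window limiter sized to the 500 requests/minute default tier."""

    def __init__(self, max_requests: int = 500, window: float = 60.0):
        self.max_requests = max_requests
        self.window = window
        self.timestamps: deque[float] = deque()
        self.lock = threading.Lock()

    def acquire(self) -> None:
        """Block until a request slot is free within the current window."""
        while True:
            with self.lock:
                now = time.monotonic()
                # Drop timestamps that have aged out of the window
                while self.timestamps and now - self.timestamps[0] > self.window:
                    self.timestamps.popleft()
                if len(self.timestamps) < self.max_requests:
                    self.timestamps.append(now)
                    return
                wait = self.window - (now - self.timestamps[0])
            time.sleep(wait)

limiter = RateLimiter()
limiter.acquire()  # call before each OpenAI request
```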
Token Management
- Input Tokens: Conversation history + system prompt
- Output Tokens: Generated response length
- Total Context: Must stay within model limits
- Automatic Truncation: Oldest messages are removed as the limit is approached (sketched below)
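A truncation sketch using tiktoken. It assumes messages[0] is the system prompt (which is preserved) and reserves an illustrative 4,000 tokens for the reply; the counts are close but ignore per-message formatting overhead, so treat them as an approximation.

```python
import tiktoken  # pip install tiktoken

MAX_CONTEXT = 128_000       # GPT-4o-mini context window
RESPONSE_BUDGET = 4_000     # tokens reserved for the reply (assumption)

def truncate_history(messages: list[dict], model: str = "gpt-4o-mini") -> list[dict]:
    """Drop oldest non-system messages until the prompt fits the context window."""
    enc = tiktoken.encoding_for_model(model)

    def count(msgs: list[dict]) -> int:
        # Approximate: counts content tokens only, not message framing
        return sum(len(enc.encode(m["content"])) for m in msgs)

    msgs = list(messages)
    # Keep the system prompt at index 0; remove the oldest conversation turns
    while count(msgs) > MAX_CONTEXT - RESPONSE_BUDGET and len(msgs) > 1:
        msgs.pop(1)
    return msgs
```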
Knowledge Base Processing
Upload Limits
- File Size: Maximum 1MB per file
- Storage per Agent: 400KB default allocation
- Processing Rate: 10 chunks per batch
- Embedding Generation: Throttled to avoid overloading the embeddings API (a batched sketch follows)
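A batched embedding sketch using the official openai Python client. The 10-chunk batch size matches the processing rate above; the embedding model name and the pause between batches are assumptions, not platform settings.

```python
import time
from openai import OpenAI  # pip install openai

client = OpenAI()      # reads OPENAI_API_KEY from the environment
BATCH_SIZE = 10        # matches the 10-chunks-per-batch processing rate
BATCH_PAUSE = 0.5      # pause between batches (illustrative throttle)

def embed_chunks(chunks: list[str]) -> list[list[float]]:
    """Embed document chunks in batches, throttled to avoid rate limits."""
    vectors: list[list[float]] = []
    for i in range(0, len(chunks), BATCH_SIZE):
        batch = chunks[i : i + BATCH_SIZE]
        resp = client.embeddings.create(
            model="text-embedding-3-small",  # assumed embedding model
            input=batch,
        )
        vectors.extend(item.embedding for item in resp.data)
        time.sleep(BATCH_PAUSE)  # spread requests over time
    return vectors
```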
Search Performance
- Query Rate: No hard limit
- Response Time: Less than 100ms for vector search
- Concurrent Searches: Multiple searches supported
- Cache Duration: 5 minutes for repeated queries (see the TTL cache sketch below)
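A minimal TTL cache matching the 5-minute duration above. The search_fn parameter stands in for whatever function performs the real vector search; the in-memory dict is an assumption for illustration.

```python
import time

CACHE_TTL = 300.0  # seconds; matches the 5-minute cache duration
_cache: dict[str, tuple[float, list]] = {}

def cached_search(query: str, search_fn) -> list:
    """Return cached vector-search results for repeated queries within the TTL."""
    key = query.strip().lower()
    hit = _cache.get(key)
    if hit and time.monotonic() - hit[0] < CACHE_TTL:
        return hit[1]                       # fresh cache hit
    results = search_fn(query)              # fall through to the real search
    _cache[key] = (time.monotonic(), results)
    return results
```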
Best Practices for Rate Limit Management
Telegram Bots
- Message Buffering: Space responses 1-3 seconds apart
- Queue Implementation: Buffer outgoing messages during traffic spikes (a queue worker is sketched after this list)
- User Feedback: Inform users about rate limiting
- Group Optimization: Use @mention triggers in groups
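A queue worker combining the buffering and queuing advice above: outgoing messages go into an asyncio queue, and a single worker drains it with a 1-3 second gap between sends. The send_message coroutine is a placeholder for the bot's actual Bot API call.

```python
import asyncio
import random

send_queue: asyncio.Queue[tuple[int, str]] = asyncio.Queue()

async def sender_worker(send_message) -> None:
    """Drain the queue, spacing sends 1-3 seconds apart per the buffering advice."""
    while True:
        chat_id, text = await send_queue.get()
        await send_message(chat_id, text)          # your actual Bot API call
        await asyncio.sleep(random.uniform(1, 3))  # buffer between responses
        send_queue.task_done()
```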
Website Widgets
- Debouncing: Debounce user input so a burst of keystrokes produces one request, not many (pattern sketched below)
- Caching: Cache frequent responses
- Progressive Loading: Load resources as needed
- Connection Management: Reuse WebSocket connections
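The debounce pattern referenced above: each new event cancels the pending one, so only the last event in a burst fires. In a real widget this would run in browser JavaScript; it is expressed in Python here for consistency with the other sketches, and the 300 ms window is an assumption.

```python
import asyncio

class Debouncer:
    """Delay handling of rapid-fire input; only the last event in a burst fires."""

    def __init__(self, delay: float = 0.3):
        self.delay = delay
        self._task: asyncio.Task | None = None

    def trigger(self, coro_fn, *args) -> None:
        if self._task and not self._task.done():
            self._task.cancel()                 # drop the superseded event
        self._task = asyncio.create_task(self._run(coro_fn, *args))

    async def _run(self, coro_fn, *args) -> None:
        await asyncio.sleep(self.delay)         # wait out the quiet period
        await coro_fn(*args)
```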
General Guidelines
- Monitor Usage: Track API usage regularly
- Implement Retries: Use exponential backoff
- User Communication: Clear error messages
- Graceful Degradation: Maintain basic functionality
Rate Limit Response Strategies
When Rate Limited
- Telegram: Automatically queue messages and retry
- Website: Show typing indicator and wait
- Knowledge Base: Use cached results when available
- OpenAI: Switch to simpler prompts or cached responses
Prevention Strategies
- Request Batching: Combine multiple operations
- Intelligent Caching: Cache common queries
- Load Distribution: Spread requests over time
- Priority Queuing: Serve important messages (e.g. direct user replies) before background work (sketched below)
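A priority queue sketch for the last point above, built on the standard library's heapq. The priority values are arbitrary conventions for illustration; a monotonic counter breaks ties so equal-priority messages stay in arrival order.

```python
import heapq
import itertools

class PriorityMessageQueue:
    """Serve high-priority messages first; FIFO within the same priority."""

    def __init__(self):
        self._heap: list[tuple[int, int, str]] = []
        self._counter = itertools.count()  # tie-breaker preserves insertion order

    def push(self, message: str, priority: int = 10) -> None:
        # Lower number = higher priority (e.g. 0 for direct user replies)
        heapq.heappush(self._heap, (priority, next(self._counter), message))

    def pop(self) -> str:
        return heapq.heappop(self._heap)[2]
```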
Monitoring and Alerts
Metrics to Track
- Request Rate: Messages per minute/hour
- Error Rate: Failed requests percentage
- Response Time: Average processing time
- Queue Length: Pending messages count
Alert Thresholds
- High Error Rate: Greater than 5% failed requests
- Long Queue: More than 10 messages pending
- Slow Response: Greater than 5 seconds average
- Rate Limit Hit: Any HTTP 429 responses
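A rolling-window tracker covering the metrics and thresholds listed above. The one-minute window and the class shape are assumptions for illustration, not a built-in facility; it checks the error-rate, response-time, and queue-length thresholds directly.

```python
import time
from collections import deque

class Metrics:
    """Rolling one-minute window of request outcomes and latencies."""

    def __init__(self, window: float = 60.0):
        self.window = window
        self.events: deque[tuple[float, bool, float]] = deque()  # (ts, ok, seconds)

    def record(self, ok: bool, seconds: float) -> None:
        now = time.monotonic()
        self.events.append((now, ok, seconds))
        while self.events and now - self.events[0][0] > self.window:
            self.events.popleft()  # expire events outside the window

    def alerts(self, queue_length: int) -> list[str]:
        """Apply the alert thresholds above; returns the alerts that fired."""
        fired = []
        n = len(self.events)
        if n:
            error_rate = sum(1 for _, ok, _ in self.events if not ok) / n
            avg_seconds = sum(s for _, _, s in self.events) / n
            if error_rate > 0.05:
                fired.append("high error rate: >5% failed requests")
            if avg_seconds > 5.0:
                fired.append("slow response: >5s average")
        if queue_length > 10:
            fired.append("long queue: >10 messages pending")
        return fired
```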
Cost Optimization
Reduce API Costs
- Efficient Prompts: Shorter, clearer prompts
- Context Management: Remove unnecessary history
- Caching Strategy: Cache frequent responses
- Model Selection: Use the smallest model that handles the task well
Resource Management
- Connection Pooling: Reuse connections
- Batch Processing: Process multiple items together
- Lazy Loading: Load resources only when needed
- Cleanup Policies: Remove old data on a schedule (a pruning sketch follows)
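A cleanup sketch under stated assumptions: message history lives in a SQLite table named messages with a Unix-timestamp created_at column, and 30 days is an illustrative retention window. Adapt the query to your actual schema.

```python
import sqlite3
import time

RETENTION_DAYS = 30  # illustrative retention window

def prune_old_messages(db_path: str = "bot.db") -> int:
    """Delete stored messages older than the retention window; returns rows removed."""
    cutoff = time.time() - RETENTION_DAYS * 86_400
    with sqlite3.connect(db_path) as conn:
        cur = conn.execute(
            "DELETE FROM messages WHERE created_at < ?",  # assumed schema
            (cutoff,),
        )
        return cur.rowcount
```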