API Reference

Rate Limits & Usage Limits

Learn about API rate limits, usage quotas, and how to optimize your application

Overview

ParrotRouter implements rate limiting to ensure fair usage and maintain service quality for all users. Understanding these limits helps you build reliable applications that scale effectively.

Rate Limits

Requests per minute (RPM) and tokens per minute (TPM)

Usage Quotas

Monthly token limits based on your plan

Burst Capacity

Short-term burst allowance for traffic spikes
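
ParrotRouter doesn't publish its burst algorithm; a token bucket is one common way to model short-term burst allowance. The sketch below is illustrative only, and the capacity and refill numbers are placeholders rather than documented values:

import time

class TokenBucket:
    """Illustrative token bucket: capacity allows short bursts, while the
    refill rate enforces the sustained requests-per-minute limit."""

    def __init__(self, rate_per_min: float, burst_capacity: int):
        self.capacity = burst_capacity
        self.tokens = float(burst_capacity)
        self.refill_per_sec = rate_per_min / 60.0
        self.last = time.monotonic()

    def try_acquire(self) -> bool:
        now = time.monotonic()
        # Refill proportionally to elapsed time, capped at capacity
        self.tokens = min(self.capacity, self.tokens + (now - self.last) * self.refill_per_sec)
        self.last = now
        if self.tokens >= 1:
            self.tokens -= 1
            return True
        return False

# Sustained 60 RPM with headroom for a burst of 10 back-to-back requests
bucket = TokenBucket(rate_per_min=60, burst_capacity=10)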

Rate Limit Headers

Every API response includes headers that help you track your rate limit status:

Response Headers
HTTP/1.1 200 OK
X-RateLimit-Limit: 60
X-RateLimit-Remaining: 58
X-RateLimit-Reset: 1699000000
X-RateLimit-Reset-After: 45
X-RateLimit-Bucket: default
X-RateLimit-Policy: 60;w=60
X-RateLimit-Limit

Maximum number of requests allowed in the current window

X-RateLimit-Remaining

Number of requests remaining in the current window

X-RateLimit-Reset

Unix timestamp when the rate limit window resets

X-RateLimit-Reset-After

Seconds until the rate limit window resets

X-RateLimit-Bucket

Identifier of the rate limit bucket this request counted against

X-RateLimit-Policy

The quota policy in limit;window form: 60;w=60 means 60 requests per 60-second window
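
For example, a minimal sketch of reading these headers with Python's requests library; the endpoint path and API key below are placeholders, not documented values:

import requests

api_key = "YOUR_API_KEY"  # placeholder
resp = requests.post(
    "https://api.parrotrouter.com/v1/chat/completions",  # assumed endpoint path
    headers={"Authorization": f"Bearer {api_key}"},
    json={"model": "gpt-4", "messages": [{"role": "user", "content": "Hi"}]},
)

remaining = int(resp.headers.get("X-RateLimit-Remaining", 0))
reset_after = int(resp.headers.get("X-RateLimit-Reset-After", 0))
if remaining == 0:
    print(f"Rate limit exhausted; window resets in {reset_after}s")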

Rate Limits by Plan

Free Plan Limits

Requests per minute
20 RPM
Tokens per minute
40,000 TPM
Requests per day
500 RPD
Monthly token quota
1M tokens
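
A rough client-side sketch of tracking both Free-plan budgets over a sliding 60-second window before sending a request (token counts must come from your own estimate, e.g. a tokenizer):

import time
from collections import deque

class PlanBudget:
    """Tracks request and token spend over a sliding 60-second window."""

    def __init__(self, rpm: int = 20, tpm: int = 40_000):
        self.rpm, self.tpm = rpm, tpm
        self.events = deque()  # (timestamp, tokens) pairs

    def can_send(self, estimated_tokens: int) -> bool:
        cutoff = time.monotonic() - 60
        # Drop events that have aged out of the window
        while self.events and self.events[0][0] < cutoff:
            self.events.popleft()
        requests_used = len(self.events)
        tokens_used = sum(t for _, t in self.events)
        return requests_used < self.rpm and tokens_used + estimated_tokens <= self.tpm

    def record(self, tokens: int) -> None:
        self.events.append((time.monotonic(), tokens))

budget = PlanBudget()  # Free plan defaults: 20 RPM, 40,000 TPM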

Model-Specific Limits

Some models have additional constraints beyond your plan limits:

Context Length Limits

GPT-4
8,192 tokens
GPT-4-32k
32,768 tokens
Claude 3 Opus
200,000 tokens
GPT-3.5-turbo
16,385 tokens
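
To stay under a model's context limit, count tokens before sending. One way is the tiktoken library; note that chat formatting adds a few tokens per message, so treat the count as an estimate:

import tiktoken

GPT4_CONTEXT = 8_192  # from the table above

enc = tiktoken.encoding_for_model("gpt-4")
prompt = "Summarize the following document: ..."
prompt_tokens = len(enc.encode(prompt))

# Leave headroom for the completion itself
if prompt_tokens > GPT4_CONTEXT - 1_000:
    raise ValueError(f"Prompt uses {prompt_tokens} tokens; too close to the 8,192 limit")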

Max Output Tokens

Maximum tokens that can be generated in a single request:

Most models
4,096 tokens
Claude 3 models
4,096 tokens
GPT-4 Vision
4,096 tokens
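
Because the context window covers the prompt plus the completion, cap max_tokens at whichever is smaller: the model's output ceiling or the space left after your prompt. A sketch continuing from the tiktoken example above (client is an OpenAI-style client, as in the examples below):

# Continuing from the tiktoken example above
MAX_OUTPUT = 4_096  # output ceiling from the table above

available = GPT4_CONTEXT - prompt_tokens
max_tokens = min(MAX_OUTPUT, max(0, available))

response = client.chat.completions.create(  # assumes an OpenAI-style client
    model="gpt-4",
    messages=[{"role": "user", "content": prompt}],
    max_tokens=max_tokens,
)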

Handling Rate Limits

Python - Exponential Backoff
import time
import random
from typing import Callable, Any

# Assumes your client library raises a RateLimitError that carries the HTTP
# response (as the OpenAI SDK's openai.RateLimitError does); substitute the
# equivalent exception for your SDK.

def exponential_backoff(
    func: Callable,
    max_retries: int = 5,
    base_delay: float = 1.0,
    max_delay: float = 60.0
) -> Any:
    """Retry function with exponential backoff"""
    
    for attempt in range(max_retries):
        try:
            return func()
        except RateLimitError as e:
            if attempt == max_retries - 1:
                raise
            
            # Use Retry-After header if available
            retry_after = e.response.headers.get('Retry-After')
            if retry_after:
                delay = int(retry_after)
            else:
                # Exponential backoff with jitter
                delay = min(base_delay * (2 ** attempt) + random.random(), max_delay)
            
            print(f"Rate limited. Retrying in {delay} seconds...")
            time.sleep(delay)
    
    raise Exception("Max retries exceeded")
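
For example, wrapping a completion call (client here is the same OpenAI-style client used in the other examples on this page):

result = exponential_backoff(
    lambda: client.chat.completions.create(
        model="gpt-4",
        messages=[{"role": "user", "content": "Hello"}],
    )
)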

TypeScript - Rate Limit Manager
class RateLimitManager {
  private requests: number[] = [];
  private readonly windowMs = 60000; // 1 minute
  private readonly maxRequests = 60;

  async checkRateLimit(): Promise<void> {
    const now = Date.now();
    
    // Remove old requests outside the window
    this.requests = this.requests.filter(
      timestamp => now - timestamp < this.windowMs
    );
    
    if (this.requests.length >= this.maxRequests) {
      const oldestRequest = this.requests[0];
      const resetTime = oldestRequest + this.windowMs;
      const waitTime = resetTime - now;
      
      throw new Error(`Rate limit exceeded. Wait ${waitTime}ms`);
    }
    
    this.requests.push(now);
  }

  async makeRequest<T extends { headers: Headers }>(fn: () => Promise<T>): Promise<T> {
    await this.checkRateLimit();

    try {
      const response = await fn();

      // Update limits from response headers
      this.updateFromHeaders(response.headers);

      return response;
    } catch (error: any) {
      if (error.status === 429) {
        // Honor Retry-After if present; fall back to 60 seconds
        const retryAfter = Number(error.headers?.['retry-after'] ?? 60);
        await new Promise(resolve => setTimeout(resolve, retryAfter * 1000));
        return this.makeRequest(fn);
      }
      throw error;
    }
  }

  private updateFromHeaders(headers: Headers) {
    const remaining = headers.get('X-RateLimit-Remaining');
    const reset = headers.get('X-RateLimit-Reset');
    
    // Update internal state based on server response
    if (remaining && reset) {
      // Sync with server limits
    }
  }
}

Best Practices

Implement Request Queuing

Queue requests to stay within rate limits automatically

import time
import threading
from queue import Queue

class RequestQueue:
    def __init__(self, rpm_limit=60):
        self.queue = Queue()
        self.rpm_limit = rpm_limit
        self.interval = 60.0 / rpm_limit
        # Drain the queue on a background thread
        threading.Thread(target=self.process_queue, daemon=True).start()

    def submit(self, request):
        """Enqueue a zero-argument callable that performs the API call."""
        self.queue.put(request)

    def process_queue(self):
        while True:
            request = self.queue.get()
            request()
            time.sleep(self.interval)  # space requests to stay under the RPM limit
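
Hypothetical usage, sized to the Free plan's 20 RPM (client is the OpenAI-style client used elsewhere on this page):

rq = RequestQueue(rpm_limit=20)
rq.submit(lambda: client.chat.completions.create(
    model="gpt-4",
    messages=[{"role": "user", "content": "Hello"}]
))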

Use Batch Requests

Combine multiple operations into single requests when possible

# Instead of one request per prompt...
for prompt in prompts:
    response = client.chat.completions.create(
        model="gpt-4",
        messages=[{"role": "user", "content": prompt}]
    )

# ...combine related prompts into a single request
combined = "\n".join(f"{i + 1}. {p}" for i, p in enumerate(prompts))
response = client.chat.completions.create(
    model="gpt-4",
    messages=[{"role": "user", "content": f"Answer each of the following:\n{combined}"}]
)

Monitor Usage

Track your API usage to avoid hitting limits

class UsageTracker {
  private tokenCount = 0;
  private requestCount = 0;
  
  trackUsage(response: any) {
    this.tokenCount += response.usage.total_tokens;
    this.requestCount += 1;
    
    // Alert at 90% of the Free plan's 1M-token monthly quota
    if (this.tokenCount > 900000) {
      console.warn('Approaching monthly token quota');
    }
  }
}

Cache Responses

Cache common requests to reduce API calls

from functools import lru_cache

@lru_cache(maxsize=1000)
def get_completion(prompt: str):
    # Prompt strings are hashable, so they can key the cache directly;
    # the API is only called when the prompt is not already cached
    return client.chat.completions.create(
        model="gpt-4",
        messages=[{"role": "user", "content": prompt}]
    )

Rate Limit Errors

When you exceed rate limits, you'll receive a 429 error:

Rate Limit Error Response
{
  "error": {
    "message": "Rate limit exceeded. Please retry after 60 seconds.",
    "type": "rate_limit_error",
    "code": "rate_limit_exceeded",
    "status": 429
  }
}
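
A minimal sketch of detecting this response with requests and honoring Retry-After (the endpoint, key, and payload are placeholders, as in the earlier sketch):

import time
import requests

url = "https://api.parrotrouter.com/v1/chat/completions"  # assumed endpoint path
headers = {"Authorization": "Bearer YOUR_API_KEY"}  # placeholder
payload = {"model": "gpt-4", "messages": [{"role": "user", "content": "Hi"}]}

resp = requests.post(url, headers=headers, json=payload)
if resp.status_code == 429:
    error = resp.json()["error"]
    print(error["message"])  # e.g. "Rate limit exceeded. Please retry after 60 seconds."
    wait = int(resp.headers.get("Retry-After", 60))
    time.sleep(wait)
    resp = requests.post(url, headers=headers, json=payload)  # single retry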

Increasing Your Limits

Upgrade Your Plan

Higher tier plans come with increased rate limits and quotas

View pricing plans →

Request Limit Increase

Enterprise customers can request custom rate limits based on their needs

Contact enterprise sales →

Optimize Usage

Implement caching, batching, and efficient prompting to maximize your current limits

  • Use smaller models when appropriate
  • Implement response caching
  • Batch similar requests
  • Optimize prompt length
