Quick Fix
Each provider has a different quota system. Before changing any code, check your current usage and billing status in the OpenAI Usage dashboard, the Anthropic Console, or your provider's equivalent.

Understanding Quota & Billing Errors
Quota and billing failures usually surface as API errors such as:
- "You exceeded your current quota, please check your plan and billing details"
- "Your account has insufficient funds or payment method issues"
Common Causes
Free tier expiration
- OpenAI: the $5 free credit expires after 3 months
- Anthropic: limited free tier for testing
- Google: $300 credit for new users

Runaway usage
- Infinite loops in code
- Missing rate limiting
- Unexpected user traffic
- Development testing without limits (see the guard sketch after this list)

Payment problems
- Expired credit card
- Insufficient funds
- Failed payment processing
- Regional payment restrictions
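Of these, runaway usage is the most preventable in code. A minimal sketch of a hard request cap for development runs, so an accidental infinite loop fails fast instead of draining your quota (the cap value and guardedCall helper are illustrative, not from any SDK):

```javascript
// Hypothetical guard: abort development runs that make too many calls.
let devRequestCount = 0;
const DEV_REQUEST_CAP = 100; // arbitrary cap; tune for your workflow

async function guardedCall(requestFn) {
  if (++devRequestCount > DEV_REQUEST_CAP) {
    throw new Error(`Dev request cap of ${DEV_REQUEST_CAP} reached - possible runaway loop`);
  }
  return requestFn();
}
```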
Solutions
Implement Usage Monitoring
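The sketch below wraps every request with daily request and cost counters, estimates cost from token usage, and raises an alert at 80% of the configured budget. Note that the per-1K-token prices in calculateCost are illustrative and change over time; check current provider pricing before relying on them.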
```javascript
import { ParrotRouter } from 'parrotrouter-sdk';

class UsageMonitor {
  constructor(apiKey, limits) {
    this.client = new ParrotRouter(apiKey);
    this.limits = limits;
    this.usage = { requests: 0, tokens: 0, cost: 0 };
  }

  async makeRequest(prompt, model) {
    // Check limits before making the request
    if (this.usage.requests >= this.limits.maxRequests) {
      throw new Error('Daily request limit reached');
    }
    if (this.usage.cost >= this.limits.maxCost) {
      throw new Error('Daily cost limit reached');
    }

    try {
      const response = await this.client.chat.completions.create({
        model: model,
        messages: [{ role: 'user', content: prompt }],
        max_tokens: 500
      });

      // Update usage tracking
      this.usage.requests++;
      this.usage.tokens += response.usage.total_tokens;
      this.usage.cost += this.calculateCost(model, response.usage);

      // Send an alert when approaching limits
      if (this.usage.cost > this.limits.maxCost * 0.8) {
        this.sendAlert('Approaching 80% of daily cost limit');
      }

      return response;
    } catch (error) {
      if (error.status === 429) {
        console.error('Quota exceeded:', error.message);
        this.handleQuotaExceeded();
      }
      throw error;
    }
  }

  calculateCost(model, usage) {
    // Illustrative prices in USD per 1K tokens
    const pricing = {
      'gpt-4': { input: 0.03, output: 0.06 },
      'gpt-3.5-turbo': { input: 0.0005, output: 0.0015 },
      'claude-3-opus': { input: 0.015, output: 0.075 }
    };
    const modelPricing = pricing[model] || pricing['gpt-3.5-turbo'];
    return (usage.prompt_tokens * modelPricing.input +
            usage.completion_tokens * modelPricing.output) / 1000;
  }

  handleQuotaExceeded() {
    // Implement fallback strategy
    console.log('Switching to backup provider or cheaper model');
    // Notify administrators
    this.sendAlert('Quota exceeded - switching to fallback');
  }

  sendAlert(message) {
    // Implement your alert mechanism
    console.warn('[USAGE ALERT]:', message);
  }
}

// Usage
const monitor = new UsageMonitor('your-api-key', {
  maxRequests: 1000,
  maxCost: 50.00
});
```
Set Up Budget Alerts
OpenAI
1. Go to OpenAI Billing
2. Set "Usage limits" for your monthly budget
3. Enable email notifications at 50%, 75%, and 100%

Anthropic
1. Open the Anthropic Console
2. Navigate to Billing → Spending Limits
3. Set daily and monthly caps
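Dashboard alerts can also be mirrored in your own code. A hedged sketch that posts to a Slack-style incoming webhook when spend crosses a threshold (the webhook URL is a placeholder for whatever alerting endpoint you use):

```javascript
// Hedged sketch: mirror dashboard budget alerts in code by posting to a
// Slack-style incoming webhook. SLACK_WEBHOOK_URL is a placeholder.
const SLACK_WEBHOOK_URL = process.env.SLACK_WEBHOOK_URL;

async function alertOnBudget(spent, budget) {
  // Report only the highest threshold crossed
  const crossed = [1.0, 0.75, 0.5].find(t => spent >= budget * t);
  if (crossed === undefined) return;
  await fetch(SLACK_WEBHOOK_URL, {
    method: 'POST',
    headers: { 'Content-Type': 'application/json' },
    body: JSON.stringify({
      text: `LLM spend at ${crossed * 100}% of budget ($${spent.toFixed(2)} of $${budget})`
    })
  });
}
```

Note that calling this on every request will re-alert once a threshold is crossed; in practice you would record which thresholds have already fired.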
Implement Client-Side Rate Limiting
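Two complementary approaches appear below: per-tier throttling built on the limiter npm package, and a hand-rolled token bucket for finer control.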
```javascript
import { RateLimiter } from 'limiter';

class APIRateLimiter {
  constructor() {
    // Create different rate limiters for different tiers
    this.limiters = {
      free: new RateLimiter({ tokensPerInterval: 3, interval: 'minute', fireImmediately: true }),
      paid: new RateLimiter({ tokensPerInterval: 60, interval: 'minute', fireImmediately: true }),
      enterprise: new RateLimiter({ tokensPerInterval: 3000, interval: 'minute', fireImmediately: true })
    };
  }

  async throttledRequest(tier, requestFn) {
    const limiter = this.limiters[tier] || this.limiters.free;

    // Wait for a rate-limit token
    await limiter.removeTokens(1);

    try {
      return await requestFn();
    } catch (error) {
      if (error.status === 429) {
        // Extract the Retry-After header (seconds), defaulting to 60
        const retryAfter = error.headers?.['retry-after'] || 60;
        console.log(`Rate limited. Retrying after ${retryAfter} seconds`);

        // Wait and retry
        await new Promise(resolve => setTimeout(resolve, retryAfter * 1000));
        return await this.throttledRequest(tier, requestFn);
      }
      throw error;
    }
  }
}

// Token bucket algorithm
class TokenBucket {
  constructor(capacity, refillRate) {
    this.capacity = capacity;
    this.tokens = capacity;
    this.refillRate = refillRate; // tokens per second
    this.lastRefill = Date.now();
  }

  async getToken() {
    // Refill tokens based on elapsed time
    const now = Date.now();
    const timePassed = (now - this.lastRefill) / 1000;
    this.tokens = Math.min(this.capacity, this.tokens + timePassed * this.refillRate);
    this.lastRefill = now;

    if (this.tokens < 1) {
      // Wait until a token becomes available, then retry
      const waitTime = (1 - this.tokens) / this.refillRate * 1000;
      await new Promise(resolve => setTimeout(resolve, waitTime));
      return this.getToken();
    }

    this.tokens -= 1;
    return true;
  }
}

// Initialize with a capacity of 10 requests, refilled at 10 per minute
const bucket = new TokenBucket(10, 10 / 60);

async function rateLimitedAPICall(prompt) {
  await bucket.getToken();
  return makeAPICall(prompt); // makeAPICall stands in for your provider call
}
```
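The Retry-After handling trusts the server's hint when it is present and falls back to 60 seconds otherwise, while the token bucket smooths client-side bursts before the provider ever pushes back. In production you would also want to cap the retry recursion depth so a persistent 429 cannot loop indefinitely.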
Prevention Strategies
- ✅ Use cheaper models for simple tasks (GPT-3.5 vs GPT-4)
- ✅ Implement caching for repeated queries (see the cache sketch after this list)
- ✅ Batch requests when possible
- ✅ Optimize prompt length
- ✅ Use streaming for real-time feedback
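Caching is often the cheapest win on this list. A minimal in-memory sketch keyed on model and prompt; in production you would likely use Redis or similar, and the one-hour TTL here is an arbitrary choice:

```javascript
// Minimal in-memory cache for repeated queries. The Map and TTL are
// illustrative; production systems typically use Redis or similar.
const cache = new Map();
const TTL_MS = 60 * 60 * 1000; // one hour, arbitrary

async function cachedCompletion(client, model, prompt) {
  const key = `${model}:${prompt}`;
  const hit = cache.get(key);
  if (hit && Date.now() - hit.time < TTL_MS) return hit.response;

  const response = await client.chat.completions.create({
    model,
    messages: [{ role: 'user', content: prompt }]
  });
  cache.set(key, { response, time: Date.now() });
  return response;
}
```

The simple dashboard below then makes daily usage visible at a glance: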
```javascript
// Simple usage tracking dashboard
class UsageDashboard {
  constructor() {
    this.metrics = { daily: [], weekly: [], monthly: [] };
  }

  trackUsage(model, tokens, cost) {
    const entry = { timestamp: new Date(), model, tokens, cost };
    this.metrics.daily.push(entry);
    this.updateDashboard();
  }

  updateDashboard() {
    const today = this.metrics.daily.filter(e => this.isToday(e.timestamp));
    const stats = {
      requests: today.length,
      tokens: today.reduce((sum, e) => sum + e.tokens, 0),
      cost: today.reduce((sum, e) => sum + e.cost, 0)
    };
    console.log('📊 Daily Usage:', stats);

    // Send to monitoring service
    this.sendToMonitoring(stats);
  }

  isToday(timestamp) {
    return new Date(timestamp).toDateString() === new Date().toDateString();
  }

  sendToMonitoring(stats) {
    // Integrate with your monitoring service
    // Examples: Datadog, Grafana, CloudWatch
  }
}
```
Provider Rate Limits & Quotas
| Provider | Free Tier | Paid Tier | Enterprise |
|---|---|---|---|
| OpenAI | 3 RPM (GPT-4), 200 RPD, $5 credit | 500 RPM, 10,000 RPD, pay as you go | Custom limits, priority access, volume discounts |
| Anthropic | 5 RPM, 300K tokens/month, limited trial | 50 RPM, 5M tokens/month, usage-based | Custom limits, dedicated support, SLA guarantees |
| Google | 60 RPM, $300 credit (90 days) | 1,000 RPM, unlimited, per-token pricing | Custom quotas, committed-use discounts |
ParrotRouter Advantage
A unified router such as ParrotRouter can switch to a backup provider or a cheaper model when one provider's quota is exhausted, as the handleQuotaExceeded fallback above illustrates.
Monitoring & Management Tools
- Helicone: LLM observability platform; track costs, latency, and usage across providers (proxy setup sketched after this list)
- Langfuse: open-source LLM monitoring; debug prompts and track token usage
- Datadog: full-stack monitoring; custom metrics and alerting for LLM usage
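Helicone, for example, typically works as a proxy: you point the SDK's base URL at it and add an auth header. The base URL and header name below follow Helicone's commonly documented pattern; verify against their current docs before relying on them:

```javascript
// Hedged sketch of Helicone's proxy-style setup for the OpenAI SDK.
// Base URL and header name may change; check Helicone's documentation.
import OpenAI from 'openai';

const client = new OpenAI({
  apiKey: process.env.OPENAI_API_KEY,
  baseURL: 'https://oai.helicone.ai/v1',
  defaultHeaders: {
    'Helicone-Auth': `Bearer ${process.env.HELICONE_API_KEY}`
  }
});
```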
Best Practices
Development
- Use separate API keys for dev/staging/prod
- Implement mock responses for testing (see the sketch below)
- Set strict limits for development keys
- Use cheaper models during development
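Mock responses are straightforward to wire in. A minimal sketch where tests receive a canned completion instead of hitting the live API; mockClient mimics the OpenAI-style response shape, and realClient stands in for your actual client:

```javascript
// Hypothetical mock so tests never spend real quota. The response shape
// mimics OpenAI-style chat completions.
const mockClient = {
  chat: {
    completions: {
      create: async () => ({
        choices: [{ message: { role: 'assistant', content: 'mock reply' } }],
        usage: { prompt_tokens: 0, completion_tokens: 0, total_tokens: 0 }
      })
    }
  }
};

// realClient stands in for your actual SDK client
const client = process.env.NODE_ENV === 'test' ? mockClient : realClient;
```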
Production
- Implement circuit breakers (sketched after this list)
- Use exponential backoff for retries (sketched after this list)
- Cache responses when possible
- Monitor cost per user/feature
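Minimal sketches of the two resilience patterns named above; both are generic and not tied to any particular SDK:

```javascript
// Minimal exponential backoff with jitter for 429/5xx responses.
async function withBackoff(fn, maxRetries = 5) {
  for (let attempt = 0; ; attempt++) {
    try {
      return await fn();
    } catch (error) {
      const retryable = error.status === 429 || error.status >= 500;
      if (!retryable || attempt >= maxRetries) throw error;
      // Exponential delay, capped at 30s, with jitter to avoid thundering herds
      const delay = Math.min(30_000, 2 ** attempt * 1000) * (0.5 + Math.random() / 2);
      await new Promise(resolve => setTimeout(resolve, delay));
    }
  }
}

// Tiny circuit breaker: after `threshold` consecutive failures, reject calls
// for `cooldownMs` instead of hammering an exhausted quota.
class CircuitBreaker {
  constructor(threshold = 5, cooldownMs = 60_000) {
    this.threshold = threshold;
    this.cooldownMs = cooldownMs;
    this.failures = 0;
    this.openedAt = 0;
  }

  async call(fn) {
    const open = this.failures >= this.threshold &&
                 Date.now() - this.openedAt < this.cooldownMs;
    if (open) throw new Error('Circuit open - skipping request');
    try {
      const result = await fn();
      this.failures = 0; // success closes the circuit
      return result;
    } catch (error) {
      this.failures++;
      this.openedAt = Date.now();
      throw error;
    }
  }
}

// Usage: breaker.call(() => withBackoff(() => client.chat.completions.create(params)))
```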
Related Resources
- Implement proper rate limiting to prevent quota issues
- Estimate and compare costs across different providers
- Reduce costs without sacrificing quality
- [1] OpenAI. "Error Codes Reference" (2024)
- [2] Anthropic. "API Errors" (2024)
- [3] Stack Overflow. "OpenAI API Questions" (2024)