Quick Fix

Hit a quota or billing error? Start by checking your actual usage: the OpenAI Usage page, the Anthropic Console, or your provider's equivalent dashboard. Each provider has a different quota system, so the right fix depends on which limit you are hitting.
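If you are not sure whether you are hitting a quota wall or ordinary rate limiting, a one-off probe can tell you. This is a minimal sketch using the official `openai` Node SDK; the `status` and `code` fields shown match its current error shape, but verify against your SDK version:

```javascript
// Quick probe: distinguish a quota/billing failure from plain rate limiting.
// (Run as an ES module for top-level await.)
import OpenAI from 'openai';

const client = new OpenAI({ apiKey: process.env.OPENAI_API_KEY });

try {
  await client.chat.completions.create({
    model: 'gpt-3.5-turbo',
    messages: [{ role: 'user', content: 'ping' }],
    max_tokens: 1
  });
  console.log('Key works; the problem is elsewhere.');
} catch (error) {
  if (error.status === 429 && error.code === 'insufficient_quota') {
    console.error('Quota/billing issue: add credits or raise your spending limit.');
  } else if (error.status === 429) {
    console.error('Rate limited only; your quota and billing are fine.');
  } else {
    throw error;
  }
}
```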
"You exceeded your current quota, please check your plan and billing details""Your account has insufficient funds or payment method issues"Common Causes
Expired or exhausted free credits:
- OpenAI: $5 free credit expires after 3 months
- Anthropic: limited free tier for testing
- Google: $300 credit for new users

Runaway usage:
- Infinite loops in code
- Missing rate limiting
- Unexpected user traffic
- Development testing without limits (see the guard sketch after this list)

Payment problems:
- Expired credit card
- Insufficient funds
- Failed payment processing
- Regional payment restrictions
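Several of these causes are self-inflicted during development. The guard below is a hypothetical sketch of a per-process hard cap; the `MAX_CALLS_PER_RUN` value and `guardedCall` helper are illustrative, not part of any SDK:

```javascript
// Dev-only guard: hard-cap API calls per process so a runaway loop
// fails fast instead of silently draining your quota.
const MAX_CALLS_PER_RUN = 25; // assumption: tune to your workflow
let callCount = 0;

function guardedCall(fn) {
  if (++callCount > MAX_CALLS_PER_RUN) {
    throw new Error(`Dev cap of ${MAX_CALLS_PER_RUN} calls hit; check for loops`);
  }
  return fn();
}

// Wrap every call during development:
// const res = await guardedCall(() => client.chat.completions.create({ ... }));
```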
Solutions
Implement Usage Monitoring
```javascript
import { ParrotRouter } from 'parrotrouter-sdk';

class UsageMonitor {
  constructor(apiKey, limits) {
    this.client = new ParrotRouter(apiKey);
    this.limits = limits;
    // Note: these counters track a single day; reset them on a daily
    // schedule or persist them externally.
    this.usage = { requests: 0, tokens: 0, cost: 0 };
  }

  async makeRequest(prompt, model) {
    // Check limits before making the request
    if (this.usage.requests >= this.limits.maxRequests) {
      throw new Error('Daily request limit reached');
    }
    if (this.usage.cost >= this.limits.maxCost) {
      throw new Error('Daily cost limit reached');
    }

    try {
      const response = await this.client.chat.completions.create({
        model: model,
        messages: [{ role: 'user', content: prompt }],
        max_tokens: 500
      });

      // Update usage tracking
      this.usage.requests++;
      this.usage.tokens += response.usage.total_tokens;
      this.usage.cost += this.calculateCost(model, response.usage);

      // Send an alert when approaching the limit
      if (this.usage.cost > this.limits.maxCost * 0.8) {
        this.sendAlert('Approaching 80% of daily cost limit');
      }

      return response;
    } catch (error) {
      if (error.status === 429) {
        console.error('Quota exceeded:', error.message);
        this.handleQuotaExceeded();
      }
      throw error;
    }
  }

  calculateCost(model, usage) {
    // Prices in USD per 1K tokens; keep in sync with provider pricing pages
    const pricing = {
      'gpt-4': { input: 0.03, output: 0.06 },
      'gpt-3.5-turbo': { input: 0.0005, output: 0.0015 },
      'claude-3-opus': { input: 0.015, output: 0.075 }
    };
    const modelPricing = pricing[model] || pricing['gpt-3.5-turbo'];
    return (usage.prompt_tokens * modelPricing.input +
            usage.completion_tokens * modelPricing.output) / 1000;
  }

  handleQuotaExceeded() {
    // Implement a fallback strategy
    console.log('Switching to backup provider or cheaper model');
    // Notify administrators
    this.sendAlert('Quota exceeded - switching to fallback');
  }

  sendAlert(message) {
    // Implement your alert mechanism
    console.warn('[USAGE ALERT]:', message);
  }
}

// Usage
const monitor = new UsageMonitor('your-api-key', {
  maxRequests: 1000,
  maxCost: 50.00
});
```

Set Up Budget Alerts
OpenAI:
1. Go to OpenAI Billing
2. Set "Usage limits" for your monthly budget
3. Enable email notifications at 50%, 75%, and 100%

Anthropic:
1. Access the Anthropic Console
2. Navigate to Billing → Spending Limits
3. Set daily and monthly caps
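Provider-side alerts are the safety net; you can complement them with a client-side check against your own tracked spend. A sketch, assuming a webhook endpoint (for example a Slack incoming webhook) and the `monitor` instance from the usage-monitoring example above:

```javascript
// Client-side budget alert: compare locally tracked spend against a
// daily budget and post to a webhook when crossing 80%.
const DAILY_BUDGET_USD = 50;                          // assumption: your cap
const ALERT_WEBHOOK = process.env.ALERT_WEBHOOK_URL;  // assumption: your endpoint

async function checkBudget(getSpendToday) {
  const spend = await getSpendToday();
  const pct = (spend / DAILY_BUDGET_USD) * 100;
  if (pct >= 80) {
    await fetch(ALERT_WEBHOOK, {
      method: 'POST',
      headers: { 'Content-Type': 'application/json' },
      body: JSON.stringify({ text: `LLM spend at ${pct.toFixed(0)}% of daily budget` })
    });
  }
}

// Poll every 10 minutes, reading from the UsageMonitor defined earlier
setInterval(() => checkBudget(async () => monitor.usage.cost), 10 * 60 * 1000);
```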
Implement Client-Side Rate Limiting
```javascript
import { RateLimiter } from 'limiter';

class APIRateLimiter {
  constructor() {
    // Create different rate limiters for different tiers.
    // fireImmediately is left at its default (false) so removeTokens()
    // actually waits for capacity instead of resolving immediately.
    this.limiters = {
      free: new RateLimiter({
        tokensPerInterval: 3,
        interval: 'minute'
      }),
      paid: new RateLimiter({
        tokensPerInterval: 60,
        interval: 'minute'
      }),
      enterprise: new RateLimiter({
        tokensPerInterval: 3000,
        interval: 'minute'
      })
    };
  }

  async throttledRequest(tier, requestFn) {
    const limiter = this.limiters[tier] || this.limiters.free;

    // Wait for a rate-limit token
    await limiter.removeTokens(1);

    try {
      return await requestFn();
    } catch (error) {
      if (error.status === 429) {
        // Extract the Retry-After header (seconds), defaulting to 60
        const retryAfter = error.headers?.['retry-after'] || 60;
        console.log(`Rate limited. Retrying after ${retryAfter} seconds`);

        // Wait and retry
        await new Promise(resolve => setTimeout(resolve, retryAfter * 1000));
        return await this.throttledRequest(tier, requestFn);
      }
      throw error;
    }
  }
}

// Alternative: a hand-rolled token bucket
class TokenBucket {
  constructor(capacity, refillRate) {
    this.capacity = capacity;
    this.tokens = capacity;
    this.refillRate = refillRate; // tokens per second
    this.lastRefill = Date.now();
  }

  async getToken() {
    // Refill tokens based on elapsed time
    const now = Date.now();
    const timePassed = (now - this.lastRefill) / 1000;
    this.tokens = Math.min(
      this.capacity,
      this.tokens + timePassed * this.refillRate
    );
    this.lastRefill = now;

    if (this.tokens < 1) {
      // Wait until one full token has accumulated, then re-check
      const waitTime = (1 - this.tokens) / this.refillRate * 1000;
      await new Promise(resolve => setTimeout(resolve, waitTime));
      return this.getToken();
    }

    this.tokens -= 1;
    return true;
  }
}

// Initialize with a 10-requests-per-minute budget
const bucket = new TokenBucket(10, 10 / 60);

async function rateLimitedAPICall(prompt) {
  await bucket.getToken();
  return makeAPICall(prompt); // makeAPICall: your actual API wrapper
}
```

Prevention Strategies
- ✅ Use cheaper models for simple tasks (GPT-3.5 vs GPT-4)
- ✅ Implement caching for repeated queries (see the sketch after this list)
- ✅ Batch requests when possible
- ✅ Optimize prompt length
- ✅ Use streaming for real-time feedback
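Caching is often the cheapest win on that list. A minimal in-memory sketch, keyed by model plus prompt with a fixed TTL; for anything multi-process, swap the `Map` for Redis or similar:

```javascript
// In-memory response cache for repeated queries (single-process sketch).
const cache = new Map();
const TTL_MS = 10 * 60 * 1000; // assumption: 10-minute freshness is acceptable

async function cachedCompletion(client, model, prompt) {
  const key = `${model}:${prompt}`;
  const hit = cache.get(key);
  if (hit && Date.now() - hit.at < TTL_MS) return hit.response; // cache hit

  const response = await client.chat.completions.create({
    model,
    messages: [{ role: 'user', content: prompt }]
  });
  cache.set(key, { at: Date.now(), response });
  return response;
}
```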
To keep day-to-day visibility, a lightweight tracker helps:

```javascript
// Simple usage-tracking dashboard
class UsageDashboard {
  constructor() {
    this.metrics = {
      daily: [],
      weekly: [],
      monthly: []
    };
  }

  trackUsage(model, tokens, cost) {
    const entry = {
      timestamp: new Date(),
      model,
      tokens,
      cost
    };
    this.metrics.daily.push(entry);
    this.updateDashboard();
  }

  isToday(timestamp) {
    // Compare calendar dates in local time
    return timestamp.toDateString() === new Date().toDateString();
  }

  updateDashboard() {
    const today = this.metrics.daily.filter(
      e => this.isToday(e.timestamp)
    );
    const stats = {
      requests: today.length,
      tokens: today.reduce((sum, e) => sum + e.tokens, 0),
      cost: today.reduce((sum, e) => sum + e.cost, 0)
    };
    console.log('📊 Daily Usage:', stats);

    // Send to a monitoring service
    this.sendToMonitoring(stats);
  }

  sendToMonitoring(stats) {
    // Integrate with your monitoring service
    // Examples: Datadog, Grafana, CloudWatch
  }
}
```

Provider Rate Limits & Quotas
The figures below are representative; providers adjust limits frequently, so confirm against current documentation.

| Provider | Free Tier | Paid Tier | Enterprise |
|---|---|---|---|
| OpenAI | 3 RPM (GPT-4), 200 RPD, $5 credit | 500 RPM, 10,000 RPD, pay-as-you-go | Custom limits, priority access, volume discounts |
| Anthropic | 5 RPM, 300K tokens/month, limited trial | 50 RPM, 5M tokens/month, usage-based billing | Custom limits, dedicated support, SLA guarantees |
| Google | 60 RPM, $300 credit (90 days) | 1,000 RPM, unlimited, per-token pricing | Custom quotas, committed-use discounts |
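When a hard limit does hit, falling back to a cheaper model or a second provider keeps the feature alive, as the `handleQuotaExceeded` hook above suggests. A sketch, assuming a single client (such as the ParrotRouter client used earlier) can address all of the listed models; the chain contents are illustrative:

```javascript
// Fallback chain: on a quota error (429), retry the same prompt on the
// next model in the chain instead of failing the request outright.
const FALLBACK_CHAIN = ['gpt-4', 'gpt-3.5-turbo', 'claude-3-haiku'];

async function completeWithFallback(client, prompt) {
  let lastError;
  for (const model of FALLBACK_CHAIN) {
    try {
      return await client.chat.completions.create({
        model,
        messages: [{ role: 'user', content: prompt }]
      });
    } catch (error) {
      if (error.status !== 429) throw error; // only fall back on quota/rate errors
      console.warn(`${model} quota-limited, trying next model`);
      lastError = error;
    }
  }
  throw lastError; // every model in the chain was exhausted
}
```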
Monitoring & Management Tools
- Helicone - LLM observability platform: track costs, latency, and usage across providers (integration sketch below)
- Langfuse - open-source LLM monitoring: debug prompts and track token usage
- Datadog - full-stack monitoring: custom metrics and alerting for LLM usage
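As an example of how lightweight these integrations can be, Helicone's documented OpenAI setup just swaps the base URL and adds an auth header; check Helicone's current docs before relying on the exact URL and header name:

```javascript
// Route OpenAI traffic through Helicone's proxy for per-request
// cost/latency tracking (per Helicone's documented integration).
import OpenAI from 'openai';

const client = new OpenAI({
  apiKey: process.env.OPENAI_API_KEY,
  baseURL: 'https://oai.helicone.ai/v1',
  defaultHeaders: {
    'Helicone-Auth': `Bearer ${process.env.HELICONE_API_KEY}`
  }
});
// Requests now show up in the Helicone dashboard with cost and latency.
```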
Best Practices
Development:
- Use separate API keys for dev/staging/prod
- Implement mock responses for testing
- Set strict limits for development keys
- Use cheaper models during development

Production:
- Implement circuit breakers (combined with backoff in the sketch below)
- Use exponential backoff for retries
- Cache responses when possible
- Monitor cost per user/feature
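Circuit breakers and exponential backoff combine naturally: backoff absorbs transient 429s, while the breaker stops hammering the API after repeated failures. A self-contained sketch (the thresholds are assumptions to tune):

```javascript
// Exponential backoff wrapped in a simple circuit breaker. Retries
// transient 429s with growing delays; after repeated final failures,
// opens the circuit and rejects calls for a cooldown period.
class Breaker {
  constructor({ maxFailures = 5, cooldownMs = 60_000 } = {}) {
    this.failures = 0;
    this.maxFailures = maxFailures;
    this.cooldownMs = cooldownMs;
    this.openUntil = 0;
  }

  async call(fn, retries = 3) {
    if (Date.now() < this.openUntil) {
      throw new Error('Circuit open; cooling down');
    }
    for (let attempt = 0; attempt <= retries; attempt++) {
      try {
        const result = await fn();
        this.failures = 0; // success closes the circuit
        return result;
      } catch (error) {
        const outOfRetries = error.status !== 429 || attempt === retries;
        if (outOfRetries) {
          if (++this.failures >= this.maxFailures) {
            this.openUntil = Date.now() + this.cooldownMs;
          }
          throw error;
        }
        // Backoff with jitter: ~1s, 2s, 4s ...
        const delay = 2 ** attempt * 1000 + Math.random() * 250;
        await new Promise(resolve => setTimeout(resolve, delay));
      }
    }
  }
}

// Usage:
// const breaker = new Breaker();
// const res = await breaker.call(() => client.chat.completions.create({ ... }));
```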
Related Resources
- Implement proper rate limiting to prevent quota issues.
- Estimate and compare costs across different providers.
- Reduce costs without sacrificing quality.