Billing
January 15, 2024 · 22 min read

Resolve Quota & Billing Errors

Quota and billing errors occur when you exceed usage limits or have payment issues. This guide helps you monitor usage, manage costs, and implement strategies to stay within limits.

Understanding Quota & Billing Errors

Each provider has different quota systems. Check usage at: OpenAI Usage, Anthropic Console, and your respective provider dashboards.

Quota Exceeded
"You exceeded your current quota, please check your plan and billing details"
Payment Required
"Your account has insufficient funds or payment method issues"
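In client code it helps to distinguish a hard quota stop from a transient rate limit, since only the latter is worth retrying. A minimal sketch of that classification is below; the field names (`status`, `code`) follow OpenAI's error shape, and other providers may expose these differently.

```javascript
// Sketch: classify an API error as quota, billing, or transient rate limiting.
// Field names follow OpenAI's error format; adjust for other providers.
function classifyBillingError(error) {
  if (error.status === 429 && error.code === 'insufficient_quota') {
    return 'quota_exceeded';   // hard stop: add credits or raise your limit
  }
  if (error.status === 402) {
    return 'payment_required'; // fix the payment method on file
  }
  if (error.status === 429) {
    return 'rate_limited';     // transient: back off and retry
  }
  return 'other';
}
```

Note that both quota exhaustion and rate limiting can surface as HTTP 429, so checking the error code, not just the status, avoids pointless retries against an empty account.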

Common Causes

Free Tier Limits
  • OpenAI: $5 free credit expires after 3 months
  • Anthropic: Limited free tier for testing
  • Google: $300 credit for new users
Usage Spikes
  • Infinite loops in code
  • Missing rate limiting
  • Unexpected user traffic
  • Development testing without limits
Payment Issues
  • Expired credit card
  • Insufficient funds
  • Failed payment processing
  • Regional payment restrictions
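Runaway loops are the usage spike that burns through quota fastest. One cheap guard is a hard per-run call budget wrapped around the API call; this is a sketch, and the default of 100 calls is an arbitrary example value.

```javascript
// Sketch: wrap an API-calling function with a hard per-run call budget,
// so an accidental infinite loop fails fast instead of draining quota.
function makeGuardedCaller(fn, maxCalls = 100) {
  let calls = 0;
  return (...args) => {
    if (++calls > maxCalls) {
      throw new Error(`Call budget of ${maxCalls} exhausted - possible runaway loop`);
    }
    return fn(...args);
  };
}
```

The wrapper works equally well around an async function, since it simply forwards whatever `fn` returns, promise or not.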

Solutions

Solution 1

Implement Usage Monitoring

import { ParrotRouter } from 'parrotrouter-sdk';

class UsageMonitor {
  constructor(apiKey, limits) {
    this.client = new ParrotRouter(apiKey);
    this.limits = limits;
    this.usage = { requests: 0, tokens: 0, cost: 0 };
  }

  async makeRequest(prompt, model) {
    // Check limits before making request
    if (this.usage.requests >= this.limits.maxRequests) {
      throw new Error('Daily request limit reached');
    }
    
    if (this.usage.cost >= this.limits.maxCost) {
      throw new Error('Daily cost limit reached');
    }

    try {
      const response = await this.client.chat.completions.create({
        model: model,
        messages: [{ role: 'user', content: prompt }],
        max_tokens: 500
      });

      // Update usage tracking
      this.usage.requests++;
      this.usage.tokens += response.usage.total_tokens;
      this.usage.cost += this.calculateCost(model, response.usage);

      // Send alert if approaching limits
      if (this.usage.cost > this.limits.maxCost * 0.8) {
        this.sendAlert('Approaching 80% of daily cost limit');
      }

      return response;
    } catch (error) {
      if (error.status === 429) {
        console.error('Quota exceeded:', error.message);
        this.handleQuotaExceeded();
      }
      throw error;
    }
  }

  calculateCost(model, usage) {
    const pricing = {
      'gpt-4': { input: 0.03, output: 0.06 },
      'gpt-3.5-turbo': { input: 0.0005, output: 0.0015 },
      'claude-3-opus': { input: 0.015, output: 0.075 }
    };

    const modelPricing = pricing[model] || pricing['gpt-3.5-turbo'];
    return (usage.prompt_tokens * modelPricing.input + 
            usage.completion_tokens * modelPricing.output) / 1000;
  }

  handleQuotaExceeded() {
    // Implement fallback strategy
    console.log('Switching to backup provider or cheaper model');
    // Notify administrators
    this.sendAlert('Quota exceeded - switching to fallback');
  }

  sendAlert(message) {
    // Implement your alert mechanism
    console.warn('[USAGE ALERT]:', message);
  }
}

// Usage
const monitor = new UsageMonitor('your-api-key', {
  maxRequests: 1000,
  maxCost: 50.00
});
Solution 2

Set Up Budget Alerts

OpenAI Budget Alerts
  1. Go to OpenAI Billing
  2. Set "Usage limits" for monthly budget
  3. Enable email notifications at 50%, 75%, 100%
Anthropic Budget Management
  1. Access Anthropic Console
  2. Navigate to Billing → Spending Limits
  3. Set daily and monthly caps
Solution 3

Implement Client-Side Rate Limiting

import { RateLimiter } from 'limiter';

class APIRateLimiter {
  constructor() {
    // Create different rate limiters for different tiers
    this.limiters = {
      free: new RateLimiter({
        tokensPerInterval: 3,
        interval: 'minute',
        fireImmediately: true
      }),
      paid: new RateLimiter({
        tokensPerInterval: 60,
        interval: 'minute',
        fireImmediately: true
      }),
      enterprise: new RateLimiter({
        tokensPerInterval: 3000,
        interval: 'minute',
        fireImmediately: true
      })
    };
  }

  async throttledRequest(tier, requestFn) {
    const limiter = this.limiters[tier] || this.limiters.free;
    
    // Wait for rate limit token
    await limiter.removeTokens(1);
    
    try {
      return await requestFn();
    } catch (error) {
      if (error.status === 429) {
        // Extract retry-after header
        const retryAfter = error.headers?.['retry-after'] || 60;
        console.log(`Rate limited. Retrying after ${retryAfter} seconds`);
        
        // Wait and retry
        await new Promise(resolve => setTimeout(resolve, retryAfter * 1000));
        return await this.throttledRequest(tier, requestFn);
      }
      throw error;
    }
  }
}

// Usage with token bucket algorithm
class TokenBucket {
  constructor(capacity, refillRate) {
    this.capacity = capacity;
    this.tokens = capacity;
    this.refillRate = refillRate; // tokens per second
    this.lastRefill = Date.now();
  }

  async getToken() {
    // Refill tokens based on time passed
    const now = Date.now();
    const timePassed = (now - this.lastRefill) / 1000;
    this.tokens = Math.min(
      this.capacity,
      this.tokens + timePassed * this.refillRate
    );
    this.lastRefill = now;

    if (this.tokens < 1) {
      // Calculate wait time
      const waitTime = (1 - this.tokens) / this.refillRate * 1000;
      await new Promise(resolve => setTimeout(resolve, waitTime));
      return this.getToken();
    }

    this.tokens -= 1;
    return true;
  }
}

// Initialize with 10 requests per minute capacity
const bucket = new TokenBucket(10, 10/60);

async function rateLimitedAPICall(prompt) {
  await bucket.getToken();
  return makeAPICall(prompt);
}

Prevention Strategies

Cost Optimization
  • ✅ Use cheaper models for simple tasks (GPT-3.5 vs GPT-4)
  • ✅ Implement caching for repeated queries
  • ✅ Batch requests when possible
  • ✅ Optimize prompt length
  • ✅ Use streaming for real-time feedback
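Caching repeated queries is often the cheapest optimization on the list above. A minimal sketch of an in-memory cache with a TTL follows; for production you would likely want an LRU size bound and a shared store such as Redis so all instances benefit.

```javascript
// Sketch: in-memory cache for identical (model, prompt) pairs with a TTL.
// Keeps repeated queries from hitting the API and burning quota.
class ResponseCache {
  constructor(ttlMs = 5 * 60 * 1000) {
    this.ttlMs = ttlMs;
    this.store = new Map();
  }

  key(model, prompt) {
    return `${model}::${prompt}`;
  }

  get(model, prompt) {
    const k = this.key(model, prompt);
    const entry = this.store.get(k);
    if (!entry) return undefined;
    if (Date.now() - entry.at > this.ttlMs) {
      this.store.delete(k); // expired
      return undefined;
    }
    return entry.value;
  }

  set(model, prompt, value) {
    this.store.set(this.key(model, prompt), { value, at: Date.now() });
  }
}
```

Only cache deterministic or low-temperature calls; cached answers to creative prompts will feel repetitive to users.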
Usage Tracking Dashboard
// Simple usage tracking dashboard
class UsageDashboard {
  constructor() {
    this.metrics = {
      daily: [],
      weekly: [],
      monthly: []
    };
  }

  trackUsage(model, tokens, cost) {
    const entry = {
      timestamp: new Date(),
      model,
      tokens,
      cost
    };

    this.metrics.daily.push(entry);
    this.updateDashboard();
  }

  updateDashboard() {
    const today = this.metrics.daily.filter(
      e => this.isToday(e.timestamp)
    );

    const stats = {
      requests: today.length,
      tokens: today.reduce((sum, e) => sum + e.tokens, 0),
      cost: today.reduce((sum, e) => sum + e.cost, 0)
    };

    console.log('📊 Daily Usage:', stats);
    
    // Send to monitoring service
    this.sendToMonitoring(stats);
  }

  sendToMonitoring(stats) {
    // Integrate with your monitoring service
    // Examples: Datadog, Grafana, CloudWatch
  }

  isToday(timestamp) {
    // Compare calendar dates in local time
    return timestamp.toDateString() === new Date().toDateString();
  }
}

Provider Rate Limits & Quotas

| Provider  | Free Tier                                | Paid Tier                                | Enterprise                                       |
|-----------|------------------------------------------|------------------------------------------|--------------------------------------------------|
| OpenAI    | 3 RPM (GPT-4), 200 RPD, $5 credit        | 500 RPM, 10,000 RPD, pay as you go       | Custom limits, priority access, volume discounts |
| Anthropic | 5 RPM, 300K tokens/month, limited trial  | 50 RPM, 5M tokens/month, usage-based     | Custom limits, dedicated support, SLA guarantees |
| Google    | 60 RPM, $300 credit (90 days)            | 1,000 RPM, unlimited, per-token pricing  | Custom quotas, committed use discounts           |
Monitoring & Management Tools

Recommended Monitoring Tools
  • Helicone - LLM observability platform

    Track costs, latency, and usage across providers

  • Langfuse - Open source LLM monitoring

    Debug prompts and track token usage

  • Datadog - Full-stack monitoring

    Custom metrics and alerting for LLM usage

Best Practices

Development Environment
  • • Use separate API keys for dev/staging/prod
  • • Implement mock responses for testing
  • • Set strict limits for development keys
  • • Use cheaper models during development
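Mock responses keep development runs from consuming real quota at all. The sketch below shows one way to do this; the response shape loosely mirrors the chat completions format used elsewhere in this guide, so adjust it to whatever your real client returns.

```javascript
// Sketch: a drop-in mock client for development and tests. Returns canned
// responses with zero token usage instead of calling a real provider.
class MockLLMClient {
  async createChatCompletion({ model, messages }) {
    const lastUser = messages.filter(m => m.role === 'user').pop();
    return {
      model,
      choices: [{
        message: { role: 'assistant', content: `[mock reply to: ${lastUser.content}]` }
      }],
      usage: { prompt_tokens: 0, completion_tokens: 0, total_tokens: 0 }
    };
  }
}
```

Swap the mock in behind an environment flag (e.g. `NODE_ENV !== 'production'`) so the rest of the code path stays identical between development and production.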
Production Environment
  • • Implement circuit breakers
  • • Use exponential backoff for retries
  • • Cache responses when possible
  • • Monitor cost per user/feature
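A circuit breaker, the first item in the production list above, can be sketched in a few lines: after a run of consecutive failures the circuit "opens" and rejects calls immediately for a cooldown period, then lets a trial call through. The threshold and cooldown values here are illustrative defaults.

```javascript
// Sketch: minimal circuit breaker. Opens after `threshold` consecutive
// failures; while open, calls fail fast for `cooldownMs` instead of
// hammering a provider that is already rejecting requests.
class CircuitBreaker {
  constructor(threshold = 5, cooldownMs = 30000) {
    this.threshold = threshold;
    this.cooldownMs = cooldownMs;
    this.failures = 0;
    this.openedAt = null;
  }

  async call(fn) {
    if (this.openedAt !== null && Date.now() - this.openedAt < this.cooldownMs) {
      throw new Error('Circuit open - failing fast');
    }
    try {
      const result = await fn();
      this.failures = 0;   // success closes the circuit
      this.openedAt = null;
      return result;
    } catch (err) {
      if (++this.failures >= this.threshold) {
        this.openedAt = Date.now();
      }
      throw err;
    }
  }
}
```

Combined with the exponential backoff also listed above, this keeps one failing provider from cascading into wasted retries and surprise costs.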
Related Guides
  • Rate Limiting Guide - Implement proper rate limiting to prevent quota issues.
  • Cost Calculator - Estimate and compare costs across different providers.
  • Token Optimization - Reduce costs without sacrificing quality.
