LLM Token Usage Calculator
Calculate token usage and costs for your LLM API calls across different models
Tokens are the fundamental units that language models process. Understanding tokenization is crucial for accurate cost estimation and optimization. According to Winder.AI's practical guide, different models use different tokenization methods.
Tokenizer Types
Tiktoken (OpenAI)
Uses byte pair encoding (BPE) designed specifically for GPT models. Different models use different encodings (compared in the snippet below):
- `cl100k_base`: GPT-4, GPT-3.5-turbo
- `p50k_base`: Codex, text-davinci-002/003
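As a quick illustration (a minimal sketch using the `tiktoken` package), the same string can produce different token counts under different encodings:

```python
import tiktoken

text = "Tokenization boundaries differ between encodings."
for name in ("cl100k_base", "p50k_base"):
    enc = tiktoken.get_encoding(name)
    print(f"{name}: {len(enc.encode(text))} tokens")
```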
SentencePiece
Used by models like Llama and T5. Can operate in BPE or unigram mode, often resulting in different token boundaries.
GPT-2 Tokenizer
Legacy BPE tokenizer still used by some models. Available via Hugging Face transformers library.
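For example, counting tokens with the GPT-2 tokenizer through Hugging Face (a brief sketch; requires the `transformers` package and downloads the tokenizer files on first use):

```python
from transformers import GPT2TokenizerFast

tokenizer = GPT2TokenizerFast.from_pretrained("gpt2")
text = "Legacy BPE tokenizers are still widely used."
print(len(tokenizer.encode(text)), "tokens")
```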
Token Estimation Rules
As noted in the LiteLLM documentation, the following rules of thumb apply when the exact tokenizer is unavailable (a fallback estimator is sketched after the list):
- English: 1 token ≈ 4 characters (≈0.75 words)
- Code: 1 token ≈ 2-3 characters (more symbols)
- Non-English: 1 token ≈ 2-3 characters (varies by language)
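A minimal sketch of these rules of thumb as a fallback estimator (the characters-per-token ratios are the approximations above, not exact values):

```python
def estimate_tokens(text: str, content_type: str = "english") -> int:
    """Rough token estimate when the exact tokenizer is unavailable."""
    # Approximate characters per token, per the rules of thumb above
    chars_per_token = {"english": 4.0, "code": 2.5, "non_english": 2.5}
    return max(1, round(len(text) / chars_per_token[content_type]))

print(estimate_tokens("The quick brown fox jumps over the lazy dog."))  # ~11
```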
| Model | Input (per 1K tokens) | Output (per 1K tokens) | Cost per 1M tokens (50/50 split) |
|---|---|---|---|
| GPT-4 Turbo | $0.0100 | $0.0300 | $20.00 |
| GPT-4 | $0.0300 | $0.0600 | $45.00 |
| GPT-3.5 Turbo | $0.0005 | $0.0015 | $1.00 |
| Claude 3 Opus | $0.0150 | $0.0750 | $45.00 |
| Claude 3 Sonnet | $0.0030 | $0.0150 | $9.00 |
| Claude 3 Haiku | $0.00025 | $0.00125 | $0.75 |
| Gemini Pro | $0.0003 | $0.0005 | $0.40 |
| Llama 3 70B | $0.0008 | $0.0008 | $0.80 |
| Mistral Large | $0.0080 | $0.0240 | $16.00 |
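The last column assumes 500K input and 500K output tokens:

cost_per_1M (50/50) = 500 × input_price_per_1K + 500 × output_price_per_1K

For GPT-4, that is 500 × $0.03 + 500 × $0.06 = $15.00 + $30.00 = $45.00.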
1. Prompt Engineering
According to LangChain's token tracking guide, concise prompts can significantly reduce costs:
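For instance, a concise rewrite can carry the same instruction with far fewer tokens (an illustrative comparison, measured with `tiktoken`):

```python
import tiktoken

enc = tiktoken.get_encoding("cl100k_base")
verbose = (
    "I would really appreciate it if you could please take the time to "
    "provide me with a detailed summary of the following article."
)
concise = "Summarize the following article."
print(len(enc.encode(verbose)), "vs", len(enc.encode(concise)), "tokens")
```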
2. Context Management
- Summarize long conversations instead of including the full history (sketched below)
- Remove unnecessary metadata and formatting
- Use reference IDs instead of repeating full context
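A minimal sketch of the first tactic, keeping a running summary plus only the most recent turns (the `summarize` helper is hypothetical, standing in for a cheap summarization call):

```python
def trim_history(messages, keep_recent=4,
                 summarize=lambda msgs: "(summary of earlier turns)"):
    """Replace older turns with a summary to cap context size."""
    if len(messages) <= keep_recent:
        return messages
    # summarize() is a hypothetical placeholder for your own summarizer
    summary = {"role": "system", "content": summarize(messages[:-keep_recent])}
    return [summary] + messages[-keep_recent:]
```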
3. Response Control
- Set appropriate `max_tokens` limits (see the example below)
- Use `stop` sequences to prevent over-generation
- Request specific formats (e.g., "Answer in 2-3 sentences")
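For example, with the OpenAI Python SDK (v1-style client; assumes `OPENAI_API_KEY` is set in the environment):

```python
from openai import OpenAI

client = OpenAI()
response = client.chat.completions.create(
    model="gpt-3.5-turbo",
    messages=[{"role": "user", "content": "Explain tokenization in 2-3 sentences."}],
    max_tokens=120,   # hard cap on completion length
    stop=["\n\n"],    # stop before the model starts a new paragraph
)
print(response.choices[0].message.content)
print(response.usage)  # authoritative token counts from the API
```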
4. Model Selection
Choose the right model for your use case. As the pricing table shows, GPT-3.5 Turbo input tokens cost roughly 1/60th as much as GPT-4's.
Python Implementation with Tiktoken
```python
import tiktoken
import json


class TokenCalculator:
    def __init__(self, model="gpt-3.5-turbo"):
        self.model = model
        self.encoding = tiktoken.encoding_for_model(model)
        # Pricing per 1K tokens (update as needed)
        self.pricing = {
            "gpt-4": {"input": 0.03, "output": 0.06},
            "gpt-3.5-turbo": {"input": 0.0005, "output": 0.0015},
            "gpt-4-turbo": {"input": 0.01, "output": 0.03},
        }

    def count_tokens(self, text):
        """Count tokens in text."""
        return len(self.encoding.encode(text))

    def calculate_cost(self, prompt, completion):
        """Calculate cost for a request."""
        prompt_tokens = self.count_tokens(prompt)
        completion_tokens = self.count_tokens(completion)
        prices = self.pricing.get(self.model, self.pricing["gpt-3.5-turbo"])
        prompt_cost = (prompt_tokens / 1000) * prices["input"]
        completion_cost = (completion_tokens / 1000) * prices["output"]
        return {
            "prompt_tokens": prompt_tokens,
            "completion_tokens": completion_tokens,
            "total_tokens": prompt_tokens + completion_tokens,
            "prompt_cost": prompt_cost,
            "completion_cost": completion_cost,
            "total_cost": prompt_cost + completion_cost,
        }


# Usage
calculator = TokenCalculator("gpt-3.5-turbo")
result = calculator.calculate_cost(
    prompt="Explain quantum computing",
    completion="Quantum computing uses quantum bits...",
)
print(json.dumps(result, indent=2))
```
Source: Adapted from OpenAI's tiktoken library
JavaScript/TypeScript Implementation
```typescript
// Using the js-tiktoken library
import { getEncoding, Tiktoken } from 'js-tiktoken'

class TokenCalculator {
  private encoding: Tiktoken
  // Pricing per 1K tokens (update as needed)
  private pricing: Record<string, { input: number; output: number }> = {
    'gpt-4': { input: 0.03, output: 0.06 },
    'gpt-3.5-turbo': { input: 0.0005, output: 0.0015 },
    'claude-3-sonnet': { input: 0.003, output: 0.015 },
  }

  constructor(private model: string = 'gpt-3.5-turbo') {
    // For GPT models, use the cl100k_base encoding
    this.encoding = getEncoding('cl100k_base')
  }

  countTokens(text: string): number {
    return this.encoding.encode(text).length
  }

  calculateCost(prompt: string, completion: string) {
    const promptTokens = this.countTokens(prompt)
    const completionTokens = this.countTokens(completion)
    // Fall back to GPT-3.5 Turbo pricing for unknown models
    const prices = this.pricing[this.model] || this.pricing['gpt-3.5-turbo']
    const promptCost = (promptTokens / 1000) * prices.input
    const completionCost = (completionTokens / 1000) * prices.output
    return {
      promptTokens,
      completionTokens,
      totalTokens: promptTokens + completionTokens,
      promptCost: parseFloat(promptCost.toFixed(6)),
      completionCost: parseFloat(completionCost.toFixed(6)),
      totalCost: parseFloat((promptCost + completionCost).toFixed(6)),
    }
  }

  // Batch calculation
  calculateBatchCost(requests: Array<{ prompt: string; completion: string }>) {
    return requests.reduce(
      (acc, req) => {
        const cost = this.calculateCost(req.prompt, req.completion)
        return {
          totalTokens: acc.totalTokens + cost.totalTokens,
          totalCost: acc.totalCost + cost.totalCost,
        }
      },
      { totalTokens: 0, totalCost: 0 }
    )
  }
}

// Usage
const calculator = new TokenCalculator('gpt-3.5-turbo')
const result = calculator.calculateCost(
  'What is machine learning?',
  'Machine learning is a subset of AI...'
)
console.log(result)
```
According to the LiteLLM documentation, batch processing can significantly reduce overhead and improve cost efficiency.
Batch Cost Formula
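Consistent with the calculator code above, the cost of a batch of N requests is:

batch_cost = Σ (i = 1..N) [ (prompt_tokens_i / 1,000) × input_price + (completion_tokens_i / 1,000) × output_price ]

where prices are per 1K tokens for the chosen model.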
Batch Optimization Strategies
- **Group similar requests:** group similar prompts and process them together to reduce API call overhead.
- **Cascade models:** use GPT-3.5 Turbo for initial processing, then GPT-4 only for complex tasks (a sketch follows).
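A minimal sketch of the cascade idea (the `is_complex` heuristic and `ask` helper are hypothetical placeholders for your own routing logic and API wrapper):

```python
def route_model(prompt: str, is_complex=lambda p: len(p) > 500) -> str:
    """Send simple prompts to the cheap model, complex ones to the strong model."""
    return "gpt-4" if is_complex(prompt) else "gpt-3.5-turbo"

def answer(prompt: str, ask):
    # ask(model, prompt) is a hypothetical wrapper around your API client
    return ask(route_model(prompt), prompt)
```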
Example: Batch Processing 100 Customer Queries
Processing 100 customer support queries:
- Average prompt: 150 tokens
- Average response: 300 tokens
- Total prompt tokens: 15,000
- Total completion tokens: 30,000
Cost Comparison:
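Using the per-1K prices from the table above:

- GPT-4: 15 × $0.03 + 30 × $0.06 = $2.25
- GPT-4 Turbo: 15 × $0.01 + 30 × $0.03 = $1.05
- Claude 3 Sonnet: 15 × $0.003 + 30 × $0.015 = $0.50
- GPT-3.5 Turbo: 15 × $0.0005 + 30 × $0.0015 = $0.05

Routing the whole batch to GPT-3.5 Turbo instead of GPT-4 cuts the cost by more than 40x.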
Real-time Cost Tracking
```python
from datetime import datetime


class CostTracker:
    def __init__(self):
        # Keyed by "YYYY-MM-DD"; monthly totals are derived from these keys
        self.daily_usage = {}

    def track_request(self, model, prompt_tokens, completion_tokens, cost):
        """Track individual request costs."""
        today = datetime.now().strftime("%Y-%m-%d")
        # Daily tracking
        if today not in self.daily_usage:
            self.daily_usage[today] = {
                "requests": 0,
                "total_tokens": 0,
                "total_cost": 0,
                "by_model": {},
            }
        self.daily_usage[today]["requests"] += 1
        self.daily_usage[today]["total_tokens"] += prompt_tokens + completion_tokens
        self.daily_usage[today]["total_cost"] += cost
        # Model-specific tracking
        if model not in self.daily_usage[today]["by_model"]:
            self.daily_usage[today]["by_model"][model] = {
                "requests": 0,
                "tokens": 0,
                "cost": 0,
            }
        model_usage = self.daily_usage[today]["by_model"][model]
        model_usage["requests"] += 1
        model_usage["tokens"] += prompt_tokens + completion_tokens
        model_usage["cost"] += cost

    def get_daily_report(self, date=None):
        """Get usage report for a specific day (default: today)."""
        if date is None:
            date = datetime.now().strftime("%Y-%m-%d")
        return self.daily_usage.get(date, {
            "requests": 0,
            "total_tokens": 0,
            "total_cost": 0,
            "by_model": {},
        })

    def get_cost_alerts(self, daily_limit=100, monthly_limit=3000):
        """Check if costs exceed limits."""
        today = datetime.now().strftime("%Y-%m-%d")
        month = datetime.now().strftime("%Y-%m")
        daily_cost = self.daily_usage.get(today, {}).get("total_cost", 0)
        # Sum daily totals whose date key falls within the current month
        monthly_cost = sum(
            usage.get("total_cost", 0)
            for date, usage in self.daily_usage.items()
            if date.startswith(month)
        )
        alerts = []
        if daily_cost > daily_limit:
            alerts.append(f"Daily cost (${daily_cost:.2f}) exceeds limit (${daily_limit})")
        if monthly_cost > monthly_limit:
            alerts.append(f"Monthly cost (${monthly_cost:.2f}) exceeds limit (${monthly_limit})")
        return alerts
```
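Usage, paired with the `TokenCalculator` defined earlier:

```python
tracker = CostTracker()
calc = TokenCalculator("gpt-3.5-turbo")

usage = calc.calculate_cost("Explain quantum computing", "Quantum computing uses...")
tracker.track_request(
    model="gpt-3.5-turbo",
    prompt_tokens=usage["prompt_tokens"],
    completion_tokens=usage["completion_tokens"],
    cost=usage["total_cost"],
)
print(tracker.get_daily_report())
print(tracker.get_cost_alerts(daily_limit=50))
```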
Integration with Monitoring Services
As recommended in LangChain's monitoring guide, integrate with observability platforms:
- **Prometheus Metrics:** export token usage and costs as Prometheus metrics for Grafana dashboards (sketched below)
- **CloudWatch/Datadog:** send custom metrics to cloud monitoring services with cost alerts
- **Database Logging:** store detailed usage logs for historical analysis and billing
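For example, exporting counters with the `prometheus_client` package (a minimal sketch; the metric names are illustrative, not a standard):

```python
from prometheus_client import Counter, start_http_server

# Illustrative metric names, labeled by model
TOKENS_TOTAL = Counter("llm_tokens_total", "Total LLM tokens used", ["model", "kind"])
COST_TOTAL = Counter("llm_cost_usd_total", "Total LLM spend in USD", ["model"])

start_http_server(9100)  # expose /metrics for Prometheus to scrape

def record(model: str, prompt_tokens: int, completion_tokens: int, cost: float):
    TOKENS_TOTAL.labels(model=model, kind="prompt").inc(prompt_tokens)
    TOKENS_TOTAL.labels(model=model, kind="completion").inc(completion_tokens)
    COST_TOTAL.labels(model=model).inc(cost)
```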
Token Calculation Tools
- OpenAI's Interactive Tokenizer - Official tool for testing OpenAI tokenization
- Tiktoken Library - Python library for OpenAI's tokenizers
- LiteLLM Token Usage Guide - Multi-provider token tracking
Cost Optimization Resources
- Calculating Token Counts: A Practical Guide - Comprehensive guide by Winder.AI
- LangChain Token Usage Tracking - Track tokens in LangChain applications
- Token Optimization Strategies (Video) - Practical tips for reducing token usage
ParrotRouter Resources
- ParrotRouter Pricing - Current pricing across all supported models
- API Parameters Guide - Control token usage with API parameters
- Token Optimization Guide - Detailed strategies for reducing costs
Tip: Character-based estimates are approximations. Check the `usage` field in response objects for accurate token counts.