Model Routing
Intelligently route requests to the optimal model for cost and performance
What is Model Routing?
Model routing automatically selects a model for each request based on prompt complexity, required capabilities, and your stated preferences, so simple tasks don't pay premium-model prices and hard tasks still get a capable model.
Smart Selection
Analyzes prompt complexity to choose the right model
Cost Optimization
Uses cheaper models for simple tasks automatically
Fallback Handling
Seamlessly switches models if one is unavailable
Automatic Routing
Pass auto as the model ID to let ParrotRouter decide which model should serve each request:
from openai import OpenAI
client = OpenAI(
base_url="https://api.parrotrouter.com/v1",
api_key="your-api-key"
)
# Simple request - will use a fast, cheap model
response = client.chat.completions.create(
model="auto", # Let ParrotRouter choose
messages=[
{"role": "user", "content": "What's the capital of France?"}
]
)
print(f"Used model: {response.model}") # e.g., "gpt-3.5-turbo"
# Complex request - will use a more capable model
response = client.chat.completions.create(
model="auto",
messages=[
{"role": "user", "content": "Write a detailed analysis of quantum computing's impact on cryptography, including specific algorithms and timelines"}
]
)
print(f"Used model: {response.model}") # e.g., "gpt-4-turbo-preview"
Routing Strategies
Cost-Optimized Routing
Minimizes costs by using the cheapest model that can handle the task.
# Configure cost-optimized routing
response = client.chat.completions.create(
model="auto",
messages=[{"role": "user", "content": "Explain machine learning"}],
extra_headers={
"X-Routing-Strategy": "cost-optimized",
"X-Max-Cost-Per-1K-Tokens": "0.01" # Maximum $/1K tokens
}
)
# Will route to models like:
# - GPT-3.5 Turbo for simple queries
# - Claude Haiku for moderate complexity
# - Only use GPT-4 when absolutely necessary
Quality-Optimized Routing
Prioritizes the best possible output quality for critical tasks.
# Configure quality-optimized routing
response = client.chat.completions.create(
model="auto",
messages=[{"role": "user", "content": "Review this legal contract"}],
extra_headers={
"X-Routing-Strategy": "quality-optimized",
"X-Min-Model-Tier": "premium" # Only use top-tier models
}
)
# Will route to models like:
# - GPT-4 Turbo
# - Claude 3 Opus
# - Gemini Ultra
Speed-Optimized Routing
Minimizes latency by routing to the fastest available models.
# Configure speed-optimized routing
response = client.chat.completions.create(
model="auto",
messages=[{"role": "user", "content": "Generate a product name"}],
extra_headers={
"X-Routing-Strategy": "speed-optimized",
"X-Max-Latency-Ms": "1000" # Maximum 1 second response time
}
)
# Will route to models like:
# - GPT-3.5 Turbo (fastest)
# - Claude Instant
# - Local/edge models when available
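When you set a latency cap, it helps to verify what you actually get. This is a plain client-side timing wrapper with no ParrotRouter-specific assumptions:
import time

# Measure end-to-end latency for a speed-optimized request.
start = time.perf_counter()
response = client.chat.completions.create(
    model="auto",
    messages=[{"role": "user", "content": "Generate a product name"}],
    extra_headers={"X-Routing-Strategy": "speed-optimized"}
)
elapsed_ms = (time.perf_counter() - start) * 1000
print(f"{response.model} answered in {elapsed_ms:.0f} ms")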
Custom Routing Rules
Create custom routing rules based on your specific requirements:
import json

# Define custom routing rules
routing_config = {
    "rules": [
        {
            "name": "code-generation",
            "condition": {
                "prompt_contains": ["code", "function", "implement", "debug"],
                "or_system_contains": ["programmer", "developer"]
            },
            "models": ["claude-3-opus", "gpt-4-turbo-preview"],
            "fallback": "gpt-3.5-turbo"
        },
        {
            "name": "creative-writing",
            "condition": {
                "prompt_contains": ["story", "creative", "poem", "write"],
                "min_length": 100
            },
            "models": ["claude-3-opus", "gpt-4"],
            "temperature_override": 0.8
        },
        {
            "name": "quick-answers",
            "condition": {
                "max_tokens": 100,
                "max_prompt_length": 200
            },
            "models": ["gpt-3.5-turbo", "claude-instant"],
            "cache_enabled": True
        }
    ],
    "default_model": "gpt-3.5-turbo"
}
# Apply routing configuration
response = client.chat.completions.create(
model="auto",
messages=[{
"role": "system",
"content": "You are a helpful programmer"
}, {
"role": "user",
"content": "Debug this Python function"
}],
extra_headers={
"X-Routing-Config": json.dumps(routing_config)
}
)
# Will use claude-3-opus or gpt-4-turbo-preview based on availability
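Rules are evaluated server-side, but a local pre-check can help you predict which rule a prompt will trigger before you send it. This sketch mirrors only the prompt_contains condition from the config above; ParrotRouter's actual matching logic may differ:
def predict_rule(prompt: str, config: dict) -> str:
    """Best-effort local mirror of the prompt_contains condition.
    Server-side evaluation may differ; use for sanity checks only."""
    lowered = prompt.lower()
    for rule in config["rules"]:
        keywords = rule["condition"].get("prompt_contains", [])
        if any(keyword in lowered for keyword in keywords):
            return rule["name"]
    return "default"

print(predict_rule("Debug this Python function", routing_config))  # code-generation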
Model Groups
Route to groups of models with similar capabilities:
Capability-Based Groups
# Route to vision-capable models
response = client.chat.completions.create(
model="auto:vision",
messages=[{
"role": "user",
"content": [
{"type": "text", "text": "What's in this image?"},
{"type": "image_url", "image_url": {"url": "..."}}
]
}]
)
# Route to function-calling models
response = client.chat.completions.create(
model="auto:functions",
messages=[{"role": "user", "content": "Get weather"}],
tools=[{
"type": "function",
"function": {
"name": "get_weather",
"parameters": {...}
}
}]
)
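The tool definition above elides the parameter schema. For reference, a complete parameters block uses standard JSON Schema, following the usual OpenAI function-calling format (the get_weather fields here are illustrative):
# A fully specified tool definition; the schema fields are examples.
weather_tool = {
    "type": "function",
    "function": {
        "name": "get_weather",
        "description": "Get the current weather for a location",
        "parameters": {
            "type": "object",
            "properties": {
                "location": {"type": "string", "description": "City name"},
                "unit": {"type": "string", "enum": ["celsius", "fahrenheit"]}
            },
            "required": ["location"]
        }
    }
}

response = client.chat.completions.create(
    model="auto:functions",
    messages=[{"role": "user", "content": "Get weather in Paris"}],
    tools=[weather_tool]
)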
Performance-Based Groups
# Route to fast models
response = client.chat.completions.create(
model="auto:fast",
messages=[{"role": "user", "content": "Quick question"}]
)
# Route to high-quality models
response = client.chat.completions.create(
model="auto:premium",
messages=[{"role": "user", "content": "Complex analysis"}]
)
# Route to cost-effective models
response = client.chat.completions.create(
model="auto:budget",
messages=[{"role": "user", "content": "Simple task"}]
)
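If you prefer deterministic control over which group serves a request, a small client-side heuristic can choose the alias per prompt. The thresholds and keywords below are arbitrary examples, not ParrotRouter recommendations:
def pick_group(prompt: str) -> str:
    """Crude heuristic: analytical prompts go to premium models,
    short prompts to fast ones. Thresholds are illustrative."""
    if any(word in prompt.lower() for word in ("analyze", "review", "compare")):
        return "auto:premium"
    if len(prompt) < 80:
        return "auto:fast"
    return "auto:budget"

prompt = "Compare these two architectures in depth"
response = client.chat.completions.create(
    model=pick_group(prompt),  # resolves to auto:premium here
    messages=[{"role": "user", "content": prompt}]
)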
Fallback Chains
Define fallback sequences for high availability:
# Define fallback chain
response = client.chat.completions.create(
model="gpt-4-turbo-preview",
messages=[{"role": "user", "content": "Important request"}],
extra_headers={
"X-Fallback-Models": "claude-3-opus,gpt-4,gpt-3.5-turbo",
"X-Fallback-Strategy": "sequential", # or "random", "load-balanced"
"X-Retry-On-Error": "true",
"X-Max-Retries": "3"
}
)
import json

# Advanced fallback with conditions
fallback_config = {
    "primary": "gpt-4-turbo-preview",
    "fallbacks": [
        {
            "model": "claude-3-opus",
            "condition": "rate_limit or timeout"
        },
        {
            "model": "gpt-4",
            "condition": "server_error",
            "max_additional_cost": 0.02
        },
        {
            "model": "gpt-3.5-turbo",
            "condition": "any_error",
            "reduce_max_tokens": True
        }
    ]
}
response = client.chat.completions.create(
model="auto",
messages=[{"role": "user", "content": "Process this request"}],
extra_headers={
"X-Fallback-Config": json.dumps(fallback_config)
}
)
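Header-driven fallback runs inside ParrotRouter. If you also want a last-resort safety net in your own code, the SDK's exception classes support a simple client-side chain; this sketch uses real openai exception types, and the model list is just an example:
import openai

def complete_with_fallback(messages, models):
    """Try each model in order; a client-side complement to server fallback."""
    last_error = None
    for model in models:
        try:
            return client.chat.completions.create(model=model, messages=messages)
        except (openai.RateLimitError, openai.APITimeoutError,
                openai.APIStatusError) as exc:
            last_error = exc  # try the next model in the chain
    raise last_error

response = complete_with_fallback(
    [{"role": "user", "content": "Important request"}],
    ["gpt-4-turbo-preview", "claude-3-opus", "gpt-3.5-turbo"],
)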
Routing Analytics
Monitor and optimize your routing decisions:
import requests
# Get routing statistics
response = requests.get(
"https://api.parrotrouter.com/v1/analytics/routing",
headers={"Authorization": "Bearer your-api-key"},
params={
"start_date": "2024-01-01",
"end_date": "2024-01-31",
"group_by": "model"
}
)
analytics = response.json()
print(f"Total requests: {analytics['total_requests']}")
print(f"Auto-routed: {analytics['auto_routed_percentage']}%")
print(f"Average cost savings: $" + str(analytics['cost_savings']))
# Model distribution
for model in analytics['model_distribution']:
print(f"{model['name']}: {model['percentage']}% ({model['count']} requests)")
# Routing decisions
for decision in analytics['routing_decisions']:
print(f"Rule: {decision['rule_name']}")
print(f"Triggered: {decision['count']} times")
print(f"Success rate: {decision['success_rate']}%")
Best Practices
1. Start with Auto-Routing
Let ParrotRouter optimize for you before creating custom rules.
2. Monitor Performance
Track which models are being selected and why.
3. Set Cost Limits
Define maximum costs to prevent unexpected charges; see the client-wide example after this list.
4. Test Fallback Chains
Ensure your fallback models can handle your prompts.
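To enforce a cost cap account-wide rather than per request, you can attach the header to every call with the SDK's default_headers option. default_headers is a standard openai-python client parameter; the header itself is the ParrotRouter one shown earlier:
from openai import OpenAI

# Every request sent through this client carries the cost cap,
# so individual calls cannot forget it.
client = OpenAI(
    base_url="https://api.parrotrouter.com/v1",
    api_key="your-api-key",
    default_headers={"X-Max-Cost-Per-1K-Tokens": "0.01"}
)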