Model Routing

Intelligently route requests to the optimal model for cost and performance

What is Model Routing?

Model routing automatically selects the best AI model for each request based on complexity, required capabilities, and your preferences. This ensures you get optimal results while minimizing costs.

Smart Selection

Analyzes prompt complexity to choose the right model

Cost Optimization

Uses cheaper models for simple tasks automatically

Fallback Handling

Seamlessly switches models if one is unavailable
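To make the idea of smart selection concrete, here is a minimal client-side sketch of complexity-based model choice. The thresholds, keyword list, and model names are invented for illustration; ParrotRouter's actual router analyzes far more signals server-side.

```python
# Minimal illustration of complexity-based model selection.
# COMPLEX_HINTS and the 50-word threshold are invented for this sketch.
COMPLEX_HINTS = {"analyze", "detailed", "implement", "prove", "compare"}

def pick_model(prompt: str) -> str:
    words = prompt.lower().split()
    # Long prompts or prompts with "complex task" keywords get a
    # capable (more expensive) model; everything else a cheap default.
    if len(words) > 50 or COMPLEX_HINTS & set(words):
        return "gpt-4-turbo-preview"
    return "gpt-3.5-turbo"

print(pick_model("What's the capital of France?"))
# gpt-3.5-turbo
print(pick_model("Write a detailed analysis of quantum computing"))
# gpt-4-turbo-preview
```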

Automatic Routing

Use our auto-router model to let ParrotRouter decide the best model for your request:

Automatic Model Selection (Python)
from openai import OpenAI

client = OpenAI(
    base_url="https://api.parrotrouter.com/v1",
    api_key="your-api-key"
)

# Simple request - will use a fast, cheap model
response = client.chat.completions.create(
    model="auto",  # Let ParrotRouter choose
    messages=[
        {"role": "user", "content": "What's the capital of France?"}
    ]
)
print(f"Used model: {response.model}")  # e.g., "gpt-3.5-turbo"

# Complex request - will use a more capable model
response = client.chat.completions.create(
    model="auto",
    messages=[
        {"role": "user", "content": "Write a detailed analysis of quantum computing's impact on cryptography, including specific algorithms and timelines"}
    ]
)
print(f"Used model: {response.model}")  # e.g., "gpt-4-turbo-preview"

Routing Strategies

Cost-Optimized Routing

Minimizes costs by using the cheapest model that can handle the task.

# Configure cost-optimized routing
response = client.chat.completions.create(
    model="auto",
    messages=[{"role": "user", "content": "Explain machine learning"}],
    extra_headers={
        "X-Routing-Strategy": "cost-optimized",
        "X-Max-Cost-Per-1K-Tokens": "0.01"  # Maximum $/1K tokens
    }
)

# Will route to models like:
# - GPT-3.5 Turbo for simple queries
# - Claude Haiku for moderate complexity
# - GPT-4 only when absolutely necessary
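To reason about what a cap like X-Max-Cost-Per-1K-Tokens means for a given request, a rough client-side estimate can help. The prices below are invented placeholders, not ParrotRouter's actual rates; check your dashboard for real pricing.

```python
# Illustrative $ per 1K tokens; NOT real rates.
PRICE_PER_1K = {
    "gpt-3.5-turbo": 0.002,
    "gpt-4-turbo-preview": 0.03,
}

def within_cap(model: str, max_cost_per_1k: float) -> bool:
    """Would this model satisfy an X-Max-Cost-Per-1K-Tokens cap?"""
    return PRICE_PER_1K[model] <= max_cost_per_1k

def request_cost(model: str, prompt_tokens: int, completion_tokens: int) -> float:
    """Estimated cost of one request at the illustrative rates."""
    return (prompt_tokens + completion_tokens) / 1000 * PRICE_PER_1K[model]

print(within_cap("gpt-3.5-turbo", 0.01))        # True
print(within_cap("gpt-4-turbo-preview", 0.01))  # False
print(f"${request_cost('gpt-3.5-turbo', 120, 380):.4f}")  # $0.0010
```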

Quality-Optimized Routing

Prioritizes the best possible output quality for critical tasks.

# Configure quality-optimized routing
response = client.chat.completions.create(
    model="auto",
    messages=[{"role": "user", "content": "Review this legal contract"}],
    extra_headers={
        "X-Routing-Strategy": "quality-optimized",
        "X-Min-Model-Tier": "premium"  # Only use top-tier models
    }
)

# Will route to models like:
# - GPT-4 Turbo
# - Claude 3 Opus
# - Gemini Ultra

Speed-Optimized Routing

Minimizes latency by routing to the fastest available models.

# Configure speed-optimized routing
response = client.chat.completions.create(
    model="auto",
    messages=[{"role": "user", "content": "Generate a product name"}],
    extra_headers={
        "X-Routing-Strategy": "speed-optimized",
        "X-Max-Latency-Ms": "1000"  # Maximum 1 second response time
    }
)

# Will route to models like:
# - GPT-3.5 Turbo (fastest)
# - Claude Instant
# - Local/edge models when available
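To verify that responses actually meet a latency budget like X-Max-Latency-Ms, you can time requests end to end. In this sketch a lambda stands in for `client.chat.completions.create(...)` so it runs offline; swap in the real call in practice.

```python
import time

def timed(call):
    """Run a request callable and return (result, elapsed_ms)."""
    start = time.perf_counter()
    result = call()
    elapsed_ms = (time.perf_counter() - start) * 1000
    return result, elapsed_ms

# Stand-in for the real API call so this sketch runs offline.
result, ms = timed(lambda: "SnapNest")
print(f"{result!r} took {ms:.2f} ms")
```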

Custom Routing Rules

Create custom routing rules based on your specific requirements:

Advanced Routing Configuration (Python)
import json

# Define custom routing rules
routing_config = {
    "rules": [
        {
            "name": "code-generation",
            "condition": {
                "prompt_contains": ["code", "function", "implement", "debug"],
                "or_system_contains": ["programmer", "developer"]
            },
            "models": ["claude-3-opus", "gpt-4-turbo-preview"],
            "fallback": "gpt-3.5-turbo"
        },
        {
            "name": "creative-writing",
            "condition": {
                "prompt_contains": ["story", "creative", "poem", "write"],
                "min_length": 100
            },
            "models": ["claude-3-opus", "gpt-4"],
            "temperature_override": 0.8
        },
        {
            "name": "quick-answers",
            "condition": {
                "max_tokens": 100,
                "max_prompt_length": 200
            },
            "models": ["gpt-3.5-turbo", "claude-instant"],
            "cache_enabled": True
        }
    ],
    "default_model": "gpt-3.5-turbo"
}

# Apply routing configuration
response = client.chat.completions.create(
    model="auto",
    messages=[{
        "role": "system", 
        "content": "You are a helpful programmer"
    }, {
        "role": "user", 
        "content": "Debug this Python function"
    }],
    extra_headers={
        "X-Routing-Config": json.dumps(routing_config)
    }
)
# Will use claude-3-opus or gpt-4-turbo-preview based on availability
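To see why the request above triggers the "code-generation" rule, here is a sketch of how a rule's conditions might be evaluated. The matching semantics (substring match on user and system text) are assumptions for illustration; the real evaluation happens server-side in ParrotRouter.

```python
# Sketch of rule matching; semantics are assumed for illustration.
def rule_matches(rule, messages):
    cond = rule["condition"]
    prompt = " ".join(m["content"] for m in messages if m["role"] == "user").lower()
    system = " ".join(m["content"] for m in messages if m["role"] == "system").lower()
    # Match if any keyword appears in the user prompt...
    if any(kw in prompt for kw in cond.get("prompt_contains", [])):
        return True
    # ...or in the system message.
    if any(kw in system for kw in cond.get("or_system_contains", [])):
        return True
    return False

rule = {
    "name": "code-generation",
    "condition": {
        "prompt_contains": ["code", "function", "implement", "debug"],
        "or_system_contains": ["programmer", "developer"],
    },
}
messages = [
    {"role": "system", "content": "You are a helpful programmer"},
    {"role": "user", "content": "Debug this Python function"},
]
print(rule_matches(rule, messages))  # True
```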

Model Groups

Route to groups of models with similar capabilities:

Capability-Based Groups

# Route to vision-capable models
response = client.chat.completions.create(
    model="auto:vision",
    messages=[{
        "role": "user",
        "content": [
            {"type": "text", "text": "What's in this image?"},
            {"type": "image_url", "image_url": {"url": "..."}}
        ]
    }]
)

# Route to function-calling models
response = client.chat.completions.create(
    model="auto:functions",
    messages=[{"role": "user", "content": "Get weather"}],
    tools=[{
        "type": "function",
        "function": {
            "name": "get_weather",
            "parameters": {...}
        }
    }]
)

Performance-Based Groups

# Route to fast models
response = client.chat.completions.create(
    model="auto:fast",
    messages=[{"role": "user", "content": "Quick question"}]
)

# Route to high-quality models
response = client.chat.completions.create(
    model="auto:premium",
    messages=[{"role": "user", "content": "Complex analysis"}]
)

# Route to cost-effective models
response = client.chat.completions.create(
    model="auto:budget",
    messages=[{"role": "user", "content": "Simple task"}]
)

Fallback Chains

Define fallback sequences for high availability:

Fallback Configuration (Python)
import json

# Define fallback chain
response = client.chat.completions.create(
    model="gpt-4-turbo-preview",
    messages=[{"role": "user", "content": "Important request"}],
    extra_headers={
        "X-Fallback-Models": "claude-3-opus,gpt-4,gpt-3.5-turbo",
        "X-Fallback-Strategy": "sequential",  # or "random", "load-balanced"
        "X-Retry-On-Error": "true",
        "X-Max-Retries": "3"
    }
)

# Advanced fallback with conditions
fallback_config = {
    "primary": "gpt-4-turbo-preview",
    "fallbacks": [
        {
            "model": "claude-3-opus",
            "condition": "rate_limit or timeout"
        },
        {
            "model": "gpt-4",
            "condition": "server_error",
            "max_additional_cost": 0.02
        },
        {
            "model": "gpt-3.5-turbo",
            "condition": "any_error",
            "reduce_max_tokens": True
        }
    ]
}

response = client.chat.completions.create(
    model="auto",
    messages=[{"role": "user", "content": "Process this request"}],
    extra_headers={
        "X-Fallback-Config": json.dumps(fallback_config)
    }
)
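Conceptually, the X-Fallback-Models header with the "sequential" strategy behaves like the loop below. This is a client-side sketch only; `try_model` is a stand-in that simulates the primary model being rate-limited, and the server-side implementation also honors per-fallback conditions.

```python
def try_model(model):
    # Simulated behavior: the primary model is rate-limited,
    # the first fallback succeeds.
    if model == "gpt-4-turbo-preview":
        raise RuntimeError("rate_limit")
    return f"response from {model}"

def complete_with_fallbacks(models):
    """Try each model in order; return the first success."""
    last_error = None
    for model in models:
        try:
            return try_model(model)
        except RuntimeError as err:
            last_error = err  # record the failure and try the next model
    raise RuntimeError("all models failed") from last_error

print(complete_with_fallbacks(
    ["gpt-4-turbo-preview", "claude-3-opus", "gpt-4", "gpt-3.5-turbo"]
))  # response from claude-3-opus
```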

Routing Analytics

Monitor and optimize your routing decisions:

Get Routing Analytics (Python)
import requests

# Get routing statistics
response = requests.get(
    "https://api.parrotrouter.com/v1/analytics/routing",
    headers={"Authorization": "Bearer your-api-key"},
    params={
        "start_date": "2024-01-01",
        "end_date": "2024-01-31",
        "group_by": "model"
    }
)

analytics = response.json()
print(f"Total requests: {analytics['total_requests']}")
print(f"Auto-routed: {analytics['auto_routed_percentage']}%")
print(f"Average cost savings: ${analytics['cost_savings']}")

# Model distribution
for model in analytics['model_distribution']:
    print(f"{model['name']}: {model['percentage']}% ({model['count']} requests)")

# Routing decisions
for decision in analytics['routing_decisions']:
    print(f"Rule: {decision['rule_name']}")
    print(f"Triggered: {decision['count']} times")
    print(f"Success rate: {decision['success_rate']}%")
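The distribution data can also be summarized, for example to spot the model carrying most of your traffic. The sample data below is made up, and the field names follow the example above; verify the response shape against the API.

```python
# Made-up sample matching the model_distribution shape shown above.
distribution = [
    {"name": "gpt-3.5-turbo", "percentage": 62.0, "count": 6200},
    {"name": "gpt-4-turbo-preview", "percentage": 28.0, "count": 2800},
    {"name": "claude-3-opus", "percentage": 10.0, "count": 1000},
]

# Find the model handling the most requests.
top = max(distribution, key=lambda m: m["count"])
print(f"Most-used model: {top['name']} ({top['percentage']}%)")
# Most-used model: gpt-3.5-turbo (62.0%)
```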

Best Practices

  1. Start with Auto-Routing: let ParrotRouter optimize for you before creating custom rules.

  2. Monitor Performance: track which models are being selected and why.

  3. Set Cost Limits: define maximum costs to prevent unexpected charges.

  4. Test Fallback Chains: ensure your fallback models can handle your prompts.

Related Features