Performance Monitoring Setup
Monitor LLM performance in real time. Track latency, errors, costs, and usage patterns with comprehensive observability tooling.
- **Metrics tracked:** 50+ performance indicators
- **Alert response:** <30s average detection time
- **Data retention:** 90 days for historical analysis
OpenTelemetry Integration
```python
import time

from opentelemetry import trace, metrics
from opentelemetry.instrumentation.requests import RequestsInstrumentor

# Auto-instrument outgoing HTTP calls made via the requests library
RequestsInstrumentor().instrument()

# Initialize tracing and metrics
tracer = trace.get_tracer(__name__)
meter = metrics.get_meter(__name__)

# Create instruments
latency_histogram = meter.create_histogram(
    name="llm_request_duration",
    description="LLM request latency",
    unit="ms",
)
error_counter = meter.create_counter(
    name="llm_errors_total",
    description="Total LLM errors",
)

class MonitoredLLMClient:
    @tracer.start_as_current_span("llm_request")
    def complete(self, prompt):
        span = trace.get_current_span()
        span.set_attribute("model", self.model)
        # Character count, not a token count -- a cheap approximation
        span.set_attribute("prompt_chars", len(prompt))
        start_time = time.time()
        try:
            response = self.client.complete(prompt)
            latency = (time.time() - start_time) * 1000
            latency_histogram.record(latency)
            return response
        except Exception as e:
            error_counter.add(1)
            span.record_exception(e)
            raise
```
Key Metrics to Monitor
Performance Metrics
- Request latency (P50, P95, P99)
- Tokens per second
- Time to first token
- Queue depth
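Tail percentiles such as P95 and P99 surface slow outliers that averages hide. As a rough sketch, the nearest-rank method computes them from a window of recorded latency samples (the function name and sample values below are illustrative):

```python
import math

def percentile(samples, p):
    """Nearest-rank percentile of a list of latency samples (milliseconds)."""
    if not samples:
        raise ValueError("no samples recorded")
    s = sorted(samples)
    # Nearest-rank index: ceil(p/100 * n), converted to 0-based
    k = max(math.ceil(p / 100 * len(s)), 1) - 1
    return s[k]

latencies = [120, 95, 210, 88, 340, 150, 99, 1020, 130, 105]
p50, p95, p99 = (percentile(latencies, p) for p in (50, 95, 99))
```

Note how a single 1020 ms outlier dominates P95/P99 while leaving P50 untouched; that gap between median and tail is usually the first signal of queueing or provider slowdowns.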
Business Metrics
- Cost per request
- Error rates by model
- Usage by endpoint
- Cache hit rates
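Cost per request is typically derived from token counts and per-token pricing, and cache hit rate from a running hit/miss tally. A minimal sketch, assuming hypothetical per-1K-token rates (check your provider's actual pricing):

```python
def cost_per_request(prompt_tokens, completion_tokens, input_rate, output_rate):
    """USD cost for one request; rates are USD per 1K tokens (illustrative)."""
    return prompt_tokens / 1000 * input_rate + completion_tokens / 1000 * output_rate

def cache_hit_rate(hits, misses):
    """Fraction of requests served from cache; 0.0 when no traffic yet."""
    total = hits + misses
    return hits / total if total else 0.0

cost = cost_per_request(1200, 300, input_rate=0.50, output_rate=1.50)
hit_rate = cache_hit_rate(hits=80, misses=20)
```

Tagging these by model and endpoint (e.g. as metric labels) is what makes the "error rates by model" and "usage by endpoint" breakdowns above possible.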
ParrotRouter provides built-in monitoring with Prometheus-compatible metrics and real-time dashboards. No additional setup required.
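"Prometheus-compatible" here means metrics are served in Prometheus's plain-text exposition format, which any Prometheus server can scrape. A stdlib-only sketch of what that format looks like (metric names are illustrative, and real exporters emit counters and histograms as well as gauges):

```python
def render_prometheus(metrics):
    """Render a {name: value} dict as gauges in the Prometheus text exposition format."""
    lines = []
    for name, value in sorted(metrics.items()):
        lines.append(f"# TYPE {name} gauge")
        lines.append(f"{name} {value}")
    return "\n".join(lines) + "\n"

body = render_prometheus({"llm_queue_depth": 3, "llm_cache_hit_rate": 0.8})
```

In practice you would expose this via an HTTP endpoint (conventionally `/metrics`) rather than building strings by hand.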