Streaming Response Implementation

Deliver LLM responses in real time with streaming. Progressive response rendering can cut perceived latency by up to 90% and noticeably improve user experience, because users see tokens as they are generated instead of waiting for the full completion.

Server-Sent Events Implementation
// Client-side streaming handler
const streamCompletion = async (prompt) => {
  const response = await fetch('https://api.parrotrouter.com/v1/chat/completions', {
    method: 'POST',
    headers: {
      'Content-Type': 'application/json',
      'Authorization': 'Bearer YOUR_API_KEY'
    },
    body: JSON.stringify({
      model: 'gpt-4',
      messages: [{ role: 'user', content: prompt }],
      stream: true
    })
  });

  if (!response.ok) {
    throw new Error(`Request failed with status ${response.status}`);
  }

  const reader = response.body.getReader();
  const decoder = new TextDecoder();
  let buffer = '';

  while (true) {
    const { done, value } = await reader.read();
    if (done) break;

    // { stream: true } keeps multi-byte characters intact across chunk boundaries
    buffer += decoder.decode(value, { stream: true });

    // A chunk may end mid-line; keep any partial trailing line in the buffer
    const lines = buffer.split('\n');
    buffer = lines.pop();

    for (const line of lines) {
      if (!line.startsWith('data: ')) continue;

      const payload = line.slice(6).trim();
      // The stream ends with a literal "[DONE]" sentinel, which is not valid JSON
      if (payload === '[DONE]') return;

      const data = JSON.parse(payload);
      const content = data.choices[0]?.delta?.content;
      if (content) {
        updateUI(content);
      }
    }
  }
};
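In practice, updateUI is whatever renders the incoming tokens. A minimal usage sketch, assuming a page element with id "output" (the element id and prompt text are illustrative, not part of the API):

// Minimal usage sketch: append each streamed token to the page as it arrives
const output = document.getElementById('output');

const updateUI = (token) => {
  output.textContent += token;
};

streamCompletion('Explain server-sent events in one paragraph')
  .catch((err) => console.error('Streaming failed:', err));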
First Token: 50 ms (time to first response)
Perceived Speed: 90% (faster perception)
User Satisfaction: 85% (improvement score)
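Time to first token can be measured directly in the browser. A rough instrumentation sketch, assuming the streaming handler above; measureFirstToken is a hypothetical helper, not part of the API:

// Rough sketch: record how long the first streamed token takes to arrive
const measureFirstToken = (onToken) => {
  const start = performance.now();
  let seenFirst = false;
  return (token) => {
    if (!seenFirst) {
      seenFirst = true;
      console.log(`First token after ${(performance.now() - start).toFixed(0)} ms`);
    }
    onToken(token);
  };
};

// Wrap the existing UI handler:
// const updateUI = measureFirstToken((token) => { output.textContent += token; });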
