Streaming Response Implementation

Deliver LLM responses in real time with streaming. Progressive response rendering can cut perceived latency by up to 90% and noticeably improve user experience, because users see tokens as they are generated instead of waiting for the full completion.

Server-Sent Events Implementation
// Client-side streaming handler
const streamCompletion = async (prompt) => {
  const response = await fetch('https://api.parrotrouter.com/v1/chat/completions', {
    method: 'POST',
    headers: {
      'Content-Type': 'application/json',
      'Authorization': 'Bearer YOUR_API_KEY'
    },
    body: JSON.stringify({
      model: 'gpt-4',
      messages: [{ role: 'user', content: prompt }],
      stream: true
    })
  });

  if (!response.ok) {
    throw new Error(`Request failed with status ${response.status}`);
  }

  const reader = response.body.getReader();
  const decoder = new TextDecoder();
  let buffer = '';

  while (true) {
    const { done, value } = await reader.read();
    if (done) break;

    // { stream: true } keeps multi-byte characters intact across chunk boundaries
    buffer += decoder.decode(value, { stream: true });

    // A chunk may end mid-line; keep any partial trailing line in the buffer
    const lines = buffer.split('\n');
    buffer = lines.pop();

    for (const line of lines) {
      if (!line.startsWith('data: ')) continue;

      const payload = line.slice(6).trim();
      // The stream ends with a literal "[DONE]" sentinel, which is not valid JSON
      if (payload === '[DONE]') return;

      const data = JSON.parse(payload);
      const content = data.choices[0]?.delta?.content;
      if (content) {
        updateUI(content);
      }
    }
  }
};
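In practice, updateUI is whatever renders the incoming tokens. A minimal usage sketch, assuming a page element with id "output" (the element id and prompt text are illustrative, not part of the API):

// Minimal usage sketch: append each streamed token to the page as it arrives
const output = document.getElementById('output');

const updateUI = (token) => {
  output.textContent += token;
};

streamCompletion('Explain server-sent events in one paragraph')
  .catch((err) => console.error('Streaming failed:', err));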
First Token: 50 ms (time to first response)
Perceived Speed: 90% (faster perception)
User Satisfaction: 85% (improvement score)
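Time to first token can be measured directly in the browser. A rough instrumentation sketch, assuming the streaming handler above; measureFirstToken is a hypothetical helper, not part of the API:

// Rough sketch: record how long the first streamed token takes to arrive
const measureFirstToken = (onToken) => {
  const start = performance.now();
  let seenFirst = false;
  return (token) => {
    if (!seenFirst) {
      seenFirst = true;
      console.log(`First token after ${(performance.now() - start).toFixed(0)} ms`);
    }
    onToken(token);
  };
};

// Wrap the existing UI handler:
// const updateUI = measureFirstToken((token) => { output.textContent += token; });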
