Stream Responses

Stream generated text in real time for a more responsive user experience.

Endpoint

POST /v1/generate/stream

Request Body

Same parameters as Generate Text, with streaming enabled.

Example Request

JavaScript

const response = await fetch('https://api.example.com/v1/generate/stream', {
  method: 'POST',
  headers: {
    'Authorization': 'Bearer YOUR_API_KEY',
    'Content-Type': 'application/json',
  },
  body: JSON.stringify({
    prompt: 'Tell me a story',
    maxTokens: 500,
    temperature: 0.8
  })
});

const reader = response.body.getReader();
const decoder = new TextDecoder();

while (true) {
  const { done, value } = await reader.read();
  if (done) break;

  // Note: this simple loop assumes each chunk contains whole lines;
  // see "Handling Stream Events" below for proper buffering
  const chunk = decoder.decode(value);
  const lines = chunk.split('\n').filter(line => line.trim());

  for (const line of lines) {
    if (line.startsWith('data: ')) {
      const data = line.slice(6);
      if (data === '[DONE]') continue;
      const parsed = JSON.parse(data);
      process.stdout.write(parsed.delta);
    }
  }
}

Python

import requests
import json

response = requests.post(
    'https://api.example.com/v1/generate/stream',
    headers={
        'Authorization': 'Bearer YOUR_API_KEY',
        'Content-Type': 'application/json',
    },
    json={
        'prompt': 'Tell me a story',
        'maxTokens': 500,
        'temperature': 0.8
    },
    stream=True
)

for line in response.iter_lines():
    if line:
        line = line.decode('utf-8')
        if line.startswith('data: '):
            data = line[6:]
            if data == '[DONE]':
                break
            payload = json.loads(data)
            print(payload['delta'], end='', flush=True)

cURL

# -N disables output buffering so events print as they arrive
curl -N https://api.example.com/v1/generate/stream \
  -H "Authorization: Bearer YOUR_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "prompt": "Tell me a story",
    "maxTokens": 500,
    "temperature": 0.8
  }'

Response Format

Streaming responses use Server-Sent Events (SSE) format:

data: {"id":"gen_abc123","text":"Once","delta":"Once"}

data: {"id":"gen_abc123","text":"Once upon","delta":" upon"}

data: {"id":"gen_abc123","text":"Once upon a time","delta":" a time"}

data: [DONE]
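
Each event's text field is the concatenation of every delta received so far, so a client only needs to track one of the two fields. A quick sanity-check sketch (events stands in for the parsed data payloads, in order):

let assembled = '';
for (const event of events) {
  assembled += event.delta;
  // The cumulative text field matches what we've assembled from deltas
  console.assert(assembled === event.text);
}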

Event Fields

| Field | Type | Description |
| --- | --- | --- |
| id | string | Unique identifier for this generation |
| text | string | Complete text generated so far |
| delta | string | New text added in this chunk |
| finishReason | string | Present in final event only |
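
Because finishReason appears only on the final event, its presence makes a convenient end-of-generation check. A sketch (the specific reason values shown are assumptions; only the field's presence is documented above):

function onEvent(event) {
  if (event.finishReason) {
    // Final event; hypothetical values might include 'stop' or 'length'
    console.log(`\nGeneration ${event.id} finished: ${event.finishReason}`);
  } else {
    process.stdout.write(event.delta);
  }
}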

Handling Stream Events

Process Each Chunk

async function handleStream(response) {
  const reader = response.body.getReader();
  const decoder = new TextDecoder();
  let buffer = '';

  while (true) {
    const { done, value } = await reader.read();
    if (done) break;

    buffer += decoder.decode(value, { stream: true });
    const lines = buffer.split('\n');
    buffer = lines.pop() || '';

    for (const line of lines) {
      if (line.startsWith('data: ')) {
        const data = line.slice(6);
        if (data === '[DONE]') {
          return;
        }
        const parsed = JSON.parse(data);
        processChunk(parsed);
      }
    }
  }
}

function processChunk(chunk) {
  // Update UI with chunk.delta
  console.log(chunk.delta);
}
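
To tie this together, pass the response from a fetch call like the one in the example above straight into the handler:

const response = await fetch('https://api.example.com/v1/generate/stream', {
  method: 'POST',
  headers: {
    'Authorization': 'Bearer YOUR_API_KEY',
    'Content-Type': 'application/json',
  },
  body: JSON.stringify({ prompt: 'Tell me a story', maxTokens: 500 })
});

await handleStream(response);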

React Component Example

import { useState } from 'react';

function StreamingChat({ apiKey }) {  // apiKey supplied as a prop
  const [text, setText] = useState('');
  const [isStreaming, setIsStreaming] = useState(false);

  const generateStream = async (prompt) => {
    setIsStreaming(true);
    setText('');

    const response = await fetch('/api/generate/stream', {
      method: 'POST',
      headers: {
        'Authorization': `Bearer ${apiKey}`,
        'Content-Type': 'application/json',
      },
      body: JSON.stringify({ prompt, maxTokens: 500 })
    });

    const reader = response.body.getReader();
    const decoder = new TextDecoder();

    while (true) {
      const { done, value } = await reader.read();
      if (done) break;

      const chunk = decoder.decode(value);
      const lines = chunk.split('\n').filter(l => l.trim());

      for (const line of lines) {
        if (line.startsWith('data: ')) {
          const data = line.slice(6);
          if (data === '[DONE]') {
            setIsStreaming(false);
            return;
          }
          const parsed = JSON.parse(data);
          setText(prev => prev + parsed.delta);
        }
      }
    }

    setIsStreaming(false);  // in case the stream ends without a [DONE] event
  };

  return (
    <div>
      <pre>{text}</pre>
      {isStreaming && <div>Generating...</div>}
    </div>
  );
}

Error Handling

Errors in streams are sent as events:

data: {"error":{"type":"rate_limit_error","message":"Rate limit exceeded"}}

Handle errors appropriately:

for (const line of lines) {
  if (line.startsWith('data: ')) {
    const parsed = JSON.parse(line.slice(6));

    if (parsed.error) {
      console.error('Stream error:', parsed.error);
      break;
    }

    processChunk(parsed);
  }
}
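
If you pair this with retry logic like streamWithRetry below, consider throwing instead of breaking so the error propagates to the retry loop (a sketch; whether a given error type is worth retrying is an application decision):

if (parsed.error) {
  // Surface the stream error to the caller so retry logic can react
  throw new Error(`${parsed.error.type}: ${parsed.error.message}`);
}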

Best Practices

1. Update UI Incrementally

Show text as it arrives for better UX:

// Good ✅
setText(prev => prev + chunk.delta);

// Bad ❌ - waiting for complete response
setText(completeText);

2. Handle Connection Issues

Implement reconnection logic:

async function streamWithRetry(prompt, maxRetries = 3) {
  for (let i = 0; i < maxRetries; i++) {
    try {
      const response = await fetch('/api/generate/stream', {
        method: 'POST',
        headers: {
          'Authorization': `Bearer ${apiKey}`,
          'Content-Type': 'application/json',
        },
        body: JSON.stringify({ prompt, maxTokens: 500 })
      });
      await handleStream(response);  // parse with the handler defined above
      return;
    } catch (error) {
      if (i === maxRetries - 1) throw error;
      await new Promise(r => setTimeout(r, 1000 * (i + 1)));  // back off 1s, 2s, 3s
    }
  }
}
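
In use, await streamWithRetry('Tell me a story') makes up to three attempts with increasing delays before surfacing the error. Note that each retry restarts generation from the beginning, so clear any partially rendered text before retrying.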

3. Provide Cancel Functionality

Allow users to cancel streaming:

const controller = new AbortController();

const response = await fetch(url, {
  signal: controller.signal,
  // ... other options
});

// Later, to cancel:
controller.abort();
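
Aborting rejects the in-flight fetch (and any pending reader.read()) with an AbortError, which is usually best treated as a normal cancellation rather than a failure:

try {
  await handleStream(response);
} catch (error) {
  if (error.name === 'AbortError') {
    // The user cancelled; not a failure
    return;
  }
  throw error;
}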

4. Buffer Management

Properly handle incomplete JSON:

let buffer = '';
const decoder = new TextDecoder();

for await (const chunk of stream) {
  // Decode bytes incrementally; stream: true keeps multi-byte
  // characters that span chunk boundaries intact
  buffer += decoder.decode(chunk, { stream: true });
  const lines = buffer.split('\n');
  buffer = lines.pop() || ''; // Keep incomplete line

  for (const line of lines) {
    // Process complete lines
  }
}

Performance Tips

  1. Use the delta field: Process only the new text, not the entire text so far
  2. Throttle UI updates: Update the UI every 50-100ms rather than on every chunk (see the sketch after this list)
  3. Handle backpressure: Don't let processing fall behind the stream
  4. Close connections: Always close streams when done
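
A minimal throttling sketch for tip 2, assuming a React-style setText as in the component above (the 100ms interval is a starting point, not a documented requirement):

let pending = '';
let flushTimer = null;

function onDelta(delta) {
  pending += delta;
  if (!flushTimer) {
    flushTimer = setTimeout(() => {
      const batch = pending;
      pending = '';
      flushTimer = null;
      setText(prev => prev + batch);  // one UI update per batch, not per chunk
    }, 100);
  }
}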

Comparison: Streaming vs Non-Streaming

| Aspect | Streaming | Non-Streaming |
| --- | --- | --- |
| Time to first token | ~100-500ms | 500ms - 3s |
| User experience | Interactive | Wait for complete response |
| Complexity | Higher | Lower |
| Best for | Chat, long text | Short responses |

Next Steps