Stream Responses
Stream generated text in real time for a more responsive user experience.
Endpoint
POST /v1/generate/stream
Request Body
Accepts the same parameters as Generate Text; the response arrives as a stream of events instead of a single JSON object.
Example Request
JavaScript
const response = await fetch('https://api.example.com/v1/generate/stream', {
  method: 'POST',
  headers: {
    'Authorization': 'Bearer YOUR_API_KEY',
    'Content-Type': 'application/json',
  },
  body: JSON.stringify({
    prompt: 'Tell me a story',
    maxTokens: 500,
    temperature: 0.8
  })
});

const reader = response.body.getReader();
const decoder = new TextDecoder();

while (true) {
  const { done, value } = await reader.read();
  if (done) break;

  const chunk = decoder.decode(value);
  const lines = chunk.split('\n').filter(line => line.trim());

  for (const line of lines) {
    if (line.startsWith('data: ')) {
      const payload = line.slice(6);
      if (payload === '[DONE]') break; // final event is not JSON
      const data = JSON.parse(payload);
      // Print only the new text; data.text holds the full text so far.
      // process.stdout.write is Node.js; in a browser, update the DOM instead.
      process.stdout.write(data.delta);
    }
  }
}
Python
import requests
import json
response = requests.post(
    'https://api.example.com/v1/generate/stream',
    headers={
        'Authorization': 'Bearer YOUR_API_KEY',
        'Content-Type': 'application/json',
    },
    json={
        'prompt': 'Tell me a story',
        'maxTokens': 500,
        'temperature': 0.8
    },
    stream=True
)

for line in response.iter_lines():
    if line:
        line = line.decode('utf-8')
        if line.startswith('data: '):
            payload = line[6:]
            if payload == '[DONE]':
                break  # final event is not JSON
            data = json.loads(payload)
            # Print only the new text; data['text'] holds the full text so far
            print(data['delta'], end='', flush=True)
cURL
# -N disables curl's output buffering so events print as they arrive
curl -N https://api.example.com/v1/generate/stream \
  -H "Authorization: Bearer YOUR_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "prompt": "Tell me a story",
    "maxTokens": 500,
    "temperature": 0.8
  }'
Response Format
Streaming responses use Server-Sent Events (SSE) format:
data: {"id":"gen_abc123","text":"Once","delta":"Once"}
data: {"id":"gen_abc123","text":"Once upon","delta":" upon"}
data: {"id":"gen_abc123","text":"Once upon a time","delta":" a time"}
data: [DONE]
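The final data event before [DONE] also carries a finishReason; the value shown here is illustrative:

data: {"id":"gen_abc123","text":"Once upon a time","delta":"","finishReason":"stop"}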
Event Fields
| Field | Type | Description |
|---|---|---|
| id | string | Unique identifier for this generation |
| text | string | Complete text generated so far |
| delta | string | New text added in this chunk |
| finishReason | string | Present in the final event only |
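A small helper keeps this parsing in one place. This is a minimal sketch following the examples above; the function name is ours, and the [DONE] sentinel handling matches the stream format:

// Parse one SSE line into an event object.
// Returns null for non-data lines and the string '[DONE]' for the sentinel.
function parseEventLine(line) {
  if (!line.startsWith('data: ')) return null;
  const payload = line.slice(6);
  if (payload === '[DONE]') return '[DONE]';
  return JSON.parse(payload); // { id, text, delta, finishReason? }
}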
Handling Stream Events
Process Each Chunk
async function handleStream(response) {
  const reader = response.body.getReader();
  const decoder = new TextDecoder();
  let buffer = '';

  while (true) {
    const { done, value } = await reader.read();
    if (done) break;

    // { stream: true } keeps multi-byte characters that are split
    // across chunks from being corrupted
    buffer += decoder.decode(value, { stream: true });

    const lines = buffer.split('\n');
    buffer = lines.pop() || ''; // keep the incomplete last line

    for (const line of lines) {
      if (line.startsWith('data: ')) {
        const data = line.slice(6);
        if (data === '[DONE]') {
          return;
        }
        const parsed = JSON.parse(data);
        processChunk(parsed);
      }
    }
  }
}

function processChunk(chunk) {
  // Update UI with chunk.delta
  console.log(chunk.delta);
}
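Wired up to the endpoint from the example request above (same URL and body assumptions):

const response = await fetch('https://api.example.com/v1/generate/stream', {
  method: 'POST',
  headers: {
    'Authorization': 'Bearer YOUR_API_KEY',
    'Content-Type': 'application/json',
  },
  body: JSON.stringify({ prompt: 'Tell me a story', maxTokens: 500 })
});
await handleStream(response);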
React Component Example
import { useState } from 'react';

// apiKey is passed in as a prop here; in production, proxy the request
// server-side rather than exposing the key to the browser.
function StreamingChat({ apiKey }) {
  const [text, setText] = useState('');
  const [isStreaming, setIsStreaming] = useState(false);

  const generateStream = async (prompt) => {
    setIsStreaming(true);
    setText('');

    const response = await fetch('/api/generate/stream', {
      method: 'POST',
      headers: {
        'Authorization': `Bearer ${apiKey}`,
        'Content-Type': 'application/json',
      },
      body: JSON.stringify({ prompt, maxTokens: 500 })
    });

    const reader = response.body.getReader();
    const decoder = new TextDecoder();

    while (true) {
      const { done, value } = await reader.read();
      if (done) break;

      const chunk = decoder.decode(value);
      const lines = chunk.split('\n').filter(l => l.trim());

      for (const line of lines) {
        if (line.startsWith('data: ')) {
          const data = line.slice(6);
          if (data === '[DONE]') {
            setIsStreaming(false);
            return;
          }
          const parsed = JSON.parse(data);
          setText(prev => prev + parsed.delta);
        }
      }
    }

    // Stream closed without a [DONE] event
    setIsStreaming(false);
  };

  return (
    <div>
      <pre>{text}</pre>
      {isStreaming && <div>Generating...</div>}
    </div>
  );
}
Error Handling
Errors in streams are sent as events:
data: {"error":{"type":"rate_limit_error","message":"Rate limit exceeded"}}
Handle errors appropriately:
for (const line of lines) {
  if (line.startsWith('data: ')) {
    const parsed = JSON.parse(line.slice(6));
    if (parsed.error) {
      console.error('Stream error:', parsed.error);
      break;
    }
    processChunk(parsed);
  }
}
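If some error types are recoverable, you can branch on error.type. A sketch, under the assumption that rate_limit_error (shown above) is worth retrying; scheduleRetry and showError are hypothetical helpers:

if (parsed.error) {
  if (parsed.error.type === 'rate_limit_error') {
    scheduleRetry(); // hypothetical helper: back off, then re-issue the request
  } else {
    showError(parsed.error.message); // hypothetical UI helper
  }
  break; // stop reading this stream either way
}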
Best Practices
1. Update UI Incrementally
Show text as it arrives for better UX:
// Good ✅
setText(prev => prev + chunk.delta);
// Bad ❌ - waiting for complete response
setText(completeText);
2. Handle Connection Issues
Implement reconnection logic:
async function streamWithRetry(prompt, maxRetries = 3) {
  for (let i = 0; i < maxRetries; i++) {
    try {
      // Assumes a variant of handleStream that performs the fetch
      // itself and then parses the resulting stream
      await handleStream(prompt);
      return;
    } catch (error) {
      if (i === maxRetries - 1) throw error;
      // Linear backoff: wait 1s, then 2s, then 3s
      await new Promise(r => setTimeout(r, 1000 * (i + 1)));
    }
  }
}
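Each retry restarts generation from the beginning, so clear any partial text already shown before the next attempt. Usage:

await streamWithRetry('Tell me a story');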
3. Provide Cancel Functionality
Allow users to cancel streaming:
const controller = new AbortController();

const response = await fetch(url, {
  signal: controller.signal,
  // ... other options
});

// Later, to cancel:
controller.abort();
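Aborting rejects the pending fetch or read with an AbortError, so catch it to avoid reporting a deliberate cancellation as a failure:

try {
  const response = await fetch(url, { signal: controller.signal });
  await handleStream(response);
} catch (error) {
  if (error.name !== 'AbortError') throw error;
  // AbortError means the user cancelled; nothing to report
}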
4. Buffer Management
Properly handle incomplete JSON:
let buffer = '';

// Assumes chunks arrive as already-decoded strings
for await (const chunk of stream) {
  buffer += chunk;
  const lines = buffer.split('\n');
  buffer = lines.pop() || ''; // Keep incomplete line
  for (const line of lines) {
    // Process complete lines
  }
}
Performance Tips
- Use delta field: Only process new text, not entire text
- Throttle UI updates: Update the UI every 50-100ms rather than on every chunk (see the sketch after this list)
- Handle backpressure: Don't let processing fall behind stream
- Close connections: Always close streams when done
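One way to implement the throttling tip: accumulate deltas cheaply on every event and flush them to the UI on a timer. A sketch; the 50ms interval follows the guideline above, flushToUI is a hypothetical UI helper (e.g. a React state update), and processChunk here is a drop-in replacement for the earlier version:

let pending = '';

// Flush accumulated text to the UI at most every 50ms
const flushTimer = setInterval(() => {
  if (pending) {
    flushToUI(pending); // hypothetical, e.g. setText(prev => prev + pending)
    pending = '';
  }
}, 50);

function processChunk(chunk) {
  pending += chunk.delta; // cheap string append on every event
}

// When the stream ends:
clearInterval(flushTimer);
if (pending) flushToUI(pending);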
Comparison: Streaming vs Non-Streaming
| Aspect | Streaming | Non-Streaming |
|---|---|---|
| Time to first token | ~100-500ms | ~500ms-3s |
| User experience | Interactive | Waits for the complete response |
| Complexity | Higher | Lower |
| Best for | Chat, long text | Short responses |
Next Steps
- Generate Text for non-streaming generation
- Code Examples for more use cases
- Best Practices for optimization tips