Stream Responses

Stream generated text in real time for a more responsive user experience.

Endpoint

POST /v1/generate/stream

Request Body

Same parameters as Generate Text, with streaming enabled.

Example Request

JavaScript

const response = await fetch('https://api.example.com/v1/generate/stream', {
  method: 'POST',
  headers: {
    'Authorization': 'Bearer YOUR_API_KEY',
    'Content-Type': 'application/json',
  },
  body: JSON.stringify({
    prompt: 'Tell me a story',
    maxTokens: 500,
    temperature: 0.8
  })
});

const reader = response.body.getReader();
const decoder = new TextDecoder();

while (true) {
  const { done, value } = await reader.read();
  if (done) break;

  // Note: this simple loop assumes each chunk contains whole lines;
  // see "Handling Stream Events" below for proper buffering
  const chunk = decoder.decode(value);
  const lines = chunk.split('\n').filter(line => line.trim());

  for (const line of lines) {
    if (line.startsWith('data: ')) {
      const data = line.slice(6);
      if (data === '[DONE]') continue;
      const parsed = JSON.parse(data);
      process.stdout.write(parsed.delta);
    }
  }
}

Python

import requests
import json

response = requests.post(
    'https://api.example.com/v1/generate/stream',
    headers={
        'Authorization': 'Bearer YOUR_API_KEY',
        'Content-Type': 'application/json',
    },
    json={
        'prompt': 'Tell me a story',
        'maxTokens': 500,
        'temperature': 0.8
    },
    stream=True
)

for line in response.iter_lines():
    if line:
        line = line.decode('utf-8')
        if line.startswith('data: '):
            data = line[6:]
            if data == '[DONE]':
                break
            payload = json.loads(data)
            print(payload['delta'], end='', flush=True)

cURL

# -N disables output buffering so events print as they arrive
curl -N https://api.example.com/v1/generate/stream \
  -H "Authorization: Bearer YOUR_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "prompt": "Tell me a story",
    "maxTokens": 500,
    "temperature": 0.8
  }'

Response Format

Streaming responses use Server-Sent Events (SSE) format:

data: {"id":"gen_abc123","text":"Once","delta":"Once"}

data: {"id":"gen_abc123","text":"Once upon","delta":" upon"}

data: {"id":"gen_abc123","text":"Once upon a time","delta":" a time"}

data: [DONE]
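
Each event's text field is the concatenation of every delta received so far, so a client only needs to track one of the two fields. A quick sanity-check sketch (events stands in for the parsed data payloads, in order):

let assembled = '';
for (const event of events) {
  assembled += event.delta;
  // The cumulative text field matches what we've assembled from deltas
  console.assert(assembled === event.text);
}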

Event Fields

| Field | Type | Description |
| --- | --- | --- |
| id | string | Unique identifier for this generation |
| text | string | Complete text generated so far |
| delta | string | New text added in this chunk |
| finishReason | string | Present in final event only |
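
Because finishReason appears only on the final event, its presence makes a convenient end-of-generation check. A sketch (the specific reason values shown are assumptions; only the field's presence is documented above):

function onEvent(event) {
  if (event.finishReason) {
    // Final event; hypothetical values might include 'stop' or 'length'
    console.log(`\nGeneration ${event.id} finished: ${event.finishReason}`);
  } else {
    process.stdout.write(event.delta);
  }
}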

Handling Stream Events

Process Each Chunk

async function handleStream(response) {
  const reader = response.body.getReader();
  const decoder = new TextDecoder();
  let buffer = '';

  while (true) {
    const { done, value } = await reader.read();
    if (done) break;

    buffer += decoder.decode(value, { stream: true });
    const lines = buffer.split('\n');
    buffer = lines.pop() || '';

    for (const line of lines) {
      if (line.startsWith('data: ')) {
        const data = line.slice(6);
        if (data === '[DONE]') {
          return;
        }
        const parsed = JSON.parse(data);
        processChunk(parsed);
      }
    }
  }
}

function processChunk(chunk) {
  // Update UI with chunk.delta
  console.log(chunk.delta);
}
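
To tie this together, pass the response from a fetch call like the one in the example above straight into the handler:

const response = await fetch('https://api.example.com/v1/generate/stream', {
  method: 'POST',
  headers: {
    'Authorization': 'Bearer YOUR_API_KEY',
    'Content-Type': 'application/json',
  },
  body: JSON.stringify({ prompt: 'Tell me a story', maxTokens: 500 })
});

await handleStream(response);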

React Component Example

import { useState } from 'react';

function StreamingChat({ apiKey }) {  // apiKey supplied as a prop
  const [text, setText] = useState('');
  const [isStreaming, setIsStreaming] = useState(false);

  const generateStream = async (prompt) => {
    setIsStreaming(true);
    setText('');

    const response = await fetch('/api/generate/stream', {
      method: 'POST',
      headers: {
        'Authorization': `Bearer ${apiKey}`,
        'Content-Type': 'application/json',
      },
      body: JSON.stringify({ prompt, maxTokens: 500 })
    });

    const reader = response.body.getReader();
    const decoder = new TextDecoder();

    while (true) {
      const { done, value } = await reader.read();
      if (done) break;

      const chunk = decoder.decode(value);
      const lines = chunk.split('\n').filter(l => l.trim());

      for (const line of lines) {
        if (line.startsWith('data: ')) {
          const data = line.slice(6);
          if (data === '[DONE]') {
            setIsStreaming(false);
            return;
          }
          const parsed = JSON.parse(data);
          setText(prev => prev + parsed.delta);
        }
      }
    }

    setIsStreaming(false);  // in case the stream ends without a [DONE] event
  };

  return (
    <div>
      <pre>{text}</pre>
      {isStreaming && <div>Generating...</div>}
    </div>
  );
}

Error Handling

Errors in streams are sent as events:

data: {"error":{"type":"rate_limit_error","message":"Rate limit exceeded"}}

Handle errors appropriately:

for (const line of lines) {
  if (line.startsWith('data: ')) {
    const parsed = JSON.parse(line.slice(6));

    if (parsed.error) {
      console.error('Stream error:', parsed.error);
      break;
    }

    processChunk(parsed);
  }
}
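
If you pair this with retry logic like streamWithRetry below, consider throwing instead of breaking so the error propagates to the retry loop (a sketch; whether a given error type is worth retrying is an application decision):

if (parsed.error) {
  // Surface the stream error to the caller so retry logic can react
  throw new Error(`${parsed.error.type}: ${parsed.error.message}`);
}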

Best Practices

1. Update UI Incrementally

Show text as it arrives for better UX:

// Good ✅
setText(prev => prev + chunk.delta);

// Bad ❌ - waiting for complete response
setText(completeText);

2. Handle Connection Issues

Implement reconnection logic:

async function streamWithRetry(prompt, maxRetries = 3) {
  for (let i = 0; i < maxRetries; i++) {
    try {
      const response = await fetch('/api/generate/stream', {
        method: 'POST',
        headers: {
          'Authorization': `Bearer ${apiKey}`,
          'Content-Type': 'application/json',
        },
        body: JSON.stringify({ prompt, maxTokens: 500 })
      });
      await handleStream(response);  // parse with the handler defined above
      return;
    } catch (error) {
      if (i === maxRetries - 1) throw error;
      await new Promise(r => setTimeout(r, 1000 * (i + 1)));  // back off 1s, 2s, 3s
    }
  }
}
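
In use, await streamWithRetry('Tell me a story') makes up to three attempts with increasing delays before surfacing the error. Note that each retry restarts generation from the beginning, so clear any partially rendered text before retrying.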

3. Provide Cancel Functionality

Allow users to cancel streaming:

const controller = new AbortController();

const response = await fetch(url, {
  signal: controller.signal,
  // ... other options
});

// Later, to cancel:
controller.abort();
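
Aborting rejects the in-flight fetch (and any pending reader.read()) with an AbortError, which is usually best treated as a normal cancellation rather than a failure:

try {
  await handleStream(response);
} catch (error) {
  if (error.name === 'AbortError') {
    // The user cancelled; not a failure
    return;
  }
  throw error;
}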

4. Buffer Management

Properly handle incomplete JSON:

let buffer = '';
const decoder = new TextDecoder();

for await (const chunk of stream) {
  // Decode bytes incrementally; stream: true keeps multi-byte
  // characters that span chunk boundaries intact
  buffer += decoder.decode(chunk, { stream: true });
  const lines = buffer.split('\n');
  buffer = lines.pop() || ''; // Keep incomplete line

  for (const line of lines) {
    // Process complete lines
  }
}

Performance Tips

  1. Use the delta field: Process only the new text, not the entire text so far
  2. Throttle UI updates: Update the UI every 50-100ms rather than on every chunk (see the sketch after this list)
  3. Handle backpressure: Don't let processing fall behind the stream
  4. Close connections: Always close streams when done
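
A minimal throttling sketch for tip 2, assuming a React-style setText as in the component above (the 100ms interval is a starting point, not a documented requirement):

let pending = '';
let flushTimer = null;

function onDelta(delta) {
  pending += delta;
  if (!flushTimer) {
    flushTimer = setTimeout(() => {
      const batch = pending;
      pending = '';
      flushTimer = null;
      setText(prev => prev + batch);  // one UI update per batch, not per chunk
    }, 100);
  }
}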

Comparison: Streaming vs Non-Streaming

| Aspect | Streaming | Non-Streaming |
| --- | --- | --- |
| Time to first token | ~100-500ms | 500ms - 3s |
| User experience | Interactive | Wait for complete response |
| Complexity | Higher | Lower |
| Best for | Chat, long text | Short responses |

Next Steps