
Generate Text

Generate text completions using the generation API.

Endpoint

POST /v1/generate

Request Body

Parameter     | Type    | Required | Description
------------- | ------- | -------- | -----------
prompt        | string  | Yes      | The input text to generate from
maxTokens     | integer | No       | Maximum tokens to generate (default: 100)
temperature   | number  | No       | Controls randomness, 0.0 to 2.0 (default: 1.0)
topP          | number  | No       | Nucleus sampling threshold (default: 1.0)
stopSequences | array   | No       | Sequences that stop generation
model         | string  | No       | Model to use (default: latest)

Example Request

curl https://api.example.com/v1/generate \
-H "Authorization: Bearer YOUR_API_KEY" \
-H "Content-Type: application/json" \
-d '{
"prompt": "Write a haiku about coding",
"maxTokens": 50,
"temperature": 0.7
}'

JavaScript

const response = await fetch('https://api.example.com/v1/generate', {
  method: 'POST',
  headers: {
    'Authorization': 'Bearer YOUR_API_KEY',
    'Content-Type': 'application/json',
  },
  body: JSON.stringify({
    prompt: 'Write a haiku about coding',
    maxTokens: 50,
    temperature: 0.7
  })
});

const data = await response.json();
console.log(data.text);

Python

import requests

response = requests.post(
    'https://api.example.com/v1/generate',
    headers={
        'Authorization': 'Bearer YOUR_API_KEY',
        'Content-Type': 'application/json',
    },
    json={
        'prompt': 'Write a haiku about coding',
        'maxTokens': 50,
        'temperature': 0.7
    }
)

data = response.json()
print(data['text'])

Response

{
"id": "gen_abc123",
"object": "generation",
"created": 1640000000,
"model": "model-v1",
"text": "Code flows like water\nBugs lurk in every corner\nDebug, then deploy",
"usage": {
"promptTokens": 5,
"completionTokens": 15,
"totalTokens": 20
},
"finishReason": "stop"
}

Response Fields

Field        | Type    | Description
------------ | ------- | -----------
id           | string  | Unique identifier for this generation
object       | string  | Object type (always "generation")
created      | integer | Unix timestamp of creation
model        | string  | Model used for generation
text         | string  | Generated text
usage        | object  | Token usage information
finishReason | string  | Reason generation stopped

Finish Reasons

  • stop: Natural completion or stop sequence reached
  • length: Maximum token limit reached
  • content_filter: Content filtered due to policy
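
When finishReason is "length", the output was cut off at the maxTokens limit. One way to detect this in Python, building on the requests example above (a minimal sketch; how you react to truncation, such as retrying with a larger budget, is up to you):

import requests

def generate(prompt, max_tokens=100):
    """Call /v1/generate and flag truncated output."""
    response = requests.post(
        'https://api.example.com/v1/generate',
        headers={
            'Authorization': 'Bearer YOUR_API_KEY',
            'Content-Type': 'application/json',
        },
        json={'prompt': prompt, 'maxTokens': max_tokens},
    )
    data = response.json()

    if data.get('finishReason') == 'length':
        # The output hit maxTokens before completing; one option is to
        # retry with a larger limit or continue from the returned text.
        print('warning: generation truncated at maxTokens')

    return data['text']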

Parameters in Detail

temperature

Controls randomness in generation:

  • 0.0: Deterministic, always picks most likely token
  • 1.0: Balanced creativity and coherence (default)
  • 2.0: Maximum randomness

// More creative
{ temperature: 1.5 }

// More focused
{ temperature: 0.3 }

topP

Nucleus sampling restricts generation to the tokens whose cumulative probability is within topP:

  • 1.0: Consider all tokens (default)
  • 0.9: Consider top 90% probability mass
  • 0.1: Very focused, only high-probability tokens
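
As with temperature, topP is set per request; a more focused setting, in the same style as the snippets above:

// More focused sampling
{ topP: 0.9 }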

stopSequences

Array of strings that stop generation when encountered:

{
prompt: "List three items:\n1.",
stopSequences: ["\n\n", "4."]
}

Error Responses

Invalid Parameters

{
"error": {
"type": "invalid_request_error",
"message": "maxTokens must be between 1 and 2048",
"code": "invalid_parameter"
}
}

Rate Limit Exceeded

{
"error": {
"type": "rate_limit_error",
"message": "Rate limit exceeded. Please try again later.",
"code": "rate_limit_exceeded"
}
}
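
Rate-limit errors are usually transient, so retrying with backoff is a reasonable default. A minimal Python sketch, assuming rate-limited responses carry the rate_limit_error body shown above and an HTTP 429 status (the status code is an assumption, not documented here):

import time
import requests

def generate_with_retry(payload, max_retries=3):
    """POST to /v1/generate, backing off when the rate limit is hit."""
    for attempt in range(max_retries):
        response = requests.post(
            'https://api.example.com/v1/generate',
            headers={
                'Authorization': 'Bearer YOUR_API_KEY',
                'Content-Type': 'application/json',
            },
            json=payload,
        )
        body = response.json()
        error_type = body.get('error', {}).get('type')
        # Assumption: rate-limit errors arrive as HTTP 429 with the
        # rate_limit_error body shown above.
        if response.status_code == 429 or error_type == 'rate_limit_error':
            time.sleep(2 ** attempt)  # exponential backoff: 1s, 2s, 4s
            continue
        response.raise_for_status()  # surface other HTTP errors
        return body
    raise RuntimeError('rate limit still exceeded after retries')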

Best Practices

  1. Use temperature wisely: Lower for factual content, higher for creative content
  2. Set appropriate maxTokens: Avoid unnecessarily large values
  3. Use stopSequences: For structured output, use stop sequences to control format
  4. Handle errors: Always implement error handling and retries
  5. Cache when possible: Cache responses for identical prompts (see the sketch after this list)
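
For item 5, a minimal in-memory cache keyed on the full request payload (a sketch only; cached_generate is a hypothetical helper, and generate_with_retry refers to the retry sketch above):

import hashlib
import json

_cache = {}

def cached_generate(payload):
    """Reuse a previous response when an identical payload is requested again."""
    # Key on the serialized payload so different parameters never collide.
    key = hashlib.sha256(json.dumps(payload, sort_keys=True).encode()).hexdigest()
    if key not in _cache:
        _cache[key] = generate_with_retry(payload)  # retry helper sketched above
    return _cache[key]

Caching pays off most with deterministic settings (temperature 0), since higher temperatures intentionally vary the output for the same prompt.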

Next Steps