Chat Completions
Endpoint
POST https://llm.siraya.pro/v1/chat/completions
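For clients that do not use an SDK, the endpoint can be called with any HTTP client. The following is a minimal sketch using only the Python standard library. It assumes the endpoint accepts the same Authorization: Bearer header that the OpenAI SDK sends; build_request is a hypothetical helper name, and the request is built but not sent.

```python
import json
import urllib.request

ENDPOINT = "https://llm.siraya.pro/v1/chat/completions"


def build_request(api_key: str, model: str, prompt: str) -> urllib.request.Request:
    """Build (but do not send) a POST request for the chat completions endpoint."""
    body = {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
        "stream": False,
    }
    return urllib.request.Request(
        ENDPOINT,
        data=json.dumps(body).encode("utf-8"),
        headers={
            # Assumption: bearer-token auth, as sent by the OpenAI SDK.
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )
```

To send it, pass the result to urllib.request.urlopen and decode the JSON body of the response.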
Basic chat completion
Create a non-streaming chat completion.
Example request
# Python
from openai import OpenAI

client = OpenAI(
    api_key='<API_KEY>',
    base_url='https://llm.siraya.pro/v1'
)

completion = client.chat.completions.create(
    model='claude-3-5-sonnet@20240620',
    messages=[
        {
            'role': 'user',
            'content': 'What is the meaning of life?'
        }
    ],
    stream=False,
)

print('Assistant:', completion.choices[0].message.content)
print('Tokens used:', completion.usage)

// TypeScript
import OpenAI from 'openai';

const openai = new OpenAI({
  apiKey: '<API_KEY>',
  baseURL: 'https://llm.siraya.pro/v1',
});

const completion = await openai.chat.completions.create({
  model: 'claude-3-5-sonnet@20240620',
  messages: [
    {
      role: 'user',
      content: 'What is the meaning of life?',
    },
  ],
  stream: false,
});

console.log('Assistant:', completion.choices[0].message.content);
console.log('Tokens used:', completion.usage);
Streaming chat completion
Create a streaming chat completion that streams tokens as they are generated.
Example request
# Python
from openai import OpenAI

client = OpenAI(
    api_key='<API_KEY>',
    base_url='https://llm.siraya.pro/v1'
)

stream = client.chat.completions.create(
    model='claude-3-5-sonnet@20240620',
    messages=[
        {
            'role': 'user',
            'content': 'What is the meaning of life?'
        }
    ],
    stream=True,
)

# Print each text fragment as soon as it arrives.
for chunk in stream:
    if not chunk.choices:
        continue  # skip chunks that carry no choices
    content = chunk.choices[0].delta.content
    if content:
        print(content, end='', flush=True)

// TypeScript
import OpenAI from 'openai';

const openai = new OpenAI({
  apiKey: '<API_KEY>',
  baseURL: 'https://llm.siraya.pro/v1',
});

const stream = await openai.chat.completions.create({
  model: 'claude-3-5-sonnet@20240620',
  messages: [
    {
      role: 'user',
      content: 'What is the meaning of life?',
    },
  ],
  stream: true,
});

// Print each text fragment as soon as it arrives.
for await (const chunk of stream) {
  const content = chunk.choices[0]?.delta?.content;
  if (content) {
    process.stdout.write(content);
  }
}
Streaming response format
Streaming responses are sent as Server-Sent Events (SSE), a web standard for real-time data streaming over HTTP. Each event contains a JSON object with the partial response data.
The response format follows the OpenAI streaming specification:
data: {"id":"chatcmpl-123","object":"chat.completion.chunk","created":1677652288,"model":"anthropic/claude-sonnet-4.5","choices":[{"index":0,"delta":{"content":"Once"},"finish_reason":null}]}
data: {"id":"chatcmpl-123","object":"chat.completion.chunk","created":1677652288,"model":"anthropic/claude-sonnet-4.5","choices":[{"index":0,"delta":{"content":" upon"},"finish_reason":null}]}
data: [DONE]
Key characteristics:
- Each line starts with data: followed by JSON
- Content is delivered incrementally in the delta.content field
- The stream ends with data: [DONE]
- Empty lines separate events
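The characteristics above are enough to write a small hand-rolled parser. This is a minimal sketch, not a full SSE implementation: it assumes every event carries its JSON on a single data: line and ignores other SSE fields. The helper name iter_sse_content is hypothetical.

```python
import json
from typing import Iterable, Iterator


def iter_sse_content(lines: Iterable[str]) -> Iterator[str]:
    """Yield each delta.content fragment from an iterable of SSE lines."""
    for raw in lines:
        line = raw.strip()
        if not line.startswith("data:"):
            continue  # blank separator lines and non-data fields
        payload = line[len("data:"):].strip()
        if payload == "[DONE]":
            return  # end-of-stream sentinel
        chunk = json.loads(payload)
        for choice in chunk.get("choices", []):
            content = choice.get("delta", {}).get("content")
            if content:
                yield content


sample = [
    'data: {"choices":[{"index":0,"delta":{"content":"Once"},"finish_reason":null}]}',
    "",
    'data: {"choices":[{"index":0,"delta":{"content":" upon"},"finish_reason":null}]}',
    "",
    "data: [DONE]",
]
print("".join(iter_sse_content(sample)))  # prints "Once upon"
```

In a real client the lines would come from the HTTP response body, read line by line, rather than from a list.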
SSE Parsing Libraries:
If you're building custom SSE parsing (instead of using the OpenAI SDK), these libraries can help:
- JavaScript/TypeScript: eventsource-parser - Robust SSE parsing with support for partial events
- Python: httpx-sse - SSE support for HTTPX, or sseclient-py for requests
For more details about the SSE specification, see the W3C specification.
Image attachments
Send images as part of your chat completion request.
PDF attachments
Send PDF documents as part of your chat completion request.
Audio attachments
Video attachments
Parameters
The chat completions endpoint supports the following parameters:
Required parameters
- model (string): The model to use for the completion (e.g., anthropic/claude-sonnet-4)
- messages (array): Array of message objects with role and content fields
Optional parameters
- stream (boolean): Whether to stream the response. Defaults to false
- temperature (number): Controls randomness in the output. Range: 0-2
- max_tokens (integer): Maximum number of tokens to generate
- top_p (number): Nucleus sampling parameter. Range: 0-1
- frequency_penalty (number): Penalty for frequent tokens. Range: -2 to 2
- presence_penalty (number): Penalty for present tokens. Range: -2 to 2
- stop (string or array): Stop sequences for the generation
- tools (array): Array of tool definitions for function calling
- tool_choice (string or object): Controls which tools are called (auto, none, or specific function)
- provider (object): Provider routing and configuration options
- response_format (object): Controls the format of the model's response
  - For OpenAI standard format: { type: "json_schema", json_schema: { name, schema, strict?, description? } }
  - For legacy format: { type: "json", schema?, name?, description? }
  - For plain text: { type: "text" }
  - See Structured outputs for detailed examples
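As an illustration of how these parameters combine, the sketch below assembles a request body and rejects sampling values outside the ranges listed above. The helper build_body, the schema name city, and the schema itself are hypothetical; the ranges are copied from this page, and the server's own validation may differ.

```python
from typing import Any

# Ranges copied from the optional parameter list on this page.
RANGES = {
    "temperature": (0, 2),
    "top_p": (0, 1),
    "frequency_penalty": (-2, 2),
    "presence_penalty": (-2, 2),
}


def build_body(model: str, messages: list[dict[str, Any]], **options: Any) -> dict[str, Any]:
    """Return a request body, rejecting out-of-range sampling options."""
    for name, (low, high) in RANGES.items():
        value = options.get(name)
        if value is not None and not low <= value <= high:
            raise ValueError(f"{name} must be between {low} and {high}, got {value}")
    return {"model": model, "messages": messages, **options}


body = build_body(
    "claude-3-5-sonnet@20240620",
    [{"role": "user", "content": "Name a city and its country."}],
    temperature=0.2,
    max_tokens=200,
    response_format={
        "type": "json_schema",
        "json_schema": {
            "name": "city",  # hypothetical schema name
            "strict": True,
            "schema": {
                "type": "object",
                "properties": {
                    "city": {"type": "string"},
                    "country": {"type": "string"},
                },
                "required": ["city", "country"],
                "additionalProperties": False,
            },
        },
    },
)
```

With the OpenAI Python SDK, a body like this can be passed as client.chat.completions.create(**body). Fields that are not part of the standard OpenAI signature, such as provider, would likely need to go through the SDK's extra_body argument instead.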
Message format
Messages support different content types:
Text messages
{
  "role": "user",
  "content": "What is the meaning of life?"
}
Multimodal messages
{
  "role": "user",
  "content": [
    {
      "type": "text",
      "text": "What's in this image?"
    },
    {
      "type": "image_url",
      "image_url": {
        "url": "data:image/jpeg;base64,/9j/4AAQSkZJRgABAQAAAQABAAD..."
      }
    }
  ]
}
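A multimodal message like the one above can be built from raw image bytes by base64-encoding them into a data URL. The sketch below shows one way to do this; image_message is a hypothetical helper, and the default MIME type of image/jpeg is an assumption that should match the actual file.

```python
import base64


def image_message(text: str, image_bytes: bytes, mime_type: str = "image/jpeg") -> dict:
    """Build a user message containing a text part and an inline image part."""
    encoded = base64.b64encode(image_bytes).decode("ascii")
    return {
        "role": "user",
        "content": [
            {"type": "text", "text": text},
            {
                "type": "image_url",
                # The image is embedded as a base64 data URL.
                "image_url": {"url": f"data:{mime_type};base64,{encoded}"},
            },
        ],
    }
```

The returned dictionary can be placed directly in the messages array of a request, with the image bytes read from a file opened in binary mode.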
File messages