
Chat Completions

Endpoint

POST https://llm.siraya.pro/v1/chat/completions
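Because the endpoint is OpenAI-compatible, it can also be called without an SDK. The sketch below uses only Python's standard library. It assumes the API key is sent as a Bearer token in the Authorization header, which is how the OpenAI SDK used in the examples below authenticates. build_request and send are hypothetical helper names, not part of any published client.

```python
import json
import urllib.request

ENDPOINT = "https://llm.siraya.pro/v1/chat/completions"


def build_request(api_key: str, model: str, prompt: str) -> urllib.request.Request:
    """Build (but do not send) a non-streaming chat completion request.

    Assumption: the server accepts the OpenAI-style "Bearer <key>" header.
    """
    body = {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
        "stream": False,
    }
    return urllib.request.Request(
        ENDPOINT,
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


def send(req: urllib.request.Request) -> dict:
    """Send the request over the network and decode the JSON response body."""
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read().decode("utf-8"))
```

To make a real call, something like completion = send(build_request('<API_KEY>', 'claude-3-5-sonnet@20240620', 'What is the meaning of life?')) followed by completion['choices'][0]['message']['content'] should mirror the SDK example.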

Basic chat completion

Create a non-streaming chat completion.

Example request

from openai import OpenAI

client = OpenAI(
    api_key='<API_KEY>',
    base_url='https://llm.siraya.pro/v1'
)

completion = client.chat.completions.create(
    model='claude-3-5-sonnet@20240620',
    messages=[
        {
            'role': 'user',
            'content': 'What is the meaning of life?'
        }
    ],
    stream=False,
)

print('Assistant:', completion.choices[0].message.content)
print('Tokens used:', completion.usage)

import OpenAI from 'openai';

const openai = new OpenAI({
  apiKey: '<API_KEY>',
  baseURL: 'https://llm.siraya.pro/v1',
});

const completion = await openai.chat.completions.create({
  model: 'claude-3-5-sonnet@20240620',
  messages: [
    {
      role: 'user',
      content: 'What is the meaning of life?',
    },
  ],
  stream: false,
});

console.log('Assistant:', completion.choices[0].message.content);
console.log('Tokens used:', completion.usage);

Streaming chat completion

Create a streaming chat completion that streams tokens as they are generated.

Example request

from openai import OpenAI

client = OpenAI(
    api_key='<API_KEY>',
    base_url='https://llm.siraya.pro/v1'
)

stream = client.chat.completions.create(
    model='claude-3-5-sonnet@20240620',
    messages=[
        {
            'role': 'user',
            'content': 'What is the meaning of life?'
        }
    ],
    stream=True,
)

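for chunk in stream:
    # Skip chunks that carry no choices (e.g. a final usage-only chunk)
    if not chunk.choices:
        continue
    content = chunk.choices[0].delta.content
    if content:
        print(content, end='', flush=True)

import OpenAI from 'openai';

const openai = new OpenAI({
  apiKey: '<API_KEY>',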
  baseURL: 'https://llm.siraya.pro/v1',
});

const stream = await openai.chat.completions.create({
  model: 'claude-3-5-sonnet@20240620',
  messages: [
    {
      role: 'user',
      content: 'What is the meaning of life?',
    },
  ],
  stream: true,
});

for await (const chunk of stream) {
  const content = chunk.choices[0]?.delta?.content;
  if (content) {
    process.stdout.write(content);
  }
}
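When streaming, you often want the complete reply as well as printing it incrementally. Below is a minimal sketch that works on any iterable of chunks shaped like the OpenAI SDK's chat.completion.chunk objects (choices[0].delta.content and choices[0].finish_reason). collect_stream and fake_chunk are hypothetical helpers; the usage at the bottom fakes chunks with SimpleNamespace instead of calling the API.

```python
from types import SimpleNamespace
from typing import Iterable, Optional, Tuple


def collect_stream(chunks: Iterable) -> Tuple[str, Optional[str]]:
    """Join streamed delta.content pieces and return (full_text, finish_reason)."""
    parts = []
    finish_reason = None
    for chunk in chunks:
        if not chunk.choices:  # e.g. a trailing usage-only chunk
            continue
        choice = chunk.choices[0]
        if choice.delta.content:
            parts.append(choice.delta.content)
        if choice.finish_reason is not None:
            finish_reason = choice.finish_reason
    return "".join(parts), finish_reason


def fake_chunk(content, finish_reason=None):
    """Stand-in object with the same attribute layout as an SDK chunk."""
    delta = SimpleNamespace(content=content)
    return SimpleNamespace(choices=[SimpleNamespace(delta=delta, finish_reason=finish_reason)])


text, reason = collect_stream([fake_chunk("Once"), fake_chunk(" upon"), fake_chunk(None, "stop")])
print(text, reason)  # prints "Once upon stop"
```

With the Python SDK, passing the stream object from the example above (text, reason = collect_stream(stream)) should work the same way. Note that a stream can only be iterated once.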

Streaming response format

Streaming responses are sent as Server-Sent Events (SSE), a web standard for real-time data streaming over HTTP. Each event contains a JSON object with the partial response data.

The response format follows the OpenAI streaming specification:

data: {"id":"chatcmpl-123","object":"chat.completion.chunk","created":1677652288,"model":"anthropic/claude-sonnet-4.5","choices":[{"index":0,"delta":{"content":"Once"},"finish_reason":null}]} 

data: {"id":"chatcmpl-123","object":"chat.completion.chunk","created":1677652288,"model":"anthropic/claude-sonnet-4.5","choices":[{"index":0,"delta":{"content":" upon"},"finish_reason":null}]} 

data: [DONE]

Key characteristics:

  • Each event line starts with data: followed by a JSON chunk (the final [DONE] marker is not JSON)
  • Content is delivered incrementally in the delta.content field
  • The stream ends with data: [DONE]
  • Empty lines separate events

SSE Parsing Libraries:

If you're building custom SSE parsing instead of using the OpenAI SDK, a dedicated SSE parsing library can handle details such as multi-line data fields and comment lines for you.

For more details about the SSE specification, see the W3C specification.
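For the simple event shape shown above, a hand-rolled parser can be quite small. The sketch below is an assumption-laden illustration rather than a full SSE implementation. It expects one data: line per event, ignores comment lines and other fields, reads only the first choice, and stops at data: [DONE]. iter_sse_content is a hypothetical name.

```python
import json
from typing import Iterable, Iterator


def iter_sse_content(lines: Iterable[str]) -> Iterator[str]:
    """Yield delta.content strings from raw SSE lines, stopping at data: [DONE].

    Simplified: assumes one data: line per event, as in the stream shown above.
    """
    for raw in lines:
        line = raw.strip()
        # Blank lines separate events; comment lines (":") and other fields are ignored.
        if not line.startswith("data:"):
            continue
        payload = line[len("data:"):].strip()
        if payload == "[DONE]":
            return
        event = json.loads(payload)
        choices = event.get("choices") or []
        if choices:
            content = (choices[0].get("delta") or {}).get("content")
            if content:
                yield content


SAMPLE = [
    'data: {"choices":[{"index":0,"delta":{"content":"Once"},"finish_reason":null}]}',
    "",
    'data: {"choices":[{"index":0,"delta":{"content":" upon"},"finish_reason":null}]}',
    "",
    "data: [DONE]",
]

print("".join(iter_sse_content(SAMPLE)))  # prints "Once upon"
```

In practice the lines would come from an HTTP response read incrementally, for example requests' Response.iter_lines(decode_unicode=True) on a request sent with stream=True.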

Image attachments

Send images as part of your chat completion request.

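For details, see Image Inputs.

PDF attachments

Send PDF documents as part of your chat completion request.

For details, see PDF Inputs.

Audio attachments

Send audio files as part of your chat completion request.

For details, see Audio Inputs.

Video attachments

Send video files as part of your chat completion request.

For details, see Video Inputs.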

Parameters

The chat completions endpoint supports the following parameters:

Required parameters

  • model (string): The model to use for the completion (e.g., anthropic/claude-sonnet-4)
  • messages (array): Array of message objects with role and content fields

Optional parameters

  • stream (boolean): Whether to stream the response as server-sent events. Defaults to false
  • temperature (number): Controls randomness in the output; higher values produce more varied text. Range: 0-2
  • max_tokens (integer): Maximum number of tokens to generate
  • top_p (number): Nucleus sampling parameter; the model samples only from the smallest set of tokens whose cumulative probability reaches top_p. Range: 0-1
  • frequency_penalty (number): Positive values penalize tokens in proportion to how often they have already appeared in the text so far. Range: -2 to 2
  • presence_penalty (number): Positive values penalize tokens that have appeared at all so far, encouraging new topics. Range: -2 to 2
  • stop (string or array): One or more sequences at which generation stops
  • tools (array): Array of tool definitions for function calling
  • tool_choice (string or object): Controls which tool, if any, the model calls (auto, none, or a specific function)
  • provider (object): Provider routing and configuration options
  • response_format (object): Controls the format of the model's response:
      • OpenAI standard format: { type: "json_schema", json_schema: { name, schema, strict?, description? } }
      • Legacy format: { type: "json", schema?, name?, description? }
      • Plain text: { type: "text" }
      • See Structured outputs for detailed examples
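The documented numeric ranges lend themselves to a client-side sanity check before a request is sent. Below is a minimal sketch: build_body and RANGES are hypothetical names, the check only mirrors the ranges listed above, and the server remains the authority on what it accepts. The response_format value follows the OpenAI standard json_schema shape from the list; the schema itself is illustrative.

```python
from typing import Any, Dict, List

# Ranges for the numeric sampling parameters, as listed in the parameter reference.
RANGES = {
    "temperature": (0.0, 2.0),
    "top_p": (0.0, 1.0),
    "frequency_penalty": (-2.0, 2.0),
    "presence_penalty": (-2.0, 2.0),
}


def build_body(model: str, messages: List[dict], **options: Any) -> Dict[str, Any]:
    """Return a request body, rejecting out-of-range numeric parameters.

    Hypothetical client-side check only; it does not validate tools or provider.
    """
    for name, (low, high) in RANGES.items():
        if name in options and not low <= options[name] <= high:
            raise ValueError(f"{name}={options[name]} is outside [{low}, {high}]")
    return {"model": model, "messages": messages, **options}


body = build_body(
    "anthropic/claude-sonnet-4",
    [{"role": "user", "content": "List three primary colors."}],
    temperature=0.2,
    max_tokens=200,
    response_format={
        "type": "json_schema",
        "json_schema": {
            "name": "colors",
            "strict": True,
            "schema": {
                "type": "object",
                "properties": {"colors": {"type": "array", "items": {"type": "string"}}},
                "required": ["colors"],
            },
        },
    },
)
```

With the OpenAI Python SDK, the standard fields in such a body can be passed as keyword arguments to client.chat.completions.create. Non-standard fields such as provider are not SDK parameters and would need to go through the SDK's extra_body argument.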

Message format

Messages support different content types:

Text messages

{
    "role": "user",
    "content": "Hello, how are you?"
}

Multimodal messages

{
    "role": "user",
    "content": [
        {
            "type": "text",
            "text": "What's in this image?"
        },
        {
            "type": "image_url",
            "image_url": {
                "url": "data:image/jpeg;base64,/9j/4AAQSkZJRgABAQAAAQABAAD..."
            }
        }
    ]
}
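A multimodal message like the one above can be built from raw image bytes by base64-encoding them into a data URL. Below is a minimal sketch; image_message is a hypothetical helper, and the file path in the comment is illustrative.

```python
import base64


def image_message(prompt: str, image_bytes: bytes, mime_type: str = "image/jpeg") -> dict:
    """Build a user message with a text part and an inline base64 data-URL image."""
    encoded = base64.b64encode(image_bytes).decode("ascii")
    return {
        "role": "user",
        "content": [
            {"type": "text", "text": prompt},
            {"type": "image_url", "image_url": {"url": f"data:{mime_type};base64,{encoded}"}},
        ],
    }


# Usage with a file on disk (path is illustrative):
# with open("photo.jpg", "rb") as f:
#     message = image_message("What's in this image?", f.read())
```

JPEG files start with the bytes FF D8 FF, which is why base64-encoded JPEG data begins with /9j/, as in the example message.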

File messages

{
    "role": "user",
    "content": [
        {
            "type": "text",
            "text": "Summarize this document"
        },
        {
            "type": "file",
            "file": {
                "data": "JVBERi0xLjQKJcfsj6IKNSAwIG9iago8PAovVHlwZSAvUGFnZQo...",
                "media_type": "application/pdf",
                "filename": "document.pdf"
            }
        }
    ]
}
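In the file message above, data holds the file's raw bytes as plain base64 (no data: URL prefix); JVBERi0 is simply the encoding of the %PDF- header. Below is a minimal sketch that builds such a message; file_message is a hypothetical helper, and the path in the comment is illustrative.

```python
import base64


def file_message(prompt: str, data: bytes, filename: str,
                 media_type: str = "application/pdf") -> dict:
    """Build a user message with a text part and a base64-encoded file part."""
    encoded = base64.b64encode(data).decode("ascii")
    return {
        "role": "user",
        "content": [
            {"type": "text", "text": prompt},
            {
                "type": "file",
                "file": {"data": encoded, "media_type": media_type, "filename": filename},
            },
        ],
    }


# Usage with a file on disk (path is illustrative):
# from pathlib import Path
# message = file_message("Summarize this document", Path("document.pdf").read_bytes(), "document.pdf")
```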