Chat Completions

Endpoint

POST https://llm.siraya.pro/v1/chat/completions

Basic chat completion

Create a non-streaming chat completion.

Example request

Python

from openai import OpenAI

client = OpenAI(
    api_key='<API_KEY>',
    base_url='https://llm.siraya.pro/v1'
)

completion = client.chat.completions.create(
    model='claude-3-5-sonnet@20240620',
    messages=[
        {
            'role': 'user',
            'content': 'What is the meaning of life?'
        }
    ],
    stream=False,
)

print('Assistant:', completion.choices[0].message.content)
print('Tokens used:', completion.usage)

TypeScript

import OpenAI from 'openai';

const openai = new OpenAI({
  apiKey: '<API_KEY>',
  baseURL: 'https://llm.siraya.pro/v1',
});

const completion = await openai.chat.completions.create({
  model: 'claude-3-5-sonnet@20240620',
  messages: [
    {
      role: 'user',
      content: 'What is the meaning of life?',
    },
  ],
  stream: false,
});

console.log('Assistant:', completion.choices[0].message.content);
console.log('Tokens used:', completion.usage);
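The same request can be made without an SDK as a plain HTTP POST to the endpoint. The sketch below only builds the request with Python's standard library, so it runs offline. It assumes the gateway accepts the `Authorization: Bearer` header that the OpenAI SDK sends, and `build_chat_request` is a hypothetical helper name.

```python
import json
import urllib.request

ENDPOINT = "https://llm.siraya.pro/v1/chat/completions"


def build_chat_request(api_key: str, model: str, prompt: str) -> urllib.request.Request:
    """Build (but do not send) a non-streaming chat completion request.

    Assumption: the gateway uses Bearer-token auth, like the OpenAI API.
    """
    body = {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
        "stream": False,
    }
    return urllib.request.Request(
        ENDPOINT,
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


req = build_chat_request("<API_KEY>", "claude-3-5-sonnet@20240620", "What is the meaning of life?")

# To actually send it (requires network access and a real key):
# with urllib.request.urlopen(req) as resp:
#     completion = json.load(resp)
#     print(completion["choices"][0]["message"]["content"])
```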

Streaming chat completion

Create a chat completion whose tokens are streamed back as they are generated.

Example request

Python

from openai import OpenAI

client = OpenAI(
    api_key='<API_KEY>',
    base_url='https://llm.siraya.pro/v1'
)

stream = client.chat.completions.create(
    model='claude-3-5-sonnet@20240620',
    messages=[
        {
            'role': 'user',
            'content': 'What is the meaning of life?'
        }
    ],
    stream=True,
)

for chunk in stream:
    content = chunk.choices[0].delta.content
    if content:
        print(content, end='', flush=True)

TypeScript

import OpenAI from 'openai';

const openai = new OpenAI({
  apiKey: '<API_KEY>',
  baseURL: 'https://llm.siraya.pro/v1',
});

const stream = await openai.chat.completions.create({
  model: 'claude-3-5-sonnet@20240620',
  messages: [
    {
      role: 'user',
      content: 'What is the meaning of life?',
    },
  ],
  stream: true,
});

for await (const chunk of stream) {
  const content = chunk.choices[0]?.delta?.content;
  if (content) {
    process.stdout.write(content);
  }
}
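When you also need the complete reply after streaming, you can accumulate the deltas while printing them. The sketch below uses a hypothetical helper, `collect_stream`, that reads the same `chunk.choices[0].delta.content` attribute the Python example uses. The fake chunks built with `SimpleNamespace` are only stand-ins for a real SDK stream so the example runs offline.

```python
from types import SimpleNamespace
from typing import Iterable, Optional, Tuple


def collect_stream(stream: Iterable) -> Tuple[str, Optional[str]]:
    """Join streamed delta.content pieces and return (text, finish_reason)."""
    parts = []
    finish_reason = None
    for chunk in stream:
        if not chunk.choices:  # some chunks (e.g. usage-only ones) may carry no choices
            continue
        choice = chunk.choices[0]
        if choice.delta.content:
            parts.append(choice.delta.content)
        if choice.finish_reason is not None:
            finish_reason = choice.finish_reason
    return "".join(parts), finish_reason


def fake_chunk(content, finish_reason=None):
    """Offline stand-in for an SDK stream chunk (illustration only)."""
    delta = SimpleNamespace(content=content)
    return SimpleNamespace(choices=[SimpleNamespace(delta=delta, finish_reason=finish_reason)])


text, reason = collect_stream([fake_chunk("Once"), fake_chunk(" upon"), fake_chunk(None, "stop")])
print(text, reason)  # Once upon stop
```

With a real stream, pass the object returned by `client.chat.completions.create(..., stream=True)` instead of the fake list.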

Streaming response format

Streaming responses are sent as Server-Sent Events (SSE), a web standard for real-time data streaming over HTTP. Each event contains a JSON object with the partial response data.

The response format follows the OpenAI streaming specification:

data: {"id":"chatcmpl-123","object":"chat.completion.chunk","created":1677652288,"model":"anthropic/claude-sonnet-4.5","choices":[{"index":0,"delta":{"content":"Once"},"finish_reason":null}]} 

data: {"id":"chatcmpl-123","object":"chat.completion.chunk","created":1677652288,"model":"anthropic/claude-sonnet-4.5","choices":[{"index":0,"delta":{"content":" upon"},"finish_reason":null}]} 

data: [DONE]

Key characteristics:

Each event is a line that starts with data: followed by a JSON chunk object
Content is delivered incrementally in the delta.content field
The stream ends with data: [DONE]
Empty lines separate events
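These rules are enough to write a minimal parser for the stream. The sketch below is a simplification, not a full SSE implementation: it assumes each event carries a single data: line, as in the example above, and skips comments and other SSE fields. `iter_sse_chunks` is a hypothetical name, and the sample chunks are abbreviated versions of the ones shown above.

```python
import json
from typing import Iterable, Iterator


def iter_sse_chunks(lines: Iterable[str]) -> Iterator[dict]:
    """Yield parsed JSON chunks from SSE `data:` lines, stopping at [DONE]."""
    for raw in lines:
        line = raw.rstrip("\r\n")
        if not line.startswith("data:"):
            continue  # blank event separators, ":" comments, or other fields
        payload = line[len("data:"):].strip()
        if payload == "[DONE]":
            return  # end-of-stream sentinel, not JSON
        yield json.loads(payload)


# Abbreviated chunks in the format shown above.
sample = [
    'data: {"choices":[{"index":0,"delta":{"content":"Once"},"finish_reason":null}]}',
    "",
    'data: {"choices":[{"index":0,"delta":{"content":" upon"},"finish_reason":null}]}',
    "",
    "data: [DONE]",
]

text = "".join(
    chunk["choices"][0]["delta"].get("content") or ""
    for chunk in iter_sse_chunks(sample)
)
print(text)  # Once upon
```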

SSE Parsing Libraries:

If you parse the SSE stream yourself instead of using the OpenAI SDK, these libraries can help:

JavaScript/TypeScript: eventsource-parser - Robust SSE parsing with support for partial events
Python: httpx-sse - SSE support for HTTPX, or sseclient-py for requests

For the full event-stream format, see the W3C Server-Sent Events specification.

Image attachments

Send images as part of your chat completion request.

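See Image Inputs.

PDF attachments

Send PDF documents as part of your chat completion request.

See PDF Inputs.

Audio attachments

Send audio files as part of your chat completion request.

See Audio Inputs.

Video attachments

Send video files as part of your chat completion request.

See Video Inputs.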

Parameters

The chat completions endpoint supports the following parameters:

Required parameters

model (string): The model to use for the completion (e.g., anthropic/claude-sonnet-4)
messages (array): Array of message objects with role and content fields

Optional parameters

stream (boolean): Whether to stream the response. Defaults to false
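temperature (number): Sampling temperature; higher values make the output more random. Range: 0-2
max_tokens (integer): Maximum number of tokens to generate
top_p (number): Nucleus sampling; only tokens within the top_p cumulative probability mass are considered. Range: 0-1
frequency_penalty (number): Penalizes tokens in proportion to how often they have already appeared. Range: -2 to 2
presence_penalty (number): Penalizes tokens that have already appeared at least once, encouraging new topics. Range: -2 to 2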
stop (string or array): Stop sequences for the generation
tools (array): Array of tool definitions for function calling
tool_choice (string or object): Controls which tools are called (auto, none, or specific function)
provider (object): Provider routing and configuration options
response_format (object): Controls the format of the model's response
  OpenAI standard format: { type: "json_schema", json_schema: { name, schema, strict?, description? } }
  Legacy format: { type: "json", schema?, name?, description? }
  Plain text: { type: "text" }
  See Structured outputs for detailed examples
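The documented ranges can be checked on the client before a request is sent. The sketch below is illustrative only: `build_body` and `RANGES` are hypothetical names, it assumes the ranges are inclusive, and the gateway may apply its own validation. The example also shows a `response_format` in the OpenAI `json_schema` form listed above.

```python
from typing import Any, Dict

# Documented ranges for the optional sampling parameters (assumed inclusive).
RANGES = {
    "temperature": (0.0, 2.0),
    "top_p": (0.0, 1.0),
    "frequency_penalty": (-2.0, 2.0),
    "presence_penalty": (-2.0, 2.0),
}


def build_body(model: str, messages: list, **options: Any) -> Dict[str, Any]:
    """Assemble a request body, rejecting out-of-range numeric options."""
    for name, (lo, hi) in RANGES.items():
        if name in options and not lo <= options[name] <= hi:
            raise ValueError(f"{name}={options[name]} is outside [{lo}, {hi}]")
    return {"model": model, "messages": messages, **options}


body = build_body(
    "anthropic/claude-sonnet-4",
    [{"role": "user", "content": "List three primary colors."}],
    temperature=0.2,
    max_tokens=200,
    response_format={
        "type": "json_schema",
        "json_schema": {
            "name": "colors",
            "strict": True,
            "schema": {
                "type": "object",
                "properties": {"colors": {"type": "array", "items": {"type": "string"}}},
                "required": ["colors"],
                "additionalProperties": False,
            },
        },
    },
)
```

With the Python SDK, the result can be passed as `client.chat.completions.create(**body)`. Gateway-specific fields such as `provider` are not SDK keyword arguments, so they would go in the SDK's `extra_body` argument.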

Message format

Messages support different content types:

Text messages

{  
    "role": "user",  
    "content": "Hello, how are you?"
}

Multimodal messages

{  
    "role": "user",  
    "content": [    
        { 
            "type": "text", 
            "text": "What's in this image?" 
        },    
        {      
            "type": "image_url",      
            "image_url": {        
                "url": "data:image/jpeg;base64,/9j/4AAQSkZJRgABAQAAAQABAAD..."      
            }    
        }  
    ]
}
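Image data is embedded as a base64 data URL inside the image_url part. The sketch below builds a message in the multimodal format above from raw bytes. `image_message` is a hypothetical helper, and the four example bytes are only the start of a JPEG header, used so the output matches the `/9j/4...` prefix in the sample.

```python
import base64
from typing import Dict, List


def image_message(prompt: str, image_bytes: bytes, mime_type: str = "image/jpeg") -> Dict:
    """Build a multimodal user message with an inline base64 data URL image."""
    encoded = base64.b64encode(image_bytes).decode("ascii")
    content: List[Dict] = [
        {"type": "text", "text": prompt},
        {"type": "image_url", "image_url": {"url": f"data:{mime_type};base64,{encoded}"}},
    ]
    return {"role": "user", "content": content}


# In practice, read real bytes, e.g. open("photo.jpg", "rb").read()
msg = image_message("What's in this image?", b"\xff\xd8\xff\xe0")
print(msg["content"][1]["image_url"]["url"])  # data:image/jpeg;base64,/9j/4A==
```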

File messages

{  
    "role": "user",  
    "content": [    
        { 
            "type": "text", 
            "text": "Summarize this document" 
        },    
        {      
            "type": "file",      
            "file": {        
                "data": "JVBERi0xLjQKJcfsj6IKNSAwIG9iago8PAovVHlwZSAvUGFnZQo...",        
                "media_type": "application/pdf",        
                "filename": "document.pdf"      
            }    
        }  
    ]
}
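The data field holds the file's bytes as base64, which is why the sample starts with JVBERi0 (the encoding of %PDF-). The sketch below builds a message in the file format above. `file_message` and `pdf_message` are hypothetical helpers, and the short byte string stands in for a real PDF.

```python
import base64
from pathlib import Path
from typing import Dict


def file_message(prompt: str, data: bytes, filename: str,
                 media_type: str = "application/pdf") -> Dict:
    """Build a user message with an inline base64 file part."""
    return {
        "role": "user",
        "content": [
            {"type": "text", "text": prompt},
            {
                "type": "file",
                "file": {
                    "data": base64.b64encode(data).decode("ascii"),
                    "media_type": media_type,
                    "filename": filename,
                },
            },
        ],
    }


def pdf_message(prompt: str, path: str) -> Dict:
    """Read a PDF from disk and wrap it with file_message."""
    pdf = Path(path)
    return file_message(prompt, pdf.read_bytes(), pdf.name)


# Stand-in bytes: just the PDF header line.
msg = file_message("Summarize this document", b"%PDF-1.4\n", "document.pdf")
print(msg["content"][1]["file"]["data"])  # JVBERi0xLjQK
```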