Reasoning
Many modern models, such as o1-preview or deepseek-reasoner, include an internal chain-of-thought or "thinking" process. Siraya AI lets you access and control these reasoning capabilities through OpenAI-compatible Chat Completions requests.
Reasoning Effort
Models that support reasoning often let you specify an effort level, which controls how much time and compute the model spends thinking before it answers.
Effort Levels
- minimal: Very brief thinking, fastest response.
- low: Basic reasoning for straightforward logical steps.
- medium (default): Standard reasoning for complex tasks.
- high: Extensive thinking for challenging problems or code generation.
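The effort levels above can be validated client-side before a request is sent. This is a minimal sketch; the `reasoning_params` helper and `EFFORT_LEVELS` tuple are illustrative names, not part of any SDK:

```python
# Hypothetical helper: the four documented effort levels as a validated set.
EFFORT_LEVELS = ("minimal", "low", "medium", "high")

def reasoning_params(effort="medium"):
    """Build the extra request parameter, rejecting unknown levels early."""
    if effort not in EFFORT_LEVELS:
        raise ValueError(f"reasoning_effort must be one of {EFFORT_LEVELS}")
    return {"reasoning_effort": effort}
```

The returned dict can be unpacked into a request, e.g. `client.chat.completions.create(..., **reasoning_params("high"))`, so an invalid level fails locally instead of producing a server-side error.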
How to use Reasoning in Requests
Include the reasoning_effort parameter in your request body.
```python
from openai import OpenAI

client = OpenAI(
    base_url="https://llm.siraya.pro/v1",
    api_key="<API_KEY>",
)

response = client.chat.completions.create(
    model="o1-preview",
    messages=[{"role": "user", "content": "Solve the Riemann Hypothesis."}],
    reasoning_effort="high",
)

# Some models also return reasoning steps in choices[0].message.reasoning
print(response.choices[0].message.content)
```
Viewing Reasoning Output
For models that expose their internal thoughts, the reasoning content is typically delivered in a `reasoning` or `reasoning_content` field within the message object, or as dedicated delta chunks in a stream. The exact field name varies by model, so check the documentation for the model you are using.
> [!TIP]
> Use streaming to see the reasoning process in real time as it's being generated.
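When streaming, reasoning deltas and answer deltas can be collected separately. The sketch below assumes DeepSeek's `reasoning_content` field name, which other models may not use, and works on plain delta dicts so the logic is easy to adapt:

```python
def split_stream(chunks):
    """Collect reasoning text and answer text from streamed delta dicts.

    Assumes each chunk is a dict like {"delta": {"reasoning_content": ...,
    "content": ...}}; real SDK chunks expose the same fields as attributes
    on chunk.choices[0].delta.
    """
    reasoning, answer = [], []
    for chunk in chunks:
        delta = chunk.get("delta", {})
        if delta.get("reasoning_content"):
            reasoning.append(delta["reasoning_content"])
        if delta.get("content"):
            answer.append(delta["content"])
    return "".join(reasoning), "".join(answer)
```

With the real client you would pass `stream=True` to `client.chat.completions.create(...)` and apply the same field checks to `chunk.choices[0].delta`, printing reasoning tokens as they arrive.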