Reasoning
Many modern models, such as o1-preview or deepseek-reasoner, include an internal chain-of-thought or "thinking" process. Siraya AI lets you access and control these reasoning capabilities through OpenAI-compatible Chat Completions requests.
Reasoning Effort
Models that support reasoning often let you specify an effort level, which controls how much time and compute the model spends thinking before it answers.
Effort Levels
- minimal: Very brief thinking, fastest response.
- low: Basic reasoning for straightforward logical steps.
- medium (default): Standard reasoning for complex tasks.
- high: Extensive thinking for challenging problems or code generation.
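The effort levels above can be validated client-side before a request is sent. This is a minimal sketch; the `reasoning_params` helper and `EFFORT_LEVELS` tuple are illustrative names, not part of any SDK:

```python
# Hypothetical helper: the four documented effort levels as a validated set.
EFFORT_LEVELS = ("minimal", "low", "medium", "high")

def reasoning_params(effort="medium"):
    """Build the extra request parameter, rejecting unknown levels early."""
    if effort not in EFFORT_LEVELS:
        raise ValueError(f"reasoning_effort must be one of {EFFORT_LEVELS}")
    return {"reasoning_effort": effort}
```

The returned dict can be unpacked into a request, e.g. `client.chat.completions.create(..., **reasoning_params("high"))`, so an invalid level fails locally instead of producing a server-side error.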
How to use Reasoning in Requests
Include the reasoning_effort parameter in your request body.
```python
from openai import OpenAI

client = OpenAI(
    base_url="https://llm.siraya.pro/v1",
    api_key="<API_KEY>",
)

response = client.chat.completions.create(
    model="o1-preview",
    messages=[{"role": "user", "content": "Solve the Riemann Hypothesis."}],
    reasoning_effort="high",
)

# Some models also return reasoning steps in choices[0].message.reasoning
print(response.choices[0].message.content)
```
Viewing Reasoning Output
For models that expose their internal thoughts, the reasoning content is typically delivered in a `reasoning` or `reasoning_content` field within the message object, or as dedicated delta chunks in a stream. The exact field name varies by model, so check the documentation for the model you are using.
> [!TIP]
> Use streaming to see the reasoning process in real time as it's being generated.
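When streaming, reasoning deltas and answer deltas can be collected separately. The sketch below assumes DeepSeek's `reasoning_content` field name, which other models may not use, and works on plain delta dicts so the logic is easy to adapt:

```python
def split_stream(chunks):
    """Collect reasoning text and answer text from streamed delta dicts.

    Assumes each chunk is a dict like {"delta": {"reasoning_content": ...,
    "content": ...}}; real SDK chunks expose the same fields as attributes
    on chunk.choices[0].delta.
    """
    reasoning, answer = [], []
    for chunk in chunks:
        delta = chunk.get("delta", {})
        if delta.get("reasoning_content"):
            reasoning.append(delta["reasoning_content"])
        if delta.get("content"):
            answer.append(delta["content"])
    return "".join(reasoning), "".join(answer)
```

With the real client you would pass `stream=True` to `client.chat.completions.create(...)` and apply the same field checks to `chunk.choices[0].delta`, printing reasoning tokens as they arrive.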