Reasoning
Many modern models, such as gemini-2.5-pro and deepseek-r1, run an internal chain-of-thought ("thinking") process before they answer. Siraya Model Router lets you access and control these reasoning capabilities through the OpenAI Responses API.
Reasoning Effort
Models that support reasoning allow you to specify the "effort level," which determines how much time and compute the model spends on thinking before answering.
Effort Levels
- low: Basic reasoning for straightforward logical steps.
- medium (default): Standard reasoning for complex tasks.
- high: Extensive thinking for challenging problems or code generation.
How to Use Reasoning in Requests
Include the reasoning parameter in your request body and set its effort field to low, medium, or high.
import requests
import json

response = requests.post(
    "https://llm.siraya.ai/v1/responses",
    headers={
        "Content-Type": "application/json",
        "Authorization": "Bearer <API_KEY>",
    },
    json={
        "model": "gemini-2.5-pro",
        "input": [
            {
                "type": "message",
                "role": "user",
                "content": "What is 25 * 37?"
            }
        ],
        "reasoning": {"effort": "high"}
    }
)
print(json.dumps(response.json(), indent=2))
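If the effort level comes from configuration or user input, it can help to validate it before sending the request. The following is a minimal sketch, not part of the Siraya API: the helper name build_reasoning_request and the VALID_EFFORTS constant are hypothetical, and the body shape simply mirrors the request example in this guide.

```python
# Hypothetical helper (not part of any SDK): builds a request body
# in the same shape as the example in this guide.
VALID_EFFORTS = ("low", "medium", "high")


def build_reasoning_request(model, prompt, effort="medium"):
    """Return a request body with a validated reasoning effort level."""
    if effort not in VALID_EFFORTS:
        raise ValueError(
            f"effort must be one of {VALID_EFFORTS}, got {effort!r}"
        )
    return {
        "model": model,
        "input": [
            {"type": "message", "role": "user", "content": prompt},
        ],
        "reasoning": {"effort": effort},
    }
```

The returned dictionary can be passed as the json argument of requests.post, exactly as in the request example above.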
Reasoning Output
When reasoning is enabled, the response includes a reasoning output block alongside the final answer:
{
  "output": [
    {
      "type": "reasoning",
      "id": "rs_1774723121304299014",
      "status": "completed",
      "content": [
        {
          "type": "reasoning_text",
          "text": "Let me think about this... 25 * 37 = 25 * (40 - 3) = 1000 - 75 = 925."
        }
      ]
    },
    {
      "type": "message",
      "id": "msg_1774723121304299015",
      "status": "completed",
      "role": "assistant",
      "content": [
        {
          "type": "output_text",
          "text": "25 × 37 = **925**"
        }
      ]
    }
  ]
}
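To separate the reasoning from the final answer, you can walk the output array and group the text parts by type. This is a sketch that assumes the response has the shape shown above; the function name split_reasoning_and_answer is hypothetical, and other models or response variants may use additional block types that this code simply ignores.

```python
def split_reasoning_and_answer(response_body):
    """Return (reasoning_text, answer_text) from a parsed response body.

    Assumes the shape shown in this guide: "reasoning" items carry
    "reasoning_text" parts and "message" items carry "output_text" parts.
    """
    reasoning_parts = []
    answer_parts = []
    for item in response_body.get("output", []):
        item_type = item.get("type")
        # "content" may be missing or null on some items; treat it as empty.
        for part in item.get("content") or []:
            part_type = part.get("type")
            if item_type == "reasoning" and part_type == "reasoning_text":
                reasoning_parts.append(part.get("text", ""))
            elif item_type == "message" and part_type == "output_text":
                answer_parts.append(part.get("text", ""))
    return "\n".join(reasoning_parts), "".join(answer_parts)
```

With the request example above, calling split_reasoning_and_answer(response.json()) would return the thinking text and the answer text as two separate strings.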
[!TIP] Use streaming to watch the reasoning process in real time as it is generated.
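This guide does not show the streaming wire format, so the following is only a sketch under stated assumptions: that the stream arrives as server-sent events with "data: <json>" lines, and that incremental text events have a type ending in ".delta" and carry the new text in a "delta" field, as in the OpenAI Responses API. The function names iter_sse_events and collect_deltas are hypothetical, and the exact event type used for reasoning text should be checked against a real stream.

```python
import json


def iter_sse_events(lines):
    """Yield parsed JSON payloads from server-sent event lines.

    Assumption: each event is delivered as a "data: <json>" line.
    Other lines (blank lines, "event:" fields, comments) are skipped.
    """
    for raw in lines:
        if isinstance(raw, bytes):
            raw = raw.decode("utf-8")
        line = raw.strip()
        if not line.startswith("data:"):
            continue
        payload = line[len("data:"):].strip()
        # Some servers end the stream with a "[DONE]" sentinel.
        if not payload or payload == "[DONE]":
            continue
        yield json.loads(payload)


def collect_deltas(events):
    """Concatenate streamed text per event type.

    Assumption: text events have a type ending in ".delta" and a string
    "delta" field. Returns a dict mapping event type to accumulated text.
    """
    collected = {}
    for event in events:
        event_type = event.get("type", "")
        delta = event.get("delta", "")
        if event_type.endswith(".delta") and isinstance(delta, str):
            collected[event_type] = collected.get(event_type, "") + delta
    return collected
```

To feed it live data, one option is to add "stream": True to the request body (assumed here to match the OpenAI Responses API), pass stream=True to requests.post, and hand response.iter_lines() to iter_sse_events. Printing each delta as it arrives, instead of collecting them, shows the reasoning while it is being generated.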