Reasoning

Many modern models, such as gemini-2.5-pro and deepseek-r1, run an internal chain-of-thought ("thinking") process before they answer. Siraya Model Router lets you access and control these reasoning capabilities via the OpenAI Responses API.

Reasoning Effort

Models that support reasoning allow you to specify the "effort level," which determines how much time and compute the model spends on thinking before answering.

Effort Levels

  • low: Basic reasoning for straightforward logical steps.
  • medium (default): Standard reasoning for complex tasks.
  • high: Extensive thinking for challenging problems or code generation.

How to Use Reasoning in Requests

Include the reasoning parameter in your request body and set its effort field to one of the levels listed above.

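import requests
import json

# Send a Responses API request that asks for high reasoning effort.
response = requests.post(
    "https://llm.siraya.ai/v1/responses",
    headers={
        "Content-Type": "application/json",
        "Authorization": "Bearer <API_KEY>",  # replace <API_KEY> with your key
    },
    json={
        "model": "gemini-2.5-pro",
        "input": [
            {
                "type": "message",
                "role": "user",
                "content": "What is 25 * 37?"
            }
        ],
        "reasoning": {"effort": "high"}  # one of: low, medium, high
    }
)
response.raise_for_status()  # stop here on an HTTP error instead of printing an error body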

print(json.dumps(response.json(), indent=2))

curl https://llm.siraya.ai/v1/responses \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer <API_KEY>" \
  -d '{
    "model": "gemini-2.5-pro",
    "input": [
      {
        "type": "message",
        "role": "user",
        "content": "What is 25 * 37?"
      }
    ],
    "reasoning": {"effort": "high"}
  }'
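
The request body shown above can also be assembled by a small helper that rejects unknown effort levels before anything is sent. This is a minimal sketch: build_request_body and VALID_EFFORTS are hypothetical names introduced for illustration and are not part of any Siraya or OpenAI SDK.

```python
from typing import Optional

# Effort levels listed in this section.
VALID_EFFORTS = ("low", "medium", "high")


def build_request_body(model: str, prompt: str, effort: Optional[str] = None) -> dict:
    """Build a /v1/responses request body (hypothetical helper, for illustration only)."""
    body = {
        "model": model,
        "input": [{"type": "message", "role": "user", "content": prompt}],
    }
    if effort is not None:
        # Catch typos locally rather than relying on the server to reject them.
        if effort not in VALID_EFFORTS:
            raise ValueError(f"effort must be one of {VALID_EFFORTS}, got {effort!r}")
        body["reasoning"] = {"effort": effort}
    return body
```

The returned dictionary can be passed as the json argument of requests.post. When effort is left as None, the helper omits the reasoning key entirely, on the assumption that the documented default (medium) then applies.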

Reasoning Output

When reasoning is enabled, the response includes a reasoning output block alongside the final answer:

{
  "output": [
    {
      "type": "reasoning",
      "id": "rs_1774723121304299014",
      "status": "completed",
      "content": [
        {
          "type": "reasoning_text",
          "text": "Let me think about this... 25 * 37 = 25 * (40 - 3) = 1000 - 75 = 925."
        }
      ]
    },
    {
      "type": "message",
      "id": "msg_1774723121304299015",
      "status": "completed",
      "role": "assistant",
      "content": [
        {
          "type": "output_text",
          "text": "25 × 37 = **925**"
        }
      ]
    }
  ]
}
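
A response shaped like the example above can be split into its reasoning text and its final answer by walking the output array. The sketch below assumes exactly the structure shown here (items of type reasoning and message, with reasoning_text and output_text parts); split_reasoning_and_answer is a hypothetical helper name, and other item types are simply ignored.

```python
from typing import Tuple


def split_reasoning_and_answer(response: dict) -> Tuple[str, str]:
    """Return (reasoning_text, answer_text) from a Responses-style payload."""
    reasoning_parts = []
    answer_parts = []
    for item in response.get("output") or []:
        item_type = item.get("type")
        for part in item.get("content") or []:
            part_type = part.get("type")
            if item_type == "reasoning" and part_type == "reasoning_text":
                reasoning_parts.append(part.get("text", ""))
            elif item_type == "message" and part_type == "output_text":
                answer_parts.append(part.get("text", ""))
    return "\n".join(reasoning_parts), "".join(answer_parts)


# Trimmed copy of the example response from this section.
sample = {
    "output": [
        {
            "type": "reasoning",
            "content": [
                {
                    "type": "reasoning_text",
                    "text": "Let me think about this... 25 * 37 = 25 * (40 - 3) = 1000 - 75 = 925.",
                }
            ],
        },
        {
            "type": "message",
            "role": "assistant",
            "content": [{"type": "output_text", "text": "25 × 37 = **925**"}],
        },
    ]
}

reasoning, answer = split_reasoning_and_answer(sample)
print(answer)
```

Keeping the two strings separate makes it easy to log or hide the reasoning while showing only the answer to end users.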

[!TIP] Use streaming to watch the reasoning in real time as it is generated.
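
This page does not show the streaming wire format, so the following is only a sketch under stated assumptions: that the router streams server-sent events as the OpenAI Responses API does, and that text increments arrive in events typed response.reasoning_text.delta and response.output_text.delta with a delta field. Both event names, and the helper names iter_sse_events and collect_deltas, are assumptions to verify against actual router output.

```python
import json
from typing import Iterable, Iterator, Tuple

# Assumed event type names (OpenAI Responses API convention; not confirmed by this page).
REASONING_DELTA = "response.reasoning_text.delta"
OUTPUT_DELTA = "response.output_text.delta"


def iter_sse_events(lines: Iterable[str]) -> Iterator[dict]:
    """Yield the JSON payload of each server-sent-event `data:` line."""
    for line in lines:
        line = line.strip()
        if not line.startswith("data:"):
            continue  # skip blank lines, comments, and `event:` fields
        payload = line[len("data:"):].strip()
        if not payload or payload == "[DONE]":
            continue  # tolerate an end-of-stream sentinel if one is sent
        yield json.loads(payload)


def collect_deltas(events: Iterable[dict]) -> Tuple[str, str]:
    """Accumulate reasoning and answer text from delta events."""
    reasoning, answer = [], []
    for event in events:
        delta = event.get("delta")
        if not isinstance(delta, str):
            continue
        if event.get("type") == REASONING_DELTA:
            reasoning.append(delta)
        elif event.get("type") == OUTPUT_DELTA:
            answer.append(delta)
    return "".join(reasoning), "".join(answer)
```

With the requests library, the lines could come from response.iter_lines(decode_unicode=True) on a request made with stream=True, assuming the request body also carries "stream": true as in the OpenAI Responses API. To display reasoning as it arrives, print each delta inside the loop instead of accumulating it.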