Reasoning & Thinking
For models that support it, the Siraya Model Router API can return Reasoning Tokens, also known as Thinking Tokens. Siraya Model Router normalizes the provider-specific ways of setting how many reasoning tokens a model may use, so a single reasoning & thinking interface works across providers.
Reasoning tokens provide a transparent look into the reasoning steps taken by a model. Reasoning tokens are considered output tokens and charged accordingly.
Siraya Model Router provides a unified parameter wrapper for Reasoning & Thinking:
- If the request does not include the "reasoning" field, Siraya Model Router leaves the parameter unset, so the upstream provider's default applies.
- If the request includes the "reasoning" field, Siraya Model Router converts it into the Reasoning & Thinking parameter format that the target provider expects.
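The two cases above can be sketched from the client side. This is only an illustration: build_request_body is a hypothetical helper, not part of the Siraya Model Router API, and it simply omits the "reasoning" key when no config is given so that the provider default applies.

```python
from typing import Any, Optional


def build_request_body(
    model: str,
    messages: list,
    reasoning: Optional[dict] = None,
) -> dict:
    """Build a chat completion request body (hypothetical helper).

    When `reasoning` is None the key is left out entirely, which the
    text above describes as the "unset" state.
    """
    body: dict[str, Any] = {"model": model, "messages": messages}
    if reasoning is not None:
        # Only send the field when the caller wants to override the default.
        body["reasoning"] = reasoning
    return body


# Unset: the provider default applies.
default_body = build_request_body("gemini-2.5-pro", [])

# Set: the router converts this config for the target provider.
custom_body = build_request_body("gemini-2.5-pro", [], {"effort": "low"})
```

Leaving the key out, rather than sending an empty object, keeps the request in the "unset" case described above.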
Reasoning tokens are included in the response by default whenever the model produces them. They appear in the reasoning_content field of each message, as in the response example at the end of this page.
Controlling Reasoning Tokens in OpenAI Chat Completions
You can control reasoning tokens in your requests using the reasoning parameter:
{
  "model": "gemini-2.5-pro",
  "messages": [],
  "reasoning": {
    // One of the following (not both):
    "effort": "high", // Can be "xhigh", "high", "medium", "low", "minimal" or "none"
    "max_tokens": 2000 // Specific token limit
  }
}
The reasoning config object consolidates settings for controlling reasoning strength across different models.
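Since the config takes either "effort" or "max_tokens" but not both, a small client-side check can catch mistakes before the request is sent. The sketch below is an assumption about how one might validate locally: make_reasoning_config and VALID_EFFORTS are hypothetical names, and the router may apply its own validation rules.

```python
from typing import Optional

# Effort values listed in this document.
VALID_EFFORTS = ("xhigh", "high", "medium", "low", "minimal", "none")


def make_reasoning_config(
    effort: Optional[str] = None,
    max_tokens: Optional[int] = None,
) -> dict:
    """Return a reasoning config with exactly one of the two settings."""
    if effort is not None and max_tokens is not None:
        raise ValueError("set either effort or max_tokens, not both")
    if effort is None and max_tokens is None:
        raise ValueError("set one of effort or max_tokens")
    if effort is not None:
        if effort not in VALID_EFFORTS:
            raise ValueError(f"unknown effort: {effort!r}")
        return {"effort": effort}
    if max_tokens <= 0:
        raise ValueError("max_tokens must be positive")
    return {"max_tokens": max_tokens}


config = make_reasoning_config(effort="high")
```

The returned dict can be passed as the "reasoning" value inside extra_body.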
effort accepts one of the following values:
- "effort": "xhigh" - Allocates the largest portion of tokens for reasoning (approximately 95% of max_tokens)
- "effort": "high" - Allocates a large portion of tokens for reasoning (approximately 80% of max_tokens)
- "effort": "medium" - Allocates a moderate portion of tokens (approximately 50% of max_tokens)
- "effort": "low" - Allocates a smaller portion of tokens (approximately 20% of max_tokens)
- "effort": "minimal" - Allocates a minimal portion of tokens (approximately 10% of max_tokens)
- "effort": "none" - Disables reasoning entirely
For models that only support reasoning.max_tokens, the effort level is converted to a token budget using the percentages above.
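As a rough illustration of the percentages above, the sketch below estimates the reasoning budget for a given effort level and request-level max_tokens. The name approximate_reasoning_budget is hypothetical, and the rounding used by the router is not documented here, so treat the result as an estimate rather than the exact value the router sends upstream.

```python
# Approximate share of max_tokens per effort level, taken from the list above.
EFFORT_PERCENT = {
    "xhigh": 95,
    "high": 80,
    "medium": 50,
    "low": 20,
    "minimal": 10,
    "none": 0,  # reasoning disabled
}


def approximate_reasoning_budget(effort: str, max_tokens: int) -> int:
    """Estimate the reasoning token budget for an effort level.

    Uses integer arithmetic and rounds down; the router's actual
    rounding is an unknown, so this is only an approximation.
    """
    if effort not in EFFORT_PERCENT:
        raise ValueError(f"unknown effort: {effort!r}")
    return max_tokens * EFFORT_PERCENT[effort] // 100


print(approximate_reasoning_budget("high", 10000))  # 8000
print(approximate_reasoning_budget("low", 2000))    # 400
```

For example, with max_tokens set to 10000, "high" corresponds to roughly 8000 reasoning tokens, leaving about 2000 for the visible answer.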
Examples
Basic Usage with Reasoning Tokens
from openai import OpenAI

client = OpenAI(
    base_url="https://llm.siraya.ai/v1",
    api_key="<API_KEY>",
)

response = client.chat.completions.create(
    model="gemini-2.5-pro",
    messages=[
        {"role": "user", "content": "What is 25 * 37?"}
    ],
    # "reasoning" is not a standard OpenAI SDK argument, so it goes in extra_body.
    extra_body={
        "reasoning": {
            # Set either "effort" or "max_tokens", not both.
            "effort": "high"
        }
    },
)

print(response.model_dump_json())
Using Max Tokens for Reasoning
You can specify the exact number of tokens to use for reasoning:
from openai import OpenAI

client = OpenAI(
    base_url="https://llm.siraya.ai/v1",
    api_key="<API_KEY>",
)

response = client.chat.completions.create(
    model="gemini-2.5-pro",
    messages=[
        {"role": "user", "content": "What is 25 * 37?"}
    ],
    extra_body={
        "reasoning": {
            "max_tokens": 200
        }
    },
)

print(response.model_dump_json())
Disabling Reasoning Entirely
from openai import OpenAI

client = OpenAI(
    base_url="https://llm.siraya.ai/v1",
    api_key="<API_KEY>",
)

response = client.chat.completions.create(
    model="gemini-2.5-pro",
    messages=[
        {"role": "user", "content": "What is 25 * 37?"}
    ],
    extra_body={
        "reasoning": {
            "effort": "none"
        }
    },
)

print(response.model_dump_json())
Streaming Mode with Reasoning Tokens
from openai import OpenAI

client = OpenAI(
    base_url="https://llm.siraya.ai/v1",
    api_key="<API_KEY>",
)

def chat_completion_with_reasoning(messages):
    # With stream=True the call returns an iterator of chunks.
    response = client.chat.completions.create(
        model="gemini-2.5-pro",
        messages=messages,
        max_tokens=10000,
        extra_body={
            "reasoning": {
                # Set either "max_tokens" or "effort", not both.
                "max_tokens": 8000
            }
        },
        stream=True
    )
    return response

for chunk in chat_completion_with_reasoning([
    {"role": "user", "content": "What is 25 * 37?"}
]):
    if not chunk.choices:
        # Skip chunks that carry no choices.
        continue
    delta = chunk.choices[0].delta
    reasoning_details = getattr(delta, "reasoning_details", None)
    if reasoning_details:
        print(f"REASONING_DETAILS: {reasoning_details}")
    elif getattr(delta, "content", None):
        print(f"CONTENT: {delta.content}")
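Printing each chunk is useful for debugging, but an application usually wants the full reasoning text and the full answer as two strings. The sketch below accumulates both from a stream. It assumes that each streamed reasoning_details entry carries a "text" key, as in the non-streaming response example on this page; that is an assumption, not documented behavior. collect_stream and fake_chunk are hypothetical names, and the fake chunks only stand in for a real stream so the block runs without network access.

```python
from types import SimpleNamespace


def collect_stream(chunks):
    """Accumulate reasoning text and answer text from streamed chunks."""
    reasoning_parts, content_parts = [], []
    for chunk in chunks:
        if not chunk.choices:
            continue  # some chunks may carry no choices
        delta = chunk.choices[0].delta
        for detail in getattr(delta, "reasoning_details", None) or []:
            # Entries may be dicts or objects depending on the client library.
            if isinstance(detail, dict):
                text = detail.get("text")
            else:
                text = getattr(detail, "text", None)
            if text:
                reasoning_parts.append(text)
        content = getattr(delta, "content", None)
        if content:
            content_parts.append(content)
    return "".join(reasoning_parts), "".join(content_parts)


def fake_chunk(reasoning_text=None, content=None):
    """Build a stand-in chunk shaped like the fields read above."""
    details = None
    if reasoning_text is not None:
        details = [{"type": "reasoning.text", "text": reasoning_text}]
    delta = SimpleNamespace(reasoning_details=details, content=content)
    return SimpleNamespace(choices=[SimpleNamespace(delta=delta)])


stream = [
    fake_chunk(reasoning_text="25 * 30 = 750. "),
    fake_chunk(reasoning_text="25 * 7 = 175."),
    SimpleNamespace(choices=[]),
    fake_chunk(content="925"),
]
reasoning, answer = collect_stream(stream)
```

With a real request, the iterator returned by a streaming chat completion call can be passed to collect_stream in place of the fake list.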
Response Shape
When a reasoning model generates a response, the reasoning is returned in a standardized format: the reasoning_content field of the message holds the reasoning text, and the reasoning_details array holds it as structured entries.
{
  "id": "chatcmpl-abcdefghijklmnopqrstuvwx",
  "choices": [
    {
      "finish_reason": "stop",
      "index": 0,
      "message": {
        "content": "25 * 37 = **925**",
        "role": "assistant",
        "reasoning_content": "Alright, the user wants the product of 25 and 37. Let's break it down: 25 times 30 is 750, and 25 times 7 is 175. Add those together, 750 plus 175, that's 925...",
        "reasoning_details": [
          {
            "type": "reasoning.text",
            "text": "Alright, the user wants the product of 25 and 37...",
            "id": "reasoning-text-0",
            "format": "google-gemini-v1",
            "index": 0
          }
        ]
      }
    }
  ],
  "model": "gemini-2.5-pro",
  "object": "chat.completion",
  "usage": {
    "completion_tokens": 685,
    "prompt_tokens": 10,
    "total_tokens": 695,
    "completion_tokens_details": {
      "reasoning_tokens": 673
    }
  }
}
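To show how the fields in this response fit together, the sketch below pulls out the reasoning text, the answer, and the token split from a response dict. summarize_reasoning is a hypothetical helper. It assumes completion_tokens already includes reasoning_tokens, which matches the example numbers and the statement that reasoning tokens count as output tokens, but is not stated explicitly.

```python
def summarize_reasoning(response: dict) -> dict:
    """Extract reasoning text, answer, and token split from a response dict."""
    message = response["choices"][0]["message"]
    usage = response["usage"]
    completion_tokens = usage["completion_tokens"]
    details = usage.get("completion_tokens_details") or {}
    reasoning_tokens = details.get("reasoning_tokens", 0)
    return {
        "reasoning_text": message.get("reasoning_content"),
        "answer": message.get("content"),
        "reasoning_tokens": reasoning_tokens,
        # Assumption: completion_tokens includes the reasoning tokens.
        "visible_tokens": completion_tokens - reasoning_tokens,
    }


# Trimmed copy of the example response above.
example = {
    "choices": [
        {
            "message": {
                "content": "25 * 37 = **925**",
                "role": "assistant",
                "reasoning_content": "Alright, the user wants the product of 25 and 37...",
            }
        }
    ],
    "usage": {
        "completion_tokens": 685,
        "prompt_tokens": 10,
        "total_tokens": 695,
        "completion_tokens_details": {"reasoning_tokens": 673},
    },
}

summary = summarize_reasoning(example)
print(summary["reasoning_tokens"], summary["visible_tokens"])  # 673 12
```

With the OpenAI Python client, response.model_dump() produces a dict that can be passed to this helper.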