Parameters
Sampling parameters shape the token generation process of the model. You may send any parameters from the following list to the Siraya Model Router API.
Siraya Model Router will default to standard values if certain parameters are absent (e.g., temperature defaults to 1.0). Provider-specific parameters must be passed via extra_body — they cannot be placed at the top level of the request body.
Please refer to the Available Models section to confirm which parameters are supported by each model.
Core Parameters
Temperature
- Key:
temperature - Type: Float (0.0 to 2.0)
- Default: 1.0
Influences the variety in the model's responses. Lower values lead to more predictable responses, while higher values encourage diversity. At 0, the model becomes deterministic (same response for same input).
Top P
- Key:
top_p - Type: Float (0.0 to 1.0)
- Default: 1.0
Limits the model's choices to a percentage of likely tokens (Nucleus sampling). Only the top tokens whose cumulative probability adds up to P are considered.
Max Tokens
- Key:
max_tokens - Type: Integer (1 or above)
Sets the upper limit for the number of tokens the model can generate. Deprecated — use max_completion_tokens instead.
Max Completion Tokens
- Key:
max_completion_tokens - Type: Integer (1 or above)
Maximum generation token count (OpenAI recommended). Takes precedence over max_tokens if both are set.
Stream
- Key:
stream - Type: Boolean
- Default: false
Enables streaming responses via Server-Sent Events. See the Streaming Guide for details.
Stream Options
- Key:
stream_options - Type: Object
Streaming configuration. Use {"include_usage": true} to include token usage in the final streaming chunk.
Penalties and Bias
Frequency Penalty
- Key:
frequency_penalty - Type: Float (-2.0 to 2.0)
- Default: 0.0
Penalizes tokens based on their frequency in the text so far. Encourages the model to use less frequent tokens.
Presence Penalty
- Key:
presence_penalty - Type: Float (-2.0 to 2.0)
- Default: 0.0
Penalizes tokens based on whether they have already appeared in the text. Encourages the model to talk about new topics.
Logit Bias
- Key:
logit_bias - Type: Map (Token ID to Bias Value -100 to 100)
Modifies the likelihood of specific tokens appearing in the completion.
Output Structure
Response Format
- Key:
response_format - Type: Object (e.g.,
{ "type": "json_object" })
Forces the model to produce a specific output format. Setting to json_object enables JSON mode.
Stop
- Key:
stop - Type: Array of Strings (up to 4)
Immediately stop generation if any of the specified tokens are encountered.
Seed
- Key:
seed - Type: Integer
Used for deterministic sampling. Requests with the same seed and parameters should return similar results.
Logprobs
- Key:
logprobs - Type: Boolean
- Default: false
Whether to return token log probabilities of output tokens.
Top Logprobs
- Key:
top_logprobs - Type: Integer (0-20)
Number of top log probabilities to return. Requires logprobs=true.
Tools
Tools and Tool Choice
- Key:
tools,tool_choice - Type: Array (Tools), String or Object (Tool Choice)
Enables tool calling following the OpenAI specification. See the Tool Calling Guide for details.
Parallel Tool Calls
- Key:
parallel_tool_calls - Type: Boolean
Whether to allow parallel tool calls.
Reasoning
Reasoning Effort
- Key:
reasoning_effort - Type: String (
low,medium,high)
Controls reasoning effort level. Supported by OpenAI o1/o3 and Gemini 2.5 series.
Reasoning
- Key:
reasoning - Type: Object (e.g.,
{ "effort": "high" })
Reasoning config object (OpenRouter compatible). Takes precedence over reasoning_effort if both are set.
Thinking (Extended Thinking)
- Key:
thinking(viaextra_body) - Type: Object
Configuration for extended thinking. Supported by Anthropic Claude 3.7+ and Gemini 2.5 series.
| Field | Type | Description |
|---|---|---|
type |
string | enabled or disabled |
budget_tokens |
integer | Maximum token count for thinking (minimum 1024) |
Web Search
Web Search Options
- Key:
web_search_options - Type: Object
Enables web search for supported models. Pass empty {} to enable with defaults.
| Field | Type | Description |
|---|---|---|
search_context_size |
string | Search context amount: low, medium (default), high |
user_location |
object | User location for localized search results |
user_location.approximate.city |
string | City name (e.g., "Beijing") |
user_location.approximate.country |
string | ISO 3166-1 country code (e.g., "CN") |
user_location.approximate.timezone |
string | IANA timezone (e.g., "Asia/Shanghai") |
{
"web_search_options": {
"search_context_size": "high",
"user_location": {
"type": "approximate",
"approximate": {
"city": "Beijing",
"country": "CN",
"timezone": "Asia/Shanghai"
}
}
}
}
Routing
Provider
- Key:
provider - Type: Object
Vendor routing preferences (OpenRouter compatible). Controls load balancing vs sort mode.
| Field | Type | Description |
|---|---|---|
sort |
string | Sort providers by price, throughput, or latency |
allow_fallbacks |
boolean | Whether to allow backup providers (default: true) |
Transforms
- Key:
transforms - Type: Array of Strings
Message transforms. Supported: ["middle-out"] to compress prompts exceeding context size. Set to [] to disable.
Vendor-Specific Parameters
Extra Body
- Key:
extra_body - Type: Object (map[string]any)
Pass-through parameters for the vendor. Vendor-specific parameters (e.g., Bedrock Guardrail, Vertex AI Safety Settings) must be passed via extra_body — they cannot be placed at the top level of the request body.