Skip to content

Parameters

Sampling parameters shape the token generation process of the model. You may send any parameters from the following list to the Siraya Model Router API.

Siraya Model Router will default to standard values if certain parameters are absent (e.g., temperature defaults to 1.0). Provider-specific parameters must be passed via extra_body — they cannot be placed at the top level of the request body.

Please refer to the Available Models section to confirm which parameters are supported by each model.

Core Parameters

Temperature

  • Key: temperature
  • Type: Float (0.0 to 2.0)
  • Default: 1.0

Influences the variety in the model's responses. Lower values lead to more predictable responses, while higher values encourage diversity. At 0, the model becomes deterministic (same response for same input).

Top P

  • Key: top_p
  • Type: Float (0.0 to 1.0)
  • Default: 1.0

Limits the model's choices to a percentage of likely tokens (Nucleus sampling). Only the top tokens whose cumulative probability adds up to P are considered.

Max Tokens

  • Key: max_tokens
  • Type: Integer (1 or above)

Sets the upper limit for the number of tokens the model can generate. Deprecated — use max_completion_tokens instead.

Max Completion Tokens

  • Key: max_completion_tokens
  • Type: Integer (1 or above)

Maximum generation token count (OpenAI recommended). Takes precedence over max_tokens if both are set.

Stream

  • Key: stream
  • Type: Boolean
  • Default: false

Enables streaming responses via Server-Sent Events. See the Streaming Guide for details.

Stream Options

  • Key: stream_options
  • Type: Object

Streaming configuration. Use {"include_usage": true} to include token usage in the final streaming chunk.

Penalties and Bias

Frequency Penalty

  • Key: frequency_penalty
  • Type: Float (-2.0 to 2.0)
  • Default: 0.0

Penalizes tokens based on their frequency in the text so far. Encourages the model to use less frequent tokens.

Presence Penalty

  • Key: presence_penalty
  • Type: Float (-2.0 to 2.0)
  • Default: 0.0

Penalizes tokens based on whether they have already appeared in the text. Encourages the model to talk about new topics.

Logit Bias

  • Key: logit_bias
  • Type: Map (Token ID to Bias Value -100 to 100)

Modifies the likelihood of specific tokens appearing in the completion.

Output Structure

Response Format

  • Key: response_format
  • Type: Object (e.g., { "type": "json_object" })

Forces the model to produce a specific output format. Setting to json_object enables JSON mode.

Stop

  • Key: stop
  • Type: Array of Strings (up to 4)

Immediately stop generation if any of the specified tokens are encountered.

Seed

  • Key: seed
  • Type: Integer

Used for deterministic sampling. Requests with the same seed and parameters should return similar results.

Logprobs

  • Key: logprobs
  • Type: Boolean
  • Default: false

Whether to return token log probabilities of output tokens.

Top Logprobs

  • Key: top_logprobs
  • Type: Integer (0-20)

Number of top log probabilities to return. Requires logprobs=true.

Tools

Tools and Tool Choice

  • Key: tools, tool_choice
  • Type: Array (Tools), String or Object (Tool Choice)

Enables tool calling following the OpenAI specification. See the Tool Calling Guide for details.

Parallel Tool Calls

  • Key: parallel_tool_calls
  • Type: Boolean

Whether to allow parallel tool calls.

Reasoning

Reasoning Effort

  • Key: reasoning_effort
  • Type: String (low, medium, high)

Controls reasoning effort level. Supported by OpenAI o1/o3 and Gemini 2.5 series.

Reasoning

  • Key: reasoning
  • Type: Object (e.g., { "effort": "high" })

Reasoning config object (OpenRouter compatible). Takes precedence over reasoning_effort if both are set.

Thinking (Extended Thinking)

  • Key: thinking (via extra_body)
  • Type: Object

Configuration for extended thinking. Supported by Anthropic Claude 3.7+ and Gemini 2.5 series.

Field Type Description
type string enabled or disabled
budget_tokens integer Maximum token count for thinking (minimum 1024)
{
  "extra_body": {
    "thinking": {
      "type": "enabled",
      "budget_tokens": 10000
    }
  }
}

Web Search Options

  • Key: web_search_options
  • Type: Object

Enables web search for supported models. Pass empty {} to enable with defaults.

Field Type Description
search_context_size string Search context amount: low, medium (default), high
user_location object User location for localized search results
user_location.approximate.city string City name (e.g., "Beijing")
user_location.approximate.country string ISO 3166-1 country code (e.g., "CN")
user_location.approximate.timezone string IANA timezone (e.g., "Asia/Shanghai")
{
  "web_search_options": {
    "search_context_size": "high",
    "user_location": {
      "type": "approximate",
      "approximate": {
        "city": "Beijing",
        "country": "CN",
        "timezone": "Asia/Shanghai"
      }
    }
  }
}

Routing

Provider

  • Key: provider
  • Type: Object

Vendor routing preferences (OpenRouter compatible). Controls load balancing vs sort mode.

Field Type Description
sort string Sort providers by price, throughput, or latency
allow_fallbacks boolean Whether to allow backup providers (default: true)

Transforms

  • Key: transforms
  • Type: Array of Strings

Message transforms. Supported: ["middle-out"] to compress prompts exceeding context size. Set to [] to disable.

Vendor-Specific Parameters

Extra Body

  • Key: extra_body
  • Type: Object (map[string]any)

Pass-through parameters for the vendor. Vendor-specific parameters (e.g., Bedrock Guardrail, Vertex AI Safety Settings) must be passed via extra_body — they cannot be placed at the top level of the request body.

{
  "extra_body": {
    "thinking": {
      "type": "enabled",
      "budget_tokens": 10000
    }
  }
}