
Inference Provider Routing

Siraya Model Router routes requests to the best available providers for your model.

By default, requests are load-balanced across the top providers to maximize uptime and minimize price.

You can customize how your requests are routed using the provider object in the request body for Chat Completions.

The provider object can contain the following fields:

| Field | Type | Default | Description |
| --- | --- | --- | --- |
| sort | string | - | Sort providers by price, throughput, or latency (e.g. "price"). |
| allow_fallbacks | boolean | true | Whether to allow backup providers when the primary is unavailable. |
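As a minimal sketch of how a client might assemble this object, the helper below validates the two documented fields before they go into a request body. The name `build_provider` and the choice to omit fields left at their defaults are assumptions made for illustration, not part of the Siraya API.

```python
# Hypothetical client-side helper; only the field names and allowed values
# come from the table above.
VALID_SORTS = {"price", "throughput", "latency"}


def build_provider(sort=None, allow_fallbacks=True):
    """Return a `provider` dict containing only non-default preferences."""
    if sort is not None and sort not in VALID_SORTS:
        raise ValueError(f"sort must be one of {sorted(VALID_SORTS)}, got {sort!r}")
    if not isinstance(allow_fallbacks, bool):
        raise TypeError("allow_fallbacks must be a boolean")

    provider = {}
    if sort is not None:
        provider["sort"] = sort
    if not allow_fallbacks:
        # The documented default is true, so it is only sent when disabled.
        provider["allow_fallbacks"] = False
    return provider
```

Calling `build_provider("price", allow_fallbacks=False)` yields a dict that can be placed under the `provider` key of the request body.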

Cost-effective Load Balancing (Default Strategy)

For each model in your request, Siraya Model Router load-balances requests across providers by default, weighing throughput, latency, and price.

When you send a model request, Siraya Model Router automatically evaluates multiple providers in real time. It considers factors such as latency, throughput, reliability, and price.

Info

For instance, if Provider A offers slightly higher throughput but at a higher cost, while Provider B is more affordable with moderate latency, Siraya Model Router will intelligently balance requests across both to achieve the best overall performance and cost efficiency.
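The Provider A / Provider B trade-off can be pictured with a toy model. The sketch below is purely illustrative: the scoring formula, the provider statistics, and the function names are all assumptions, and it ignores reliability. The router's actual algorithm is not described here.

```python
import random

# Made-up statistics for two hypothetical providers.
providers = [
    {"name": "A", "price": 3.0, "throughput": 120.0, "latency": 0.4},
    {"name": "B", "price": 2.0, "throughput": 90.0, "latency": 0.6},
]


def routing_weights(providers):
    """Assumed scoring: reward throughput, penalize price and latency."""
    scores = {
        p["name"]: p["throughput"] / (p["price"] * p["latency"])
        for p in providers
    }
    total = sum(scores.values())
    # Normalize so the weights form a probability distribution.
    return {name: score / total for name, score in scores.items()}


def pick_provider(providers, rng=random):
    """Choose one provider at random, in proportion to its weight."""
    weights = routing_weights(providers)
    names = list(weights)
    return rng.choices(names, weights=[weights[n] for n in names], k=1)[0]
```

Under these assumed numbers Provider A receives the larger share of traffic, but Provider B still serves a substantial fraction, which is the kind of balance the paragraph above describes.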

Info

If you are more sensitive to throughput than price, you can use the sort field to explicitly prioritize throughput.

If sort is set in your provider preferences, the default load-balancing strategy is disabled.

Provider Sorting (sort)

If you want to explicitly prioritize a particular provider attribute, include the sort field in the provider preferences. The default load-balancing strategy is then disabled, and the router tries providers in the sorted order.

The three sort options are:

  • "price": prioritize lowest price
  • "throughput": prioritize highest throughput
  • "latency": prioritize lowest latency

Example: Sort by price

import requests

headers = {
  'Authorization': 'Bearer <API_KEY>',
  'Content-Type': 'application/json',
}

response = requests.post('https://llm.siraya.ai/v1/chat/completions', headers=headers, json={
  'model': 'gemini-3.1-pro-preview',
  'messages': [{ 'role': 'user', 'content': 'Hello' }],
  'provider': {
    'sort': 'price',
  },
})

Example: Sort by throughput

import requests

headers = {
  'Authorization': 'Bearer <API_KEY>',
  'Content-Type': 'application/json',
}

response = requests.post('https://llm.siraya.ai/v1/chat/completions', headers=headers, json={
  'model': 'gemini-3.1-pro-preview',
  'messages': [{ 'role': 'user', 'content': 'Hello' }],
  'provider': {
    'sort': 'throughput',
  },
})

Example: Sort by latency

import requests

headers = {
  'Authorization': 'Bearer <API_KEY>',
  'Content-Type': 'application/json',
}

response = requests.post('https://llm.siraya.ai/v1/chat/completions', headers=headers, json={
  'model': 'gemini-3.1-pro-preview',
  'messages': [{ 'role': 'user', 'content': 'Hello' }],
  'provider': {
    'sort': 'latency',
  },
})
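
To make the three orderings concrete, here is a small client-side sketch that sorts a list of providers the way each option is described: ascending price, descending throughput, ascending latency. The provider names and numbers are invented, and `order_providers` is a hypothetical helper. The router performs its own sorting server-side.

```python
# Made-up statistics for three hypothetical providers.
providers = [
    {"name": "A", "price": 3.0, "throughput": 120.0, "latency": 0.4},
    {"name": "B", "price": 2.0, "throughput": 90.0, "latency": 0.6},
    {"name": "C", "price": 2.5, "throughput": 150.0, "latency": 0.3},
]

# For each sort option: (field to compare, whether higher values come first).
SORT_RULES = {
    "price": ("price", False),           # lowest price first
    "throughput": ("throughput", True),  # highest throughput first
    "latency": ("latency", False),       # lowest latency first
}


def order_providers(providers, sort):
    """Return provider names in the order the given sort option implies."""
    field, descending = SORT_RULES[sort]
    ranked = sorted(providers, key=lambda p: p[field], reverse=descending)
    return [p["name"] for p in ranked]
```

With these numbers, `order_providers(providers, "price")` puts B first, while both `"throughput"` and `"latency"` put C first.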

Disabling Fallbacks (allow_fallbacks)

By default, when a provider fails, Siraya Model Router will automatically try the next available provider. You can disable this behavior by setting allow_fallbacks to false.

| Field | Type | Default | Description |
| --- | --- | --- | --- |
| allow_fallbacks | boolean | true | Whether to allow backup providers when the primary is unavailable. |
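
The fallback behavior can be simulated locally. The sketch below is an assumption-laden model, not the router's implementation: `route`, `ProviderUnavailable`, and `flaky_call` are hypothetical names, and a failing provider is represented by a raised exception.

```python
class ProviderUnavailable(Exception):
    """Raised by the simulated call when a provider cannot serve a request."""


def route(ordered_providers, call, allow_fallbacks=True):
    """Try providers in order; with fallbacks disabled, try only the first."""
    candidates = ordered_providers if allow_fallbacks else ordered_providers[:1]
    last_error = None
    for name in candidates:
        try:
            return name, call(name)
        except ProviderUnavailable as exc:
            last_error = exc  # remember the failure and move to the next one
    raise RuntimeError("no provider could serve the request") from last_error


def flaky_call(name):
    """Simulated provider call in which the provider named 'cheap' is down."""
    if name == "cheap":
        raise ProviderUnavailable(name)
    return f"response from {name}"
```

In this model, `route(["cheap", "backup"], flaky_call)` is served by `backup`, whereas passing `allow_fallbacks=False` raises an error because only `cheap` is attempted.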

Example: Cheapest provider with no fallback

To guarantee that your request is only served by the lowest-cost provider with no fallback:

import requests

headers = {
  'Authorization': 'Bearer <API_KEY>',
  'Content-Type': 'application/json'
}

response = requests.post('https://llm.siraya.ai/v1/chat/completions', headers=headers, json={
  'model': 'gemini-3.1-pro-preview',
  'messages': [{ 'role': 'user', 'content': 'Hello' }],
  'provider': {
    'sort': 'price',
    'allow_fallbacks': False,
  },
})

The same request with curl:

curl https://llm.siraya.ai/v1/chat/completions \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer <API_KEY>" \
  -d '{
  "model": "gemini-3.1-pro-preview",
  "messages": [
    {
      "role": "user",
      "content": "What is the meaning of life?"
    }
  ],
  "provider": {
    "sort": "price",
    "allow_fallbacks": false
  }
}'
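
If the `requests` package is not available, the same call can be assembled with Python's standard library. This is a minimal sketch: `build_request` is a hypothetical helper, and the snippet only constructs the request object without sending it. The endpoint, headers, and body fields are taken from the examples above.

```python
import json
import urllib.request

API_URL = "https://llm.siraya.ai/v1/chat/completions"


def build_request(api_key, model, messages, provider=None):
    """Build (but do not send) a Chat Completions POST request."""
    body = {"model": model, "messages": messages}
    if provider:
        # Routing preferences are optional; omit the key to use the default.
        body["provider"] = provider
    return urllib.request.Request(
        API_URL,
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


req = build_request(
    "<API_KEY>",
    "gemini-3.1-pro-preview",
    [{"role": "user", "content": "Hello"}],
    provider={"sort": "price", "allow_fallbacks": False},
)
# To send it: urllib.request.urlopen(req)
```

With `allow_fallbacks` disabled, a client should expect the call to fail when the cheapest provider is unavailable, so wrapping the send step in error handling is a sensible precaution.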