Inference Provider Routing
Siraya Model Router routes requests to the best available providers for your model.
By default, requests are load-balanced across the top providers to maximize uptime and minimize price.
You can customize how your requests are routed using the provider object in the request body for Chat Completions.
The provider object can contain the following fields:
| Field | Type | Default | Description |
|---|---|---|---|
| sort | string | - | Sort providers by price, throughput, or latency (e.g. "price"). |
| allow_fallbacks | boolean | true | Whether to allow backup providers when the primary is unavailable. |
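As a minimal sketch of how the two fields above might be assembled on the client side, the helper below validates its inputs before building the provider object. The name build_provider and the choice to omit fields left at their defaults are assumptions for illustration; they are not part of the Siraya API.

```python
from typing import Optional

# The three sort values listed in this document.
VALID_SORTS = {"price", "throughput", "latency"}


def build_provider(sort: Optional[str] = None, allow_fallbacks: bool = True) -> dict:
    """Build a provider object for the request body (hypothetical helper)."""
    if sort is not None and sort not in VALID_SORTS:
        raise ValueError(f"sort must be one of {sorted(VALID_SORTS)}, got {sort!r}")
    if not isinstance(allow_fallbacks, bool):
        raise TypeError("allow_fallbacks must be a boolean")

    provider = {}
    if sort is not None:
        provider["sort"] = sort
    # Assumption: a field left at its documented default can simply be omitted.
    if not allow_fallbacks:
        provider["allow_fallbacks"] = False
    return provider
```

For example, build_provider("price", allow_fallbacks=False) returns a dict that can be passed as the 'provider' value in the request JSON.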
Cost-effective Load Balancing (Default Strategy)
For each model in your request, Siraya Model Router's default behavior is to load balance requests across providers, weighing throughput, latency, and price against each other.
When you send a model request, Siraya Model Router automatically evaluates multiple providers in real time. It considers factors such as latency, throughput, reliability, and price.
Info
For instance, if Provider A offers slightly higher throughput but at a higher cost, while Provider B is more affordable with moderate latency, Siraya Model Router will intelligently balance requests across both to achieve the best overall performance and cost efficiency.
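The Provider A / Provider B scenario can be pictured with a toy weighting scheme. The router's actual scoring is not described here, so everything in this sketch is an assumption: the metric names, the numbers, and the formula (throughput divided by latency times price) are invented purely to show how a trade-off between factors can turn into a traffic split.

```python
import random

# Hypothetical metrics; none of these numbers come from Siraya.
providers = {
    "provider_a": {"throughput_tps": 120.0, "latency_s": 0.40, "price_per_mtok": 3.00},
    "provider_b": {"throughput_tps": 90.0, "latency_s": 0.60, "price_per_mtok": 2.00},
}


def score(metrics: dict) -> float:
    """Illustrative score: higher throughput is better, lower latency and price are better."""
    return metrics["throughput_tps"] / (metrics["latency_s"] * metrics["price_per_mtok"])


def routing_weights(candidates: dict) -> dict:
    """Normalize the scores so they sum to 1 and can be used as traffic shares."""
    scores = {name: score(m) for name, m in candidates.items()}
    total = sum(scores.values())
    return {name: s / total for name, s in scores.items()}


def pick_provider(weights: dict, rng: random.Random) -> str:
    """Pick one provider at random, in proportion to its weight."""
    names = list(weights)
    return rng.choices(names, weights=[weights[n] for n in names], k=1)[0]


weights = routing_weights(providers)
```

With these made-up numbers, both providers receive a share of traffic, and the faster but pricier provider_a receives somewhat more of it than provider_b.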
Info
If you are more sensitive to throughput than price, you can use the sort field to explicitly prioritize throughput.
If sort is set in your provider preferences, the default load-balancing strategy is disabled.
Provider Sorting (sort)
If you want to explicitly prioritize a particular provider attribute, you can include the sort field in the provider preferences. The default load-balancing strategy is then disabled, and the router tries providers in the sorted order.
The three sort options are:
- "price": prioritize lowest price
- "throughput": prioritize highest throughput
- "latency": prioritize lowest latency
# Sort by price: try the lowest-priced provider first.
import requests

headers = {
    'Authorization': 'Bearer <API_KEY>',
    'Content-Type': 'application/json',
}

response = requests.post('https://llm.siraya.ai/v1/chat/completions', headers=headers, json={
    'model': 'gemini-3.1-pro-preview',
    'messages': [{ 'role': 'user', 'content': 'Hello' }],
    'provider': {
        'sort': 'price',
    },
})
# Sort by throughput: try the highest-throughput provider first.
import requests

headers = {
    'Authorization': 'Bearer <API_KEY>',
    'Content-Type': 'application/json',
}

response = requests.post('https://llm.siraya.ai/v1/chat/completions', headers=headers, json={
    'model': 'gemini-3.1-pro-preview',
    'messages': [{ 'role': 'user', 'content': 'Hello' }],
    'provider': {
        'sort': 'throughput',
    },
})
# Sort by latency: try the lowest-latency provider first.
import requests

headers = {
    'Authorization': 'Bearer <API_KEY>',
    'Content-Type': 'application/json',
}

response = requests.post('https://llm.siraya.ai/v1/chat/completions', headers=headers, json={
    'model': 'gemini-3.1-pro-preview',
    'messages': [{ 'role': 'user', 'content': 'Hello' }],
    'provider': {
        'sort': 'latency',
    },
})
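To make the ordering behind each sort value concrete, here is a small local sketch that sorts a hypothetical provider list the way the three options are described: ascending price, descending throughput, ascending latency. The provider names, metric fields, and numbers are made up for illustration; the router performs the real ordering server-side, and its data is not exposed in this form.

```python
# Hypothetical provider metrics, invented for this example.
PROVIDERS = [
    {"name": "a", "price": 3.0, "throughput": 120.0, "latency": 0.4},
    {"name": "b", "price": 2.0, "throughput": 90.0, "latency": 0.6},
    {"name": "c", "price": 2.5, "throughput": 150.0, "latency": 0.5},
]

# One sort key per documented option; throughput is negated so that
# the highest value comes first under an ascending sort.
SORT_KEYS = {
    "price": lambda p: p["price"],
    "throughput": lambda p: -p["throughput"],
    "latency": lambda p: p["latency"],
}


def order_providers(providers: list, sort: str) -> list:
    """Return provider names in the order they would be tried for a sort value."""
    if sort not in SORT_KEYS:
        raise ValueError(f"unknown sort option: {sort!r}")
    return [p["name"] for p in sorted(providers, key=SORT_KEYS[sort])]
```

With this sample data, order_providers(PROVIDERS, "price") puts b first, while "throughput" puts c first and "latency" puts a first.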
Disabling Fallbacks (allow_fallbacks)
By default, when a provider fails, Siraya Model Router will automatically try the next available provider. You can disable this behavior by setting allow_fallbacks to false.
| Field | Type | Default | Description |
|---|---|---|---|
| allow_fallbacks | boolean | true | Whether to allow backup providers when the primary is unavailable. |
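The effect of allow_fallbacks can be simulated locally. This is only a sketch of the behavior described above, not the router's implementation: the route function, the ProviderUnavailable exception, and the provider names are all hypothetical.

```python
class ProviderUnavailable(Exception):
    """Raised by the simulated provider call when a provider cannot serve the request."""


def route(ordered_providers, call, allow_fallbacks=True):
    """Try providers in order; with fallbacks disabled, only the first one is tried."""
    candidates = ordered_providers if allow_fallbacks else ordered_providers[:1]
    last_error = None
    for name in candidates:
        try:
            return name, call(name)
        except ProviderUnavailable as exc:
            last_error = exc  # remember the failure and move on to the next candidate
    raise ProviderUnavailable("no provider could serve the request") from last_error


def flaky_call(name):
    """Simulated provider call in which the provider named 'cheapest' is down."""
    if name == "cheapest":
        raise ProviderUnavailable(name)
    return f"response from {name}"
```

In this simulation, route(["cheapest", "backup"], flaky_call) is served by "backup", whereas the same call with allow_fallbacks=False raises ProviderUnavailable because only the first provider is tried.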
Example: Cheapest provider with no fallback
To guarantee that your request is only served by the lowest-cost provider with no fallback:
import requests

headers = {
    'Authorization': 'Bearer <API_KEY>',
    'Content-Type': 'application/json',
}

response = requests.post('https://llm.siraya.ai/v1/chat/completions', headers=headers, json={
    'model': 'gemini-3.1-pro-preview',
    'messages': [{ 'role': 'user', 'content': 'Hello' }],
    'provider': {
        'sort': 'price',
        'allow_fallbacks': False,
    },
})
curl https://llm.siraya.ai/v1/chat/completions \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer <API_KEY>" \
  -d '{
    "model": "gemini-3.1-pro-preview",
    "messages": [
      {
        "role": "user",
        "content": "What is the meaning of life?"
      }
    ],
    "provider": {
      "sort": "price",
      "allow_fallbacks": false
    }
  }'