Billing Transparency
Siraya Model Router uses a transparent billing system to ensure every call is precisely metered and billed. Pricing differs across models, and the same model may be priced differently across providers.
Model Prices
Model prices for each provider are listed on the model detail page.
Billing Items
Every API response automatically includes:
`usage` object with detailed token information
Python:
import requests

# Send a chat completion request through the router.
response = requests.post(
    url="https://llm.siraya.ai/v1/chat/completions",
    headers={
        "Authorization": "Bearer <API_KEY>",
        "Content-Type": "application/json",
    },
    json={
        "model": "claude-sonnet-4.5",
        "messages": [
            {"role": "user", "content": "What is the meaning of life?"}
        ],
        "max_tokens": 500,
    },
)
response.raise_for_status()  # stop early on HTTP errors
data = response.json()

# Print the total cost and its prompt/completion breakdown.
print(f"Cost: ${data['cost']:.6f}")
print(f"Prompt cost: ${data['cost_details']['prompt_cost']:.6f}")
print(f"Completion cost: ${data['cost_details']['completion_cost']:.6f}")
cURL:
curl https://llm.siraya.ai/v1/chat/completions \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer <API_KEY>" \
  -d '{
    "model": "claude-sonnet-4.5",
    "messages": [
      {"role": "user", "content": "What is the meaning of life?"}
    ],
    "max_tokens": 500
  }'
Response:
{
  "id": "chatcmpl-qwjFeeUOCGy9ncrWw1rV3pRG",
  "object": "chat.completion",
  "created": 1774794546,
  "model": "claude-sonnet-4.5",
  "choices": [
    {
      "index": 0,
      "message": {
        "role": "assistant",
        "content": "This is one of humanity's oldest questions, and there's no single answer..."
      },
      "finish_reason": "stop"
    }
  ],
  "usage": {
    "prompt_tokens": 43,
    "completion_tokens": 384,
    "total_tokens": 427,
    "prompt_tokens_details": {
      "cached_tokens": 0,
      "audio_tokens": 0
    },
    "completion_tokens_details": {
      "reasoning_tokens": 185,
      "audio_tokens": 0
    }
  }
}
`reasoning_tokens` (185) are included in `completion_tokens` (384) and billed at the same output price.
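To make the token accounting concrete, the sketch below splits the example's completion tokens into reasoning and visible output. The helper name `split_completion_tokens` is hypothetical, and the `usage` dict is a trimmed copy of the example response, not a live API result.

```python
# Trimmed copy of the usage object from the example response.
usage = {
    "prompt_tokens": 43,
    "completion_tokens": 384,
    "total_tokens": 427,
    "completion_tokens_details": {"reasoning_tokens": 185, "audio_tokens": 0},
}


def split_completion_tokens(usage: dict) -> tuple[int, int]:
    """Return (reasoning_tokens, visible_tokens) for a usage object.

    Hypothetical helper: reasoning tokens are treated as a subset of
    completion_tokens, so the visible part is the remainder.
    """
    details = usage.get("completion_tokens_details") or {}
    reasoning = details.get("reasoning_tokens", 0)
    visible = usage["completion_tokens"] - reasoning
    return reasoning, visible


reasoning, visible = split_completion_tokens(usage)
print(reasoning, visible)  # 185 199
```

Both parts count toward `completion_tokens`, so in this example all 384 tokens are charged at the output price.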
Streaming
For streaming requests with stream_options.include_usage: true, cost information is included in the final chunk (the one containing usage):
data: {"id":"chatcmpl-xxx","object":"chat.completion.chunk","choices":[],"usage":{...},"cost":0.005889,"cost_details":{...}}
data: [DONE]
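One way to pick the cost out of a stream is to scan the `data:` lines and keep the last chunk that carries `usage`. The following is a minimal sketch under that assumption; `extract_final_cost` is a hypothetical helper, and the sample lines are illustrative stand-ins for a real stream.

```python
import json


def extract_final_cost(sse_lines):
    """Return (usage, cost, cost_details) from the last chunk that has usage.

    Returns None if no chunk carried a usage object.
    """
    result = None
    for raw in sse_lines:
        line = raw.strip()
        if not line.startswith("data:"):
            continue  # skip blank lines and non-data SSE fields
        payload = line[len("data:"):].strip()
        if payload == "[DONE]":
            break  # end-of-stream sentinel
        chunk = json.loads(payload)
        if chunk.get("usage"):
            result = (chunk["usage"], chunk.get("cost"), chunk.get("cost_details"))
    return result


# Illustrative stream: one content chunk, then the final usage chunk.
sample = [
    'data: {"id":"chatcmpl-xxx","object":"chat.completion.chunk",'
    '"choices":[{"index":0,"delta":{"content":"Hi"}}]}',
    'data: {"id":"chatcmpl-xxx","object":"chat.completion.chunk","choices":[],'
    '"usage":{"prompt_tokens":43,"completion_tokens":384,"total_tokens":427},'
    '"cost":0.005889}',
    "data: [DONE]",
]

usage, cost, cost_details = extract_final_cost(sample)
print(cost)
```

With a live request, the same function could be fed the decoded lines of the HTTP response body instead of the `sample` list.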
usage
| Item | Description |
|---|---|
| prompt_tokens | Total input tokens, includes cached_tokens and cache_write_tokens |
| completion_tokens | Total output tokens, includes reasoning_tokens |
| total_tokens | prompt_tokens + completion_tokens |
| prompt_tokens_details.cached_tokens | Tokens served from prompt cache |
| prompt_tokens_details.cache_write_tokens | Tokens written to prompt cache |
| prompt_tokens_details.cache_write_token_details | Breakdown by TTL: cache_write_5m_tokens, cache_write_1h_tokens |
| prompt_tokens_details.audio_tokens | Audio input tokens |
| prompt_tokens_details.image_tokens | Image input tokens |
| completion_tokens_details.reasoning_tokens | Tokens used for internal reasoning (e.g., Claude extended thinking, o1/o3) |
| completion_tokens_details.audio_tokens | Audio output tokens |
| completion_tokens_details.image_tokens | Image output tokens (token-based image models) |
| server_tool_use.web_search_requests | Number of web search requests made |
Cost Calculation
Input Cost
text_tokens = prompt_tokens - cached_tokens - cache_write_tokens - audio_tokens - image_tokens
prompt_cost = text_tokens × input_price
            + audio_tokens × input_audio_price
            + image_tokens × input_image_price
cache_read_cost = cached_tokens × cache_read_price
cache_write_cost = cache_write_tokens × cache_write_price
`prompt_tokens` is the total, including all cached and multimodal tokens. We subtract those first, then price each category independently.
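The input-cost formulas can be sketched as a small function. The function name, the price values, and the choice of per-token USD units are assumptions made for illustration. Treating a missing price as zero is also a simplification, not documented router behavior.

```python
def input_cost_breakdown(usage: dict, prices: dict) -> dict:
    """Apply the input-cost formulas to a usage object.

    `prices` holds hypothetical per-token prices keyed by the names used
    in the formulas (input_price, cache_read_price, ...).
    """
    details = usage.get("prompt_tokens_details") or {}
    cached = details.get("cached_tokens", 0)
    cache_write = details.get("cache_write_tokens", 0)
    audio = details.get("audio_tokens", 0)
    image = details.get("image_tokens", 0)

    # prompt_tokens is the grand total, so remove every special category first.
    text_tokens = usage["prompt_tokens"] - cached - cache_write - audio - image

    prompt_cost = (
        text_tokens * prices["input_price"]
        + audio * prices.get("input_audio_price", 0.0)
        + image * prices.get("input_image_price", 0.0)
    )
    return {
        "text_tokens": text_tokens,
        "prompt_cost": prompt_cost,
        "cache_read_cost": cached * prices.get("cache_read_price", 0.0),
        "cache_write_cost": cache_write * prices.get("cache_write_price", 0.0),
    }


# Made-up prices and usage, chosen only to exercise each term.
example_prices = {
    "input_price": 3e-6,
    "cache_read_price": 3e-7,
    "cache_write_price": 3.75e-6,
}
example_usage = {
    "prompt_tokens": 1000,
    "prompt_tokens_details": {"cached_tokens": 200, "cache_write_tokens": 100},
}

breakdown = input_cost_breakdown(example_usage, example_prices)
print(breakdown)
```

Here 1000 prompt tokens split into 700 text tokens, 200 cache reads, and 100 cache writes, each priced at its own rate.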
Output Cost
By default, all completion tokens (including reasoning and audio) are billed at the unified output_price. When a model defines a separate reasoning_price, reasoning tokens are subtracted from the total and priced independently:
completion_cost = (completion_tokens - reasoning_tokens) × output_price
reasoning_cost = reasoning_tokens × reasoning_price
The same applies to audio output tokens and audio_output_price. This matches the industry convention used by OpenRouter and other LLM routing platforms.
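A minimal sketch of the output-cost rule follows. It assumes the split formula applies only when a `reasoning_price` is present and that otherwise every completion token is charged at `output_price`. The function name and all price values are hypothetical.

```python
def output_cost_breakdown(usage: dict, prices: dict) -> dict:
    """Compute completion and reasoning cost for a usage object.

    Assumption: a missing reasoning_price means reasoning tokens are
    billed at the unified output_price with the rest of the completion.
    """
    details = usage.get("completion_tokens_details") or {}
    reasoning = details.get("reasoning_tokens", 0)
    completion = usage["completion_tokens"]
    reasoning_price = prices.get("reasoning_price")

    if reasoning_price is None:
        # Unified pricing: nothing is split out.
        return {
            "completion_cost": completion * prices["output_price"],
            "reasoning_cost": 0.0,
        }
    # Separate pricing: subtract reasoning tokens, then price each part.
    return {
        "completion_cost": (completion - reasoning) * prices["output_price"],
        "reasoning_cost": reasoning * reasoning_price,
    }


# Token counts from the example response; prices are made up.
example_usage = {
    "completion_tokens": 384,
    "completion_tokens_details": {"reasoning_tokens": 185},
}
unified = output_cost_breakdown(example_usage, {"output_price": 15e-6})
split = output_cost_breakdown(
    example_usage, {"output_price": 15e-6, "reasoning_price": 10e-6}
)
print(unified)
print(split)
```

In the unified case all 384 tokens are charged at one rate. In the split case 199 tokens are charged at `output_price` and 185 at `reasoning_price`.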
Image Generation Cost
Image generation models (e.g., Imagen 4) are billed per image generated, not per token: the cost is the model's per-image price multiplied by n, where n is the number of images generated. The usage object for image generation does not contain token counts.
Token-based image models (e.g., GPT Image 1) are billed per token like chat models.
Video Generation Cost
Video generation models (e.g., Veo 3) are billed per second of generated video: the cost is the model's per-second price multiplied by the video duration in seconds.
Video generation is asynchronous. Cost is returned both at creation time (based on requested duration) and when polling shows status: completed.
Image endpoints (/v1/images/generations) return a top-level `cost` field directly on the response object. Token-based image models (e.g., Gemini image) also return `usage` with token details.
Video endpoints (/v1/videos/generations) return `cost` at creation time and in poll responses when `status: completed`.
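The per-unit billing for image and video generation reduces to a single multiplication, sketched below. The function names, parameter names, and prices are hypothetical. The actual per-image and per-second prices are the ones listed on each model's detail page.

```python
def image_generation_cost(n: int, price_per_image: float) -> float:
    """Cost of n generated images at a flat, hypothetical per-image price."""
    if n < 0:
        raise ValueError("n must be non-negative")
    return n * price_per_image


def video_generation_cost(seconds: float, price_per_second: float) -> float:
    """Cost of a generated video billed by duration, at a hypothetical price."""
    if seconds < 0:
        raise ValueError("seconds must be non-negative")
    return seconds * price_per_second


# Made-up prices, used only to show the arithmetic.
image_cost = image_generation_cost(n=4, price_per_image=0.04)
video_cost = video_generation_cost(seconds=8, price_per_second=0.5)
print(image_cost, video_cost)
```

In practice there is no need to recompute these values, since the endpoints return `cost` directly. The sketch is mainly useful for estimating a request before sending it.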