Billing Transparency

SIRAYA Model Router uses a transparent billing system to ensure every call is precisely metered and billed. Pricing differs across models, and the same model may be priced differently across providers.

Model Prices

Model prices for each provider are listed on the model detail page.

Billing Items

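Every API response automatically includes:

usage object with detailed token information
cost field with the total cost of the call
cost_details object with the per-category breakdown (e.g., prompt_cost, completion_cost)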

Python:

import requests

# Send a chat completion request through the router.
response = requests.post(
    url="https://llm.siraya.ai/v1/chat/completions",
    headers={
        "Authorization": "Bearer <API_KEY>",
        "Content-Type": "application/json"
    },
    json={
        "model": "claude-sonnet-4.5",
        "messages": [
            {"role": "user", "content": "What is the meaning of life?"}
        ],
        "max_tokens": 500
    },
    timeout=60,
)
response.raise_for_status()  # surface HTTP errors before reading the body
data = response.json()

# Billing fields are returned alongside the usual completion payload.
print(f"Cost: ${data['cost']:.6f}")
print(f"Prompt cost: ${data['cost_details']['prompt_cost']:.6f}")
print(f"Completion cost: ${data['cost_details']['completion_cost']:.6f}")

cURL:

curl https://llm.siraya.ai/v1/chat/completions \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer <API_KEY>" \
  -d '{
    "model": "claude-sonnet-4.5",
    "messages": [
      {"role": "user", "content": "What is the meaning of life?"}
    ],
    "max_tokens": 500
  }'

Response:

{
  "id": "chatcmpl-qwjFeeUOCGy9ncrWw1rV3pRG",
  "object": "chat.completion",
  "created": 1774794546,
  "model": "claude-sonnet-4.5",
  "choices": [
    {
      "index": 0,
      "message": {
        "role": "assistant",
        "content": "This is one of humanity's oldest questions, and there's no single answer..."
      },
      "finish_reason": "stop"
    }
  ],
  "usage": {
    "prompt_tokens": 43,
    "completion_tokens": 384,
    "total_tokens": 427,
    "prompt_tokens_details": {
      "cached_tokens": 0,
      "audio_tokens": 0
    },
    "completion_tokens_details": {
      "reasoning_tokens": 185,
      "audio_tokens": 0
    }
  }
}

reasoning_tokens (185) are included in completion_tokens (384) and billed at the same output price.
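As a quick check of how the token counts relate, the sketch below splits completion_tokens into reasoning tokens and the remainder, using the usage object from the sample response. The helper name split_completion_tokens is hypothetical and not part of the API.

```python
# Usage object copied from the sample response above.
usage = {
    "prompt_tokens": 43,
    "completion_tokens": 384,
    "total_tokens": 427,
    "prompt_tokens_details": {"cached_tokens": 0, "audio_tokens": 0},
    "completion_tokens_details": {"reasoning_tokens": 185, "audio_tokens": 0},
}


def split_completion_tokens(usage: dict) -> tuple[int, int]:
    """Return (reasoning_tokens, remaining_tokens). Hypothetical helper."""
    details = usage.get("completion_tokens_details") or {}
    reasoning = details.get("reasoning_tokens", 0)
    return reasoning, usage["completion_tokens"] - reasoning


reasoning, remaining = split_completion_tokens(usage)

# total_tokens is prompt_tokens + completion_tokens; reasoning is not added again.
assert usage["total_tokens"] == usage["prompt_tokens"] + usage["completion_tokens"]
print(reasoning, remaining)  # 185 199
```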

Streaming

For streaming requests with stream_options.include_usage: true, cost information is included in the final chunk (the one containing usage):

data: {"id":"chatcmpl-xxx","object":"chat.completion.chunk","choices":[],"usage":{...},"cost":0.005889,"cost_details":{...}}

data: [DONE]
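A client can pick the billing fields out of the stream by watching for the chunk that carries usage. The following is a minimal sketch that works on already-decoded SSE lines. The function name extract_stream_billing is hypothetical, and the sample chunks are illustrative, with the elided usage and cost_details contents replaced by placeholder values.

```python
import json


def extract_stream_billing(lines):
    """Return (usage, cost, cost_details) from the last chunk that carries usage.

    Returns None if no chunk contained a usage object.
    """
    billing = None
    for raw in lines:
        line = raw.strip()
        if not line.startswith("data:"):
            continue  # skip blank separator lines and non-data fields
        payload = line[len("data:"):].strip()
        if payload == "[DONE]":
            break  # end-of-stream sentinel
        chunk = json.loads(payload)
        if chunk.get("usage"):
            billing = (chunk["usage"], chunk.get("cost"), chunk.get("cost_details"))
    return billing


# Illustrative stream: one content chunk, then the final usage chunk.
sample = [
    'data: {"id":"chatcmpl-xxx","object":"chat.completion.chunk",'
    '"choices":[{"index":0,"delta":{"content":"Hi"}}]}',
    "",
    'data: {"id":"chatcmpl-xxx","object":"chat.completion.chunk","choices":[],'
    '"usage":{"prompt_tokens":43,"completion_tokens":384,"total_tokens":427},'
    '"cost":0.005889,"cost_details":{}}',
    "",
    "data: [DONE]",
]

usage, cost, cost_details = extract_stream_billing(sample)
print(cost)  # 0.005889
```

With the requests library, the same function could be fed from response.iter_lines(decode_unicode=True) on a request made with stream=True.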

usage

Item	Description
`prompt_tokens`	Total input tokens, including `cached_tokens` and `cache_write_tokens`
`completion_tokens`	Total output tokens, including `reasoning_tokens`
`total_tokens`	`prompt_tokens` + `completion_tokens`
`prompt_tokens_details.cached_tokens`	Tokens served from prompt cache
`prompt_tokens_details.cache_write_tokens`	Tokens written to prompt cache
`prompt_tokens_details.cache_write_token_details`	Breakdown by TTL: `cache_write_5m_tokens`, `cache_write_1h_tokens`
`prompt_tokens_details.audio_tokens`	Audio input tokens
`prompt_tokens_details.image_tokens`	Image input tokens
`completion_tokens_details.reasoning_tokens`	Tokens used for internal reasoning (e.g., Claude extended thinking, o1/o3)
`completion_tokens_details.audio_tokens`	Audio output tokens
`completion_tokens_details.image_tokens`	Image output tokens (token-based image models)
`server_tool_use.web_search_requests`	Number of web search requests made

Cost Calculation

Input Cost

text_tokens = prompt_tokens - cached_tokens - cache_write_tokens - audio_tokens - image_tokens

prompt_cost       = text_tokens × input_price
                  + audio_tokens × input_audio_price
                  + image_tokens × input_image_price
cache_read_cost   = cached_tokens × cache_read_price
cache_write_cost  = cache_write_tokens × cache_write_price

prompt_tokens is the total including all cached and multimodal tokens. We subtract them first, then price each category independently.
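The input-side formulas can be sketched as a small function. This is an illustration under stated assumptions: prices are expressed per token, the prices dictionary and its values are hypothetical (not taken from any price list), and missing detail fields are treated as zero.

```python
def input_costs(usage: dict, prices: dict) -> dict:
    """Apply the input cost formulas to a usage object. Hypothetical helper."""
    details = usage.get("prompt_tokens_details") or {}
    cached = details.get("cached_tokens", 0)
    cache_write = details.get("cache_write_tokens", 0)
    audio = details.get("audio_tokens", 0)
    image = details.get("image_tokens", 0)

    # Remove every separately priced category from the prompt total first.
    text = usage["prompt_tokens"] - cached - cache_write - audio - image

    return {
        "prompt_cost": (
            text * prices["input_price"]
            + audio * prices.get("input_audio_price", 0.0)
            + image * prices.get("input_image_price", 0.0)
        ),
        "cache_read_cost": cached * prices.get("cache_read_price", 0.0),
        "cache_write_cost": cache_write * prices.get("cache_write_price", 0.0),
    }


# Hypothetical per-token prices and a made-up usage object with cache activity.
prices = {"input_price": 3e-6, "cache_read_price": 0.3e-6, "cache_write_price": 3.75e-6}
usage = {
    "prompt_tokens": 1000,
    "prompt_tokens_details": {"cached_tokens": 600, "cache_write_tokens": 200},
}

costs = input_costs(usage, prices)
for name, value in costs.items():
    print(f"{name}: {value:.6f}")
```

Here only 200 of the 1000 prompt tokens are billed at input_price; the 600 cached and 200 cache-write tokens are each billed at their own price.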

Output Cost

By default, all completion tokens (including reasoning and audio) are billed at the unified output_price:

completion_cost = completion_tokens × output_price
reasoning_cost  = 0
audio_cost      = 0

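When a model defines a separate reasoning_price, reasoning tokens are removed from the output total and billed at that price instead:

completion_cost = (completion_tokens - reasoning_tokens) × output_price
reasoning_cost  = reasoning_tokens × reasoning_price

The same applies to audio output tokens when a model defines audio_output_price. This matches the convention used by OpenRouter and other LLM routing platforms.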
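The two output billing modes can be sketched in one function. This assumes per-token prices, assumes that a separate price applies only when the corresponding key is present in the (hypothetical) prices dictionary, and assumes that separately priced audio tokens are subtracted from the output total in the same way as reasoning tokens.

```python
def output_costs(usage: dict, prices: dict) -> dict:
    """Apply the output cost formulas to a usage object. Hypothetical helper."""
    details = usage.get("completion_tokens_details") or {}
    reasoning = details.get("reasoning_tokens", 0)
    audio = details.get("audio_tokens", 0)

    billable = usage["completion_tokens"]  # default: everything at output_price
    reasoning_cost = 0.0
    audio_cost = 0.0

    # Separate reasoning price: move reasoning tokens out of the unified bucket.
    if prices.get("reasoning_price") is not None:
        billable -= reasoning
        reasoning_cost = reasoning * prices["reasoning_price"]

    # Separate audio output price: same treatment (an assumption, see lead-in).
    if prices.get("audio_output_price") is not None:
        billable -= audio
        audio_cost = audio * prices["audio_output_price"]

    return {
        "completion_cost": billable * prices["output_price"],
        "reasoning_cost": reasoning_cost,
        "audio_cost": audio_cost,
    }


usage = {
    "completion_tokens": 384,
    "completion_tokens_details": {"reasoning_tokens": 185, "audio_tokens": 0},
}

# Hypothetical prices: unified billing, then a separate reasoning price.
unified = output_costs(usage, {"output_price": 15e-6})
split = output_costs(usage, {"output_price": 15e-6, "reasoning_price": 10e-6})
print(unified)
print(split)
```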

Image Generation Cost

Image generation models (e.g., Imagen 4) are billed per image generated, not per token:

image_cost = n × image_generation_price

Where n is the number of images generated. The usage object for image generation does not contain token counts.

Token-based image models (e.g., GPT Image 1) are billed per token like chat models.

Video Generation Cost

Video generation models (e.g., Veo 3) are billed per second of generated video:

video_cost = seconds × video_second_price

Video generation is asynchronous. Video endpoints (/v1/videos/generations) return cost at creation time (based on the requested duration) and again in poll responses once status is completed.

Image endpoints (/v1/images/generations) return a top-level cost field directly on the response object. Token-based image models (e.g., Gemini image) also return usage with token details.
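For budgeting before a request is sent, the per-image and per-second formulas can be evaluated directly. The prices below are hypothetical placeholders, not values from any price list.

```python
def image_cost(n: int, image_generation_price: float) -> float:
    """Cost of n images from a model billed per image."""
    return n * image_generation_price


def video_cost(seconds: float, video_second_price: float) -> float:
    """Cost of a video from a model billed per second of output."""
    return seconds * video_second_price


# Hypothetical prices: 0.04 per image, 0.5 per second of video.
print(f"{image_cost(4, 0.04):.2f}")  # 0.16
print(f"{video_cost(8, 0.5):.2f}")   # 4.00
```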

Web Search Cost

native_web_search_cost = web_search_requests × search_price_per_request

Total

cost = prompt_cost + cache_read_cost + cache_write_cost
     + completion_cost + reasoning_cost + audio_cost
     + image_cost + video_cost
     + native_web_search_cost + tools_cost
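To reconcile a bill, the components can be summed and compared with the reported cost. The sketch below assumes cost_details uses the component names from the formula as keys; only prompt_cost and completion_cost are confirmed by the request example, so the other key names are assumptions. The per-token prices are hypothetical.

```python
# Component names taken from the total formula; as cost_details keys,
# all but prompt_cost and completion_cost are assumptions.
COMPONENTS = (
    "prompt_cost", "cache_read_cost", "cache_write_cost",
    "completion_cost", "reasoning_cost", "audio_cost",
    "image_cost", "video_cost",
    "native_web_search_cost", "tools_cost",
)


def total_cost(cost_details: dict) -> float:
    """Sum every cost component, treating absent components as zero."""
    return sum(cost_details.get(name, 0.0) for name in COMPONENTS)


# Sample usage from the response example (43 prompt, 384 completion tokens),
# priced with hypothetical per-token rates of 3e-6 input and 15e-6 output.
details = {"prompt_cost": 43 * 3e-6, "completion_cost": 384 * 15e-6}
print(round(total_cost(details), 6))  # 0.005889
```

When comparing against the cost field of a response, a small tolerance is safer than exact equality, since the sum is computed in floating point.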