Billing Transparency

Siraya Model Router uses a transparent billing system to ensure every call is precisely metered and billed. Pricing differs across models, and the same model may be priced differently across providers.

Model Prices

Model prices for each provider are listed on the model detail page.

Billing Items

Every API response automatically includes:

  • usage object with detailed token information

Python:

import requests

response = requests.post(
    url="https://llm.siraya.ai/v1/chat/completions",
    headers={
        "Authorization": "Bearer <API_KEY>",
        "Content-Type": "application/json"
    },
    json={
        "model": "claude-sonnet-4.5",
        "messages": [
            {"role": "user", "content": "What is the meaning of life?"}
        ],
        "max_tokens": 500
    }
)
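response.raise_for_status()  # surface HTTP errors before parsing the body
data = response.json()

# Token counts are reported in the usage object.
usage = data["usage"]
print(f"Tokens: {usage['prompt_tokens']} prompt + {usage['completion_tokens']} completion")

# Cost fields are read defensively, since the sample response below shows only usage.
if "cost" in data:
    print(f"Cost: ${data['cost']:.6f}")
if "cost_details" in data:
    print(f"Prompt cost: ${data['cost_details']['prompt_cost']:.6f}")
    print(f"Completion cost: ${data['cost_details']['completion_cost']:.6f}")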

cURL:

curl https://llm.siraya.ai/v1/chat/completions \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer <API_KEY>" \
  -d '{
    "model": "claude-sonnet-4.5",
    "messages": [
      {"role": "user", "content": "What is the meaning of life?"}
    ],
    "max_tokens": 500
  }'

Response:

{
  "id": "chatcmpl-qwjFeeUOCGy9ncrWw1rV3pRG",
  "object": "chat.completion",
  "created": 1774794546,
  "model": "claude-sonnet-4.5",
  "choices": [
    {
      "index": 0,
      "message": {
        "role": "assistant",
        "content": "This is one of humanity's oldest questions, and there's no single answer..."
      },
      "finish_reason": "stop"
    }
  ],
  "usage": {
    "prompt_tokens": 43,
    "completion_tokens": 384,
    "total_tokens": 427,
    "prompt_tokens_details": {
      "cached_tokens": 0,
      "audio_tokens": 0
    },
    "completion_tokens_details": {
      "reasoning_tokens": 185,
      "audio_tokens": 0
    }
  }
}

reasoning_tokens (185) are included in completion_tokens (384) and billed at the same output price.
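
The relationship between these counters can be checked on any parsed response. The following is a minimal sketch that uses the usage values from the sample response above; the helper name summarize_usage is hypothetical and not part of the API.

```python
def summarize_usage(usage: dict) -> dict:
    """Split completion tokens into reasoning and visible output (illustrative helper)."""
    details = usage.get("completion_tokens_details", {})
    reasoning = details.get("reasoning_tokens", 0)
    completion = usage["completion_tokens"]
    return {
        "prompt": usage["prompt_tokens"],
        "reasoning": reasoning,
        # Reasoning tokens are already counted inside completion_tokens.
        "visible_output": completion - reasoning,
        "total_matches": usage["total_tokens"] == usage["prompt_tokens"] + completion,
    }


# Values copied from the sample response.
sample_usage = {
    "prompt_tokens": 43,
    "completion_tokens": 384,
    "total_tokens": 427,
    "completion_tokens_details": {"reasoning_tokens": 185, "audio_tokens": 0},
}

summary = summarize_usage(sample_usage)
print(summary)
```

Here 199 of the 384 completion tokens are visible output and the remaining 185 are reasoning tokens.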

Streaming

For streaming requests with stream_options.include_usage: true, cost information is included in the final chunk (the one containing usage):

data: {"id":"chatcmpl-xxx","object":"chat.completion.chunk","choices":[],"usage":{...},"cost":0.005889,"cost_details":{...}}

data: [DONE]
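
A client might pick the billing data out of the stream roughly as follows. This is a sketch, not a reference client: the function name extract_stream_billing is hypothetical, and the token counts in the example lines are made up for illustration.

```python
import json


def extract_stream_billing(lines):
    """Return (usage, cost) from the chunk that carries usage, or (None, None)."""
    usage, cost = None, None
    for line in lines:
        line = line.strip()
        if not line.startswith("data:"):
            continue  # skip blank separator lines and anything that is not a data event
        payload = line[len("data:"):].strip()
        if payload == "[DONE]":
            break  # end-of-stream sentinel
        chunk = json.loads(payload)
        if chunk.get("usage"):
            usage = chunk["usage"]
            cost = chunk.get("cost")
    return usage, cost


# Hypothetical stream contents (illustrative values only).
example_lines = [
    'data: {"id":"chatcmpl-xxx","object":"chat.completion.chunk","choices":[{"index":0,"delta":{"content":"Hi"}}]}',
    "",
    'data: {"id":"chatcmpl-xxx","object":"chat.completion.chunk","choices":[],"usage":{"prompt_tokens":43,"completion_tokens":384,"total_tokens":427},"cost":0.005889}',
    "",
    "data: [DONE]",
]

usage, cost = extract_stream_billing(example_lines)
print(usage["total_tokens"], cost)
```

With the requests library, the lines could come from response.iter_lines(decode_unicode=True) on a request sent with stream=True.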

usage

Item                                              Description
prompt_tokens                                     Total input tokens; includes cached_tokens and cache_write_tokens
completion_tokens                                 Total output tokens; includes reasoning_tokens
total_tokens                                      prompt_tokens + completion_tokens
prompt_tokens_details.cached_tokens               Tokens served from prompt cache
prompt_tokens_details.cache_write_tokens          Tokens written to prompt cache
prompt_tokens_details.cache_write_token_details   Breakdown by TTL: cache_write_5m_tokens, cache_write_1h_tokens
prompt_tokens_details.audio_tokens                Audio input tokens
prompt_tokens_details.image_tokens                Image input tokens
completion_tokens_details.reasoning_tokens        Tokens used for internal reasoning (e.g., Claude extended thinking, o1/o3)
completion_tokens_details.audio_tokens            Audio output tokens
completion_tokens_details.image_tokens            Image output tokens (token-based image models)
server_tool_use.web_search_requests               Number of web search requests made

Cost Calculation

Input Cost

text_tokens = prompt_tokens - cached_tokens - cache_write_tokens - audio_tokens - image_tokens

prompt_cost       = text_tokens × input_price
                  + audio_tokens × input_audio_price
                  + image_tokens × input_image_price
cache_read_cost   = cached_tokens × cache_read_price
cache_write_cost  = cache_write_tokens × cache_write_price

prompt_tokens is the total including all cached and multimodal tokens. We subtract them first, then price each category independently.
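
The subtraction and per-category pricing can be sketched as below. The function name input_costs and every price are hypothetical, and prices are assumed to be expressed per token; actual prices are listed on the model detail page.

```python
def input_costs(usage: dict, prices: dict) -> dict:
    """Apply the input-side formulas to a usage object (illustrative helper)."""
    details = usage.get("prompt_tokens_details", {})
    cached = details.get("cached_tokens", 0)
    cache_write = details.get("cache_write_tokens", 0)
    audio = details.get("audio_tokens", 0)
    image = details.get("image_tokens", 0)

    # prompt_tokens is the grand total, so remove every separately priced category first.
    text = usage["prompt_tokens"] - cached - cache_write - audio - image

    return {
        "text_tokens": text,
        "prompt_cost": (
            text * prices["input_price"]
            + audio * prices.get("input_audio_price", 0.0)
            + image * prices.get("input_image_price", 0.0)
        ),
        "cache_read_cost": cached * prices.get("cache_read_price", 0.0),
        "cache_write_cost": cache_write * prices.get("cache_write_price", 0.0),
    }


# Hypothetical usage and per-token prices, for illustration only.
example_usage = {
    "prompt_tokens": 1000,
    "prompt_tokens_details": {"cached_tokens": 600, "cache_write_tokens": 100},
}
example_prices = {
    "input_price": 3e-6,
    "cache_read_price": 0.3e-6,
    "cache_write_price": 3.75e-6,
}

costs = input_costs(example_usage, example_prices)
print(costs)
```

In this example only 300 of the 1000 prompt tokens are billed at input_price; the other 700 are billed at the cache read and cache write prices.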

Output Cost

By default, all completion tokens (including reasoning and audio) are billed at the unified output_price:

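completion_cost = completion_tokens × output_price
reasoning_cost  = 0
audio_cost      = 0

If a model defines a separate reasoning_price, reasoning tokens are split out and billed at that price:

completion_cost = (completion_tokens - reasoning_tokens) × output_price
reasoning_cost  = reasoning_tokens × reasoning_price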

The same applies to audio_output_price. This matches the industry convention used by OpenRouter and other LLM routing platforms.
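
The two output pricing modes can be sketched as follows, assuming the split formulas apply only when a separate reasoning_price is configured for the model. The function name output_costs and all prices are hypothetical, prices are assumed to be per token, and audio output is left out for brevity.

```python
def output_costs(usage: dict, prices: dict) -> dict:
    """Apply the output-side formulas to a usage object (illustrative helper)."""
    completion = usage["completion_tokens"]
    reasoning = usage.get("completion_tokens_details", {}).get("reasoning_tokens", 0)
    reasoning_price = prices.get("reasoning_price")

    if reasoning_price is None:
        # Default: reasoning tokens are billed at the unified output price.
        return {
            "completion_cost": completion * prices["output_price"],
            "reasoning_cost": 0.0,
        }

    # Separate reasoning price: split reasoning tokens out of the completion total.
    return {
        "completion_cost": (completion - reasoning) * prices["output_price"],
        "reasoning_cost": reasoning * reasoning_price,
    }


# Token counts from the sample response; prices are hypothetical.
example_usage = {
    "completion_tokens": 384,
    "completion_tokens_details": {"reasoning_tokens": 185},
}

unified = output_costs(example_usage, {"output_price": 15e-6})
split = output_costs(example_usage, {"output_price": 15e-6, "reasoning_price": 10e-6})
print(unified)
print(split)
```

In both modes every completion token is billed exactly once; only the price applied to the reasoning portion changes.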

Image Generation Cost

Image generation models (e.g., Imagen 4) are billed per image generated, not per token:

image_cost = n × image_generation_price

Where n is the number of images generated. The usage object for image generation does not contain token counts.

Token-based image models (e.g., GPT Image 1) are billed per token like chat models.

Video Generation Cost

Video generation models (e.g., Veo 3) are billed per second of generated video:

video_cost = seconds × video_second_price

Video generation is asynchronous. Video endpoints (/v1/videos/generations) return cost at creation time (based on the requested duration) and again in poll responses once status is completed.

Image endpoints (/v1/images/generations) return a top-level cost field directly on the response object. Token-based image models (e.g., Gemini image) also return usage with token details.
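
The per-image and per-second formulas reduce to simple multiplication. A minimal sketch follows; the function names and the prices are hypothetical, chosen only to make the arithmetic concrete.

```python
def image_cost(n: int, image_generation_price: float) -> float:
    """Per-image billing: number of generated images times the per-image price."""
    return n * image_generation_price


def video_cost(seconds: float, video_second_price: float) -> float:
    """Per-second billing: generated duration times the per-second price."""
    return seconds * video_second_price


# Hypothetical prices, for illustration only.
images_total = image_cost(n=4, image_generation_price=0.04)
video_total = video_cost(seconds=8, video_second_price=0.5)
print(f"images: ${images_total:.2f}, video: ${video_total:.2f}")
```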

Web Search Cost

native_web_search_cost = web_search_requests × search_price_per_request

Total

cost = prompt_cost + cache_read_cost + cache_write_cost
     + completion_cost + reasoning_cost + audio_cost
     + image_cost + video_cost
     + native_web_search_cost + tools_cost
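
Putting the pieces together, the total is a plain sum over the components listed above. The sketch below treats any component that is absent as zero; the function name total_cost and the example amounts are hypothetical.

```python
import math

# Component names taken from the total formula.
COMPONENTS = (
    "prompt_cost", "cache_read_cost", "cache_write_cost",
    "completion_cost", "reasoning_cost", "audio_cost",
    "image_cost", "video_cost",
    "native_web_search_cost", "tools_cost",
)


def total_cost(parts: dict) -> float:
    """Sum the known cost components, treating missing ones as zero."""
    unknown = set(parts) - set(COMPONENTS)
    if unknown:
        raise ValueError(f"unknown cost components: {sorted(unknown)}")
    # math.fsum limits floating-point rounding error when adding many small amounts.
    return math.fsum(parts.get(name, 0.0) for name in COMPONENTS)


# Hypothetical request: cached prompt, text completion, two web searches at $0.01 each.
example_parts = {
    "prompt_cost": 0.0009,
    "cache_read_cost": 0.00018,
    "cache_write_cost": 0.000375,
    "completion_cost": 0.00576,
    "native_web_search_cost": 2 * 0.01,
}

total = total_cost(example_parts)
print(f"Total: ${total:.6f}")
```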