Usage Accounting

SIRAYA Model Router provides transparent tracking of model usage, token counts, and associated costs. Our Usage Accounting features allow you to monitor your credit consumption programmatically.

Token Counting

By default, SIRAYA Model Router returns token counts in the `usage` field of the API response.

Native Tokenization: Costs and billing are always calculated using the model provider's native tokenizer.
Normalized Metrics: For convenience, some responses may also include model-agnostic token counts for cross-model comparisons.

Response Format

The `usage` object in the response body includes:

Field	Description
`prompt_tokens`	Total tokens sent in the request.
`completion_tokens`	Total tokens generated by the model.
`total_tokens`	The sum of prompt and completion tokens.
`prompt_tokens_details`	Object. Breakdown of prompt tokens by source (cache, modality). See below.
`completion_tokens_details`	Object. Breakdown of completion tokens by category (reasoning, audio, predictions). See below.
`server_tool_use`	Object. Counters for built-in server-side tool calls (e.g. native web search). Present only when such tools were used.

Backend-specific fields not defined above (e.g. a `seconds` field for video models) are passed through as additional keys on `usage` for diagnostics.
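As a minimal sketch of how a client might handle the passthrough keys described above, the snippet below separates the documented top-level fields from any backend-specific extras. The helper name `split_usage` and the sample values are illustrative assumptions, not part of the API.

```python
from typing import Any

# Top-level keys documented in the Response Format table.
KNOWN_USAGE_KEYS = {
    "prompt_tokens",
    "completion_tokens",
    "total_tokens",
    "prompt_tokens_details",
    "completion_tokens_details",
    "server_tool_use",
}


def split_usage(usage: dict[str, Any]) -> tuple[dict[str, Any], dict[str, Any]]:
    """Separate documented usage fields from backend-specific passthrough keys."""
    known = {k: v for k, v in usage.items() if k in KNOWN_USAGE_KEYS}
    extras = {k: v for k, v in usage.items() if k not in KNOWN_USAGE_KEYS}
    return known, extras


# Hypothetical usage object; the values are invented for illustration.
sample_usage = {
    "prompt_tokens": 8,
    "completion_tokens": 52,
    "total_tokens": 60,
    "seconds": 4,  # example of a backend-specific passthrough key
}
known, extras = split_usage(sample_usage)
print(known["total_tokens"], extras)
```

Keeping the extras in a separate dictionary lets logging code record them without treating them as stable, documented fields.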

`prompt_tokens_details`

Field	Description
`cached_tokens`	Prompt tokens served from prompt cache (cache hit).
`cache_write_tokens`	Prompt tokens written into cache on this request (prompt caching).
`cache_write_token_details`	Object splitting cache writes by TTL. Contains `cache_write_5m_tokens` and `cache_write_1h_tokens`.
`text_tokens`	Text input tokens.
`audio_tokens`	Audio input tokens (multimodal models).
`image_tokens`	Image input tokens (multimodal models).
`web_search_requests`	Number of native web search requests issued for this prompt.
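The cache-related fields can be read defensively, since a response may omit some of them (the streaming sample on this page, for instance, carries no `cache_write_token_details`). The following is a sketch under that assumption; `cache_summary` and the sample counts are hypothetical.

```python
from typing import Any


def cache_summary(usage: dict[str, Any]) -> dict[str, int]:
    """Collect cache counters from a usage object, treating absent fields as 0."""
    details = usage.get("prompt_tokens_details") or {}
    by_ttl = details.get("cache_write_token_details") or {}
    return {
        "cache_read": details.get("cached_tokens", 0),
        "cache_write": details.get("cache_write_tokens", 0),
        "cache_write_5m": by_ttl.get("cache_write_5m_tokens", 0),
        "cache_write_1h": by_ttl.get("cache_write_1h_tokens", 0),
    }


# Hypothetical usage object; the counts are invented for illustration.
sample_usage = {
    "prompt_tokens": 1200,
    "prompt_tokens_details": {
        "cached_tokens": 1000,
        "cache_write_tokens": 200,
        "cache_write_token_details": {"cache_write_5m_tokens": 200},
    },
}
summary = cache_summary(sample_usage)
print(summary)
```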

`completion_tokens_details`

Field	Description
`reasoning_tokens`	Hidden reasoning tokens billed alongside the completion (e.g. o1, o3, Gemini Thinking).
`audio_tokens`	Audio output tokens (e.g. realtime / TTS-style models).
`accepted_prediction_tokens`	Speculative decoding: tokens that were accepted from the predicted draft.
`rejected_prediction_tokens`	Speculative decoding: tokens that were rejected and re-generated.
`text_tokens`	Text output tokens.
`image_tokens`	Image output tokens (e.g. GPT Image 1, Gemini image).
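To see which completion categories actually contributed to a request, one option is to filter the breakdown down to its non-zero entries. This is an illustrative sketch: the helper name is hypothetical, and the sample values mirror the streaming example on this page.

```python
from typing import Any

# Categories documented in the completion_tokens_details table.
COMPLETION_CATEGORIES = (
    "reasoning_tokens",
    "text_tokens",
    "audio_tokens",
    "image_tokens",
    "accepted_prediction_tokens",
    "rejected_prediction_tokens",
)


def nonzero_completion_categories(usage: dict[str, Any]) -> dict[str, int]:
    """Return only the documented completion categories with a non-zero count."""
    details = usage.get("completion_tokens_details") or {}
    return {
        name: details[name]
        for name in COMPLETION_CATEGORIES
        if details.get(name)  # skips missing keys and zero counts
    }


sample_usage = {
    "completion_tokens": 52,
    "completion_tokens_details": {
        "reasoning_tokens": 14,
        "text_tokens": 0,
        "audio_tokens": 0,
        "image_tokens": 0,
        "accepted_prediction_tokens": 0,
        "rejected_prediction_tokens": 0,
    },
}
breakdown = nonzero_completion_categories(sample_usage)
print(breakdown)
```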

Usage in Streaming

For streaming requests, usage statistics are automatically included in the final SSE chunk.

You can also request usage explicitly by setting `include_usage` in `stream_options`:

{
  "model": "claude-sonnet-4.6",
  "messages": [{"role": "user", "content": "Hello"}],
  "stream": true,
  "stream_options": {
    "include_usage": true
  }
}

The final chunk in the stream has an empty `choices` array and carries the complete usage statistics:

{
  "choices": [],
  "usage": {
    "prompt_tokens": 8,
    "completion_tokens": 52,
    "total_tokens": 60,
    "prompt_tokens_details": {
      "cached_tokens": 0,
      "cache_write_tokens": 0,
      "text_tokens": 0,
      "image_tokens": 0,
      "audio_tokens": 0,
      "web_search_requests": 0
    },
    "completion_tokens_details": {
      "reasoning_tokens": 14,
      "text_tokens": 0,
      "audio_tokens": 0,
      "image_tokens": 0,
      "accepted_prediction_tokens": 0,
      "rejected_prediction_tokens": 0
    }
  }
}
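A client reading the stream can pick the usage object out of the final chunk. The sketch below assumes OpenAI-style SSE framing, with each event on a `data: ` line and a `data: [DONE]` sentinel at the end; neither detail is stated on this page, so adjust to what the router actually emits. The function name `usage_from_sse` and the sample lines are hypothetical.

```python
import json
from typing import Any, Iterable, Optional


def usage_from_sse(lines: Iterable[str]) -> Optional[dict[str, Any]]:
    """Return the last non-empty usage object seen in an SSE stream, or None."""
    usage = None
    for raw in lines:
        line = raw.strip()
        if not line.startswith("data:"):
            continue  # skip blank separators, comments, and other SSE fields
        payload = line[len("data:"):].strip()
        if payload == "[DONE]":  # assumed OpenAI-style end-of-stream sentinel
            break
        chunk = json.loads(payload)
        if chunk.get("usage"):
            usage = chunk["usage"]
    return usage


# Hypothetical stream contents, shaped like the final chunk shown above.
sample_stream = [
    'data: {"choices": [{"delta": {"content": "Hi"}}]}',
    "",
    'data: {"choices": [], "usage": {"prompt_tokens": 8, "completion_tokens": 52, "total_tokens": 60}}',
    "",
    "data: [DONE]",
]
usage = usage_from_sse(sample_stream)
print(usage["total_tokens"])
```

In a real client, `lines` would come from the HTTP response body, for example an iterator over decoded response lines.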

Billing Transparency

You can view your usage history in the Request Logs and overall spend metrics in the Dashboard. All charges are based on each model's price per 1M tokens, as listed on our Models page.
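The per-1M-token pricing can be turned into a rough client-side estimate. The sketch below assumes separate input and output prices and ignores any special rates for cached, reasoning, or multimodal tokens. The prices are placeholders, not real rates, so treat the Request Logs as the authoritative record of charges.

```python
from decimal import Decimal
from typing import Any

TOKENS_PER_MILLION = Decimal(1_000_000)


def estimate_cost(
    usage: dict[str, Any],
    input_price_per_1m: Decimal,
    output_price_per_1m: Decimal,
) -> Decimal:
    """Estimate cost as tokens / 1M * price, summed over prompt and completion."""
    prompt_cost = (
        Decimal(usage["prompt_tokens"]) / TOKENS_PER_MILLION * input_price_per_1m
    )
    completion_cost = (
        Decimal(usage["completion_tokens"]) / TOKENS_PER_MILLION * output_price_per_1m
    )
    return prompt_cost + completion_cost


# Token counts from the streaming example; prices are hypothetical placeholders.
usage = {"prompt_tokens": 8, "completion_tokens": 52, "total_tokens": 60}
cost = estimate_cost(usage, Decimal("3.00"), Decimal("15.00"))
print(cost)
```

`Decimal` is used instead of `float` so that very small per-request amounts add up without binary rounding drift.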