Speech to Text

Siraya AI's Speech to Text API transcribes and translates audio files. It exposes a single interface to multiple STT models and supports multiple languages and audio formats.

API Overview

The API exposes two endpoints. `/v1/audio/transcriptions` returns text in the language spoken in the audio, and `/v1/audio/translations` returns the text translated into English.

API Specification

Transcription

Transcribes audio into the input language.

cURL

curl https://audio.siraya.pro/v1/audio/transcriptions \
  -H "Content-Type: multipart/form-data" \
  -H "Authorization: <API_KEY>" \
  --form "file=@/path/to/speech.mp3" \
  --form "model=\"whisper-1\""

Python (OpenAI SDK)

from openai import OpenAI

client = OpenAI(
    api_key="<API_KEY>",
    base_url="https://audio.siraya.pro/v1"
)

# Open the audio in binary mode; the context manager closes it after the upload.
with open("/path/to/speech.mp3", "rb") as audio_file:
    transcription = client.audio.transcriptions.create(
        model="whisper-1",
        file=audio_file
    )

print(transcription.text)

Translation

Translates audio into English.

cURL

curl https://audio.siraya.pro/v1/audio/translations \
  -H "Content-Type: multipart/form-data" \
  -H "Authorization: <API_KEY>" \
  --form "file=@/path/to/speech.mp3" \
  --form "model=\"whisper-1\""
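
Python (standard library)

The cURL request above can be sketched in Python without any third-party package. This is a minimal illustration, not an official client: the helper names `build_audio_request` and `send` are hypothetical, the `Authorization` value mirrors the cURL example as written, and the `application/octet-stream` part type is an assumption about what the service accepts.

```python
import io
import json
import urllib.request
import uuid

BASE_URL = "https://audio.siraya.pro/v1"  # base URL taken from the examples above


def build_audio_request(endpoint, api_key, filename, audio_bytes, model="whisper-1"):
    """Build (but do not send) a multipart/form-data POST for an audio endpoint.

    `endpoint` is "transcriptions" or "translations". `filename` is assumed
    to contain no double quotes or line breaks.
    """
    if endpoint not in ("transcriptions", "translations"):
        raise ValueError(f"unknown endpoint: {endpoint!r}")

    boundary = uuid.uuid4().hex
    body = io.BytesIO()

    # Text field: model
    body.write(f"--{boundary}\r\n".encode())
    body.write(b'Content-Disposition: form-data; name="model"\r\n\r\n')
    body.write(model.encode() + b"\r\n")

    # File field: file (part content type is an assumption)
    body.write(f"--{boundary}\r\n".encode())
    body.write(
        f'Content-Disposition: form-data; name="file"; filename="{filename}"\r\n'.encode()
    )
    body.write(b"Content-Type: application/octet-stream\r\n\r\n")
    body.write(audio_bytes + b"\r\n")

    # Closing boundary
    body.write(f"--{boundary}--\r\n".encode())

    return urllib.request.Request(
        f"{BASE_URL}/audio/{endpoint}",
        data=body.getvalue(),
        method="POST",
        headers={
            "Authorization": api_key,  # same header value as the cURL example
            "Content-Type": f"multipart/form-data; boundary={boundary}",
        },
    )


def send(request):
    """Send the request and return the decoded JSON response (network call)."""
    with urllib.request.urlopen(request) as response:
        return json.load(response)
```

To translate a file, read its bytes, call `build_audio_request("translations", "<API_KEY>", "speech.mp3", audio_bytes)`, and pass the result to `send`. Keeping the build step separate from the send step makes the request easy to inspect before any network traffic occurs.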

Request Parameters

| Parameter | Type | Description |
| --- | --- | --- |
| `file` | file | The audio file object (e.g., `.flac`, `.mp3`, `.mp4`, `.m4a`, `.wav`, `.webm`). |
| `model` | string | The ID of the model to use (e.g., `whisper-1`). |
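
A client may want to check a file's extension before uploading it. The sketch below assumes only the extensions listed for `file` above; because that list is introduced with "e.g.", the service may accept other formats too, so treat a failed check as a warning rather than a hard error. The names `LISTED_EXTENSIONS` and `has_listed_extension` are hypothetical.

```python
from pathlib import Path

# Extensions named in the `file` parameter description. The service may
# accept more formats than these; this set is not an authoritative list.
LISTED_EXTENSIONS = {".flac", ".mp3", ".mp4", ".m4a", ".wav", ".webm"}


def has_listed_extension(path):
    """Return True if the path ends in one of the listed audio extensions."""
    return Path(path).suffix.lower() in LISTED_EXTENSIONS
```

The comparison is case-insensitive, so `talk.WAV` passes the check while `notes.txt` does not.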

Example Response

{
  "text": "Imagine the wildest idea that you've ever had...",
  "usage": {
    "type": "tokens",
    "input_tokens": 14,
    "total_tokens": 59
  }
}
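
The response above can be read with the standard `json` module. This sketch assumes only the fields shown in the example (`text`, `usage.input_tokens`, `usage.total_tokens`); the `TranscriptionResult` class and `parse_response` function are hypothetical names, and the token counts default to 0 in case a response omits `usage`.

```python
import json
from dataclasses import dataclass


@dataclass
class TranscriptionResult:
    text: str
    input_tokens: int
    total_tokens: int


def parse_response(raw):
    """Parse a JSON response body into a TranscriptionResult."""
    payload = json.loads(raw)
    usage = payload.get("usage", {})
    return TranscriptionResult(
        text=payload["text"],
        input_tokens=usage.get("input_tokens", 0),
        total_tokens=usage.get("total_tokens", 0),
    )


# The example response from this page, used as sample input.
EXAMPLE = """
{
  "text": "Imagine the wildest idea that you've ever had...",
  "usage": {
    "type": "tokens",
    "input_tokens": 14,
    "total_tokens": 59
  }
}
"""

result = parse_response(EXAMPLE)
print(result.text)
```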

Visit the Models Directory for all supported speech-to-text engines.