
Speech to Text

Siraya AI's Speech to Text API provides accurate transcription and translation of audio files. It exposes a single interface to multiple speech-to-text (STT) models and supports multiple languages and audio formats.

API Overview

Two endpoints share the same request format: transcription, which returns text in the audio's original language, and translation, which returns English text.

API Specification

Transcription

Transcribes audio into the input language.

cURL:

curl https://audio.siraya.pro/v1/audio/transcriptions \
  -H "Content-Type: multipart/form-data" \
  -H "Authorization: Bearer <API_KEY>" \
  --form "file=@/path/to/speech.mp3" \
  --form "model=whisper-1"

Python (OpenAI SDK):

from openai import OpenAI

# Point the OpenAI client at Siraya AI's audio endpoint.
client = OpenAI(
    api_key="<API_KEY>",
    base_url="https://audio.siraya.pro/v1"
)

# Open the file in binary mode; the with-block closes it after the upload.
with open("/path/to/speech.mp3", "rb") as audio_file:
    transcription = client.audio.transcriptions.create(
        model="whisper-1",
        file=audio_file
    )

print(transcription.text)

Translation

Translates audio into English.

curl https://audio.siraya.pro/v1/audio/translations \
  -H "Content-Type: multipart/form-data" \
  -H "Authorization: Bearer <API_KEY>" \
  --form "file=@/path/to/speech.mp3" \
  --form "model=whisper-1"
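Since the transcription example works through the OpenAI Python SDK, the translation endpoint can presumably be called the same way via `client.audio.translations.create`. The sketch below assumes that. `translate_to_english` is a hypothetical helper, and the client is passed in as an argument so the function has no hard dependency on the SDK.

```python
from typing import Any


def translate_to_english(client: Any, path: str, model: str = "whisper-1") -> str:
    """Translate the audio file at `path` into English text.

    Hypothetical helper. Assumes `client` is an OpenAI SDK client created with
    base_url="https://audio.siraya.pro/v1"; any object exposing
    client.audio.translations.create(model=..., file=...) will work.
    """
    # Open in binary mode; the with-block closes the handle after the upload.
    with open(path, "rb") as audio_file:
        result = client.audio.translations.create(model=model, file=audio_file)
    # The SDK response object exposes the English text as `.text`.
    return result.text
```

For example, after `client = OpenAI(api_key="<API_KEY>", base_url="https://audio.siraya.pro/v1")`, calling `translate_to_english(client, "/path/to/speech.mp3")` should return the English text of the recording.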

Request Parameters

| Parameter | Type | Description |
|-----------|------|-------------|
| file | file | The audio file to process (e.g., .flac, .mp3, .mp4, .m4a, .wav, .webm). |
| model | string | The ID of the model to use (e.g., whisper-1). |
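Without an SDK, the two form fields listed above can be sent as a plain multipart/form-data POST. The following minimal sketch uses only the Python standard library to build (but not send) such a request for either endpoint. `build_audio_request` is a hypothetical helper name, the `Bearer` authorization scheme is an assumption based on what the OpenAI SDK sends, and no parameters beyond `file` and `model` are included.

```python
import urllib.request
import uuid

BASE_URL = "https://audio.siraya.pro/v1"


def build_audio_request(
    endpoint: str, api_key: str, filename: str, audio: bytes, model: str = "whisper-1"
) -> urllib.request.Request:
    """Build (but do not send) a multipart POST to /audio/transcriptions or /audio/translations.

    Hypothetical helper; the Bearer auth scheme is an assumption.
    """
    if endpoint not in ("transcriptions", "translations"):
        raise ValueError(f"unknown endpoint: {endpoint}")
    # A random boundary is very unlikely to occur inside the audio bytes.
    boundary = uuid.uuid4().hex.encode()
    parts = [
        # Part 1: the `model` form field.
        b"--" + boundary,
        b'Content-Disposition: form-data; name="model"',
        b"",
        model.encode(),
        # Part 2: the `file` form field with the raw audio bytes.
        b"--" + boundary,
        f'Content-Disposition: form-data; name="file"; filename="{filename}"'.encode(),
        b"Content-Type: application/octet-stream",
        b"",
        audio,
        # Closing delimiter, followed by a trailing CRLF.
        b"--" + boundary + b"--",
        b"",
    ]
    return urllib.request.Request(
        f"{BASE_URL}/audio/{endpoint}",
        data=b"\r\n".join(parts),
        method="POST",
        headers={
            "Authorization": f"Bearer {api_key}",  # assumption: Bearer scheme
            "Content-Type": f"multipart/form-data; boundary={boundary.decode()}",
        },
    )
```

Passing the returned object to `urllib.request.urlopen(req)` would send it and return the JSON response body.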

Example Response

{
  "text": "Imagine the wildest idea that you've ever had...",
  "usage": {
    "type": "tokens",
    "input_tokens": 14,
    "total_tokens": 59
  }
}
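A minimal sketch for reading a response shaped like the example above, using only `json` from the standard library. `TranscriptionResult` and `parse_transcription` are hypothetical names, and treating `usage` as optional is an assumption rather than documented behavior.

```python
import json
from dataclasses import dataclass
from typing import Optional


@dataclass
class TranscriptionResult:
    text: str
    input_tokens: Optional[int] = None
    total_tokens: Optional[int] = None


def parse_transcription(raw: str) -> TranscriptionResult:
    """Parse a JSON body shaped like the example response (hypothetical helper)."""
    payload = json.loads(raw)
    # Assumption: `usage` may be missing for some models, so default to an empty dict.
    usage = payload.get("usage") or {}
    return TranscriptionResult(
        text=payload["text"],
        input_tokens=usage.get("input_tokens"),
        total_tokens=usage.get("total_tokens"),
    )
```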

See the Models Directory for the full list of supported speech-to-text models.