Speech to Text
Siraya AI's Speech to Text (STT) API provides highly accurate transcription and translation of audio files through a unified interface to multiple STT models, with support for multiple languages and audio formats.
API Overview
Access both transcription (same language) and translation (to English) capabilities through our unified audio processing endpoints.
API Specification
Transcription
Transcribes audio into text in the language spoken in the audio.
Translation
Translates audio into English text.
Request Parameters
| Parameter | Type | Description |
|---|---|---|
| file | file | The audio file object (e.g., .flac, .mp3, .mp4, .m4a, .wav, .webm). |
| model | string | The ID of the model to use (e.g., whisper-1). |
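As a rough illustration of how the two parameters above might be sent, the following Python sketch builds an upload request using only the standard library. Several details are assumptions that do not come from this page: the base URL, the two endpoint paths, the use of multipart/form-data, and Bearer-token authentication are all hypothetical placeholders, so check the actual API reference before relying on them. Only the parameter names (`file`, `model`), the example model ID, and the list of file extensions are taken from the table.

```python
import mimetypes
import urllib.request
import uuid
from pathlib import Path

# ASSUMPTIONS: the base URL and endpoint paths below are hypothetical
# placeholders; this page does not specify them.
BASE_URL = "https://api.example.com/v1"
ENDPOINTS = {
    "transcribe": "/audio/transcriptions",  # assumed path
    "translate": "/audio/translations",     # assumed path
}

# File extensions listed in the Request Parameters table.
SUPPORTED_EXTENSIONS = {".flac", ".mp3", ".mp4", ".m4a", ".wav", ".webm"}


def build_multipart(filename: str, audio: bytes, model: str) -> tuple[bytes, str]:
    """Encode `file` and `model` as a multipart/form-data body (assumed encoding).

    Returns the request body and the matching Content-Type header value.
    """
    if Path(filename).suffix.lower() not in SUPPORTED_EXTENSIONS:
        raise ValueError(f"unsupported audio format: {filename}")

    boundary = uuid.uuid4().hex
    mime = mimetypes.guess_type(filename)[0] or "application/octet-stream"

    model_part = (
        f"--{boundary}\r\n"
        'Content-Disposition: form-data; name="model"\r\n\r\n'
        f"{model}\r\n"
    ).encode()
    file_header = (
        f"--{boundary}\r\n"
        f'Content-Disposition: form-data; name="file"; filename="{filename}"\r\n'
        f"Content-Type: {mime}\r\n\r\n"
    ).encode()
    closing = f"\r\n--{boundary}--\r\n".encode()

    body = model_part + file_header + audio + closing
    return body, f"multipart/form-data; boundary={boundary}"


def send_audio(path: str, model: str, api_key: str, task: str = "transcribe") -> bytes:
    """POST an audio file and return the raw response body.

    ASSUMPTION: Bearer-token authentication; the page does not describe auth.
    """
    audio_path = Path(path)
    body, content_type = build_multipart(audio_path.name, audio_path.read_bytes(), model)
    request = urllib.request.Request(
        BASE_URL + ENDPOINTS[task],
        data=body,
        headers={"Content-Type": content_type, "Authorization": f"Bearer {api_key}"},
        method="POST",
    )
    with urllib.request.urlopen(request) as response:
        return response.read()
```

With real values substituted for the placeholders, a call might look like `send_audio("talk.mp3", "whisper-1", api_key)` for transcription, or the same call with `task="translate"` for translation into English. Rejecting unsupported extensions before uploading avoids a round trip for files the API would presumably refuse.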
Example Response
{
  "text": "Imagine the wildest idea that you've ever had...",
  "usage": {
    "type": "tokens",
    "input_tokens": 14,
    "total_tokens": 59
  }
}
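A response shaped like the example above could be read as in the following minimal Python sketch. The `Transcript` class and `parse_response` function are hypothetical helpers introduced for illustration. The sketch assumes only the fields visible in the example (`text`, `usage.type`, `usage.input_tokens`, `usage.total_tokens`) and treats the `usage` block as optional, which is a defensive guess and not documented behavior.

```python
import json
from dataclasses import dataclass


@dataclass
class Transcript:
    """Hypothetical container for the fields shown in the example response."""
    text: str
    usage_type: str
    input_tokens: int
    total_tokens: int


def parse_response(raw: str) -> Transcript:
    """Parse a JSON response body into a Transcript.

    Missing usage fields fall back to empty or zero values (an assumption).
    """
    data = json.loads(raw)
    usage = data.get("usage", {})
    return Transcript(
        text=data["text"],
        usage_type=usage.get("type", ""),
        input_tokens=usage.get("input_tokens", 0),
        total_tokens=usage.get("total_tokens", 0),
    )


# The example response from this page, used as sample input.
sample = """
{
  "text": "Imagine the wildest idea that you've ever had...",
  "usage": {"type": "tokens", "input_tokens": 14, "total_tokens": 59}
}
"""

result = parse_response(sample)
print(result.text)
print(result.total_tokens)  # 59
```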
Visit the Models Directory for all supported speech-to-text engines.