Universal Audio API

The Universal Audio API provides a single set of endpoints for speech and audio processing: text to speech (TTS), audio translation, and audio transcription.

Unified Endpoints

All audio requests are routed through the following specialized endpoints:

1. Text to Speech (TTS)

Endpoint: POST https://audio.siraya.pro/v1/audio/speech
Generates lifelike audio from input text.

2. Audio Translation

Endpoint: POST https://audio.siraya.pro/v1/audio/translations
Translates audio from any supported language into English text.

3. Audio Transcription

Endpoint: POST https://audio.siraya.pro/v1/speech/transcriptions
Transcribes spoken audio into text in the input language.
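
The three endpoints listed above can be kept in one small lookup so client code does not repeat the URLs. This is a minimal sketch: the task names (`speech`, `translation`, `transcription`) and the helper `endpoint_url` are hypothetical and not part of the API; only the paths come from the list above.

```python
# Base URL and paths copied from the endpoint list above.
BASE_URL = "https://audio.siraya.pro/v1"

# The keys are illustrative task names chosen for this sketch, not API identifiers.
ENDPOINTS = {
    "speech": "/audio/speech",
    "translation": "/audio/translations",
    "transcription": "/speech/transcriptions",
}


def endpoint_url(task: str) -> str:
    """Return the full URL for a task, or raise ValueError for an unknown task."""
    try:
        return BASE_URL + ENDPOINTS[task]
    except KeyError:
        raise ValueError(
            f"unknown task {task!r}; expected one of {sorted(ENDPOINTS)}"
        ) from None
```

For example, `endpoint_url("translation")` returns `https://audio.siraya.pro/v1/audio/translations`.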

Usage Example

Transcribe an audio file with curl:

curl https://audio.siraya.pro/v1/speech/transcriptions \
  -H "Authorization: Bearer <API_KEY>" \
  -H "Content-Type: multipart/form-data" \
  --form "file=@/path/to/audio.mp3" \
  --form "model=whisper-1"

Generate speech with Python:

import requests

url = "https://audio.siraya.pro/v1/audio/speech"
payload = {
    "model": "gpt-4o-mini-tts",
    "input": "Building the future of audio.",
    "voice": "alloy",
}
headers = {"Authorization": "Bearer <API_KEY>"}

# Send the text as JSON; the response body is written to disk as the audio file.
response = requests.post(url, headers=headers, json=payload, timeout=60)
response.raise_for_status()  # stop here rather than saving an error body as audio

with open("output.mp3", "wb") as f:
    f.write(response.content)
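
The examples above cover transcription and text to speech. The translation endpoint can probably be called in much the same way as transcription. The sketch below is built on assumptions: it reuses the `file` and `model` form fields and the `whisper-1` model from the transcription example, which this page does not confirm for translations, and the helper names `build_translation_request` and `translate` are hypothetical. The response is returned as raw text because the response format is not described here.

```python
import requests

TRANSLATIONS_URL = "https://audio.siraya.pro/v1/audio/translations"


def build_translation_request(audio_file, api_key, model="whisper-1"):
    """Prepare (but do not send) a multipart POST to the translations endpoint.

    Assumption: the endpoint accepts the same `file` and `model` form fields
    as the transcription example; check the Speech to Text section to confirm.
    """
    request = requests.Request(
        "POST",
        TRANSLATIONS_URL,
        headers={"Authorization": f"Bearer {api_key}"},
        # requests builds the multipart body and sets the Content-Type boundary.
        files={"file": audio_file},
        data={"model": model},
    )
    return request.prepare()


def translate(path, api_key):
    """Upload the audio file at `path` and return the raw response body."""
    with open(path, "rb") as audio_file:
        prepared = build_translation_request(audio_file, api_key)
    with requests.Session() as session:
        response = session.send(prepared, timeout=60)
    response.raise_for_status()
    return response.text
```

Splitting request construction from sending keeps the first function free of network access, so the headers and form fields can be inspected before anything is uploaded.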

For detailed parameters and model lists, refer to the Speech to Text and Text to Speech sections in the Generative Model API.