
Text to Speech

Siraya AI provides a unified interface for high-quality audio synthesis from text. Integrate lifelike speech into your applications using various state-of-the-art TTS models.

API Overview

Our unified audio API simplifies integrating multiple text-to-speech (TTS) models.

API Specification

Generate audio by sending a POST request to the /v1/audio/speech endpoint.

curl https://audio.siraya.pro/v1/audio/speech \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer <API_KEY>" \
  -d '{
    "model": "gpt-4o-mini-tts",
    "input": "Hello, how can I help you today?",
    "voice": "alloy"
  }' \
  --output speech.mp3
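
import requests

url = "https://audio.siraya.pro/v1/audio/speech"
headers = {
    "Content-Type": "application/json",
    "Authorization": "Bearer <API_KEY>",  # replace <API_KEY> with your key
}
data = {
    "model": "gpt-4o-mini-tts",
    "input": "Hello, how can I help you today?",
    "voice": "alloy",
}

# Send the synthesis request; fail loudly on HTTP errors instead of
# writing an error body into the audio file.
response = requests.post(url, headers=headers, json=data, timeout=60)
response.raise_for_status()

# The response body is raw binary audio, so write it in binary mode.
with open("speech.mp3", "wb") as f:
    f.write(response.content)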
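If you would rather avoid the third-party requests dependency, the same call can be sketched with Python's standard library. This is a minimal sketch, not an official client: the helper names `build_speech_request` and `synthesize_to_file` are hypothetical, and the 60-second timeout is an arbitrary assumption.

```python
import json
import urllib.request

# Endpoint as documented above.
SPEECH_URL = "https://audio.siraya.pro/v1/audio/speech"


def build_speech_request(api_key: str, model: str, text: str, voice: str) -> urllib.request.Request:
    """Build a POST request for the speech endpoint (hypothetical helper)."""
    body = json.dumps({"model": model, "input": text, "voice": voice}).encode("utf-8")
    return urllib.request.Request(
        SPEECH_URL,
        data=body,
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {api_key}",
        },
        method="POST",
    )


def synthesize_to_file(api_key: str, model: str, text: str, voice: str,
                       path: str, timeout: float = 60.0) -> int:
    """Send the request and write the raw audio bytes to `path`.

    urlopen raises urllib.error.HTTPError for non-2xx responses, so an
    error body is never written into the audio file. Returns the number
    of bytes written.
    """
    request = build_speech_request(api_key, model, text, voice)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        audio = response.read()
    with open(path, "wb") as f:
        f.write(audio)
    return len(audio)
```

For example, `synthesize_to_file("<API_KEY>", "gpt-4o-mini-tts", "Hello, how can I help you today?", "alloy", "speech.mp3")` mirrors the requests example. Splitting request construction from sending keeps the payload logic testable without network access.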

Request Parameters

Parameter | Type | Default | Description
--- | --- | --- | ---
model | string | - | The ID of the TTS model (e.g., gpt-4o-mini-tts, tts-1).
input | string | - | The text to convert into speech.
voice | string | - | The voice to use (e.g., alloy, ash, ballad, coral, echo, fable, onyx, nova, sage, shimmer, verse).

Supported Voices

We support a wide range of expressive voices: alloy, ash, ballad, coral, echo, fable, onyx, nova, sage, shimmer, verse.
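Checking parameters on the client before sending a request can catch a misspelled voice or empty input without a network round trip. The sketch below assumes the voice list above is complete; `validate_speech_params` is a hypothetical helper, and it deliberately does not check the model ID because the models named in the parameter table are only examples. Some voices may not be available on every model, so the server's response remains authoritative.

```python
# Voices listed in the Supported Voices section above (assumed complete).
SUPPORTED_VOICES = frozenset({
    "alloy", "ash", "ballad", "coral", "echo", "fable",
    "onyx", "nova", "sage", "shimmer", "verse",
})


def validate_speech_params(model: str, text: str, voice: str) -> dict:
    """Return a request body for /v1/audio/speech, or raise ValueError.

    Hypothetical client-side helper: the server may still reject
    combinations this check accepts (e.g., a voice a given model lacks).
    """
    if not model:
        raise ValueError("model must be a non-empty model ID")
    if not text or not text.strip():
        raise ValueError("input text must not be empty")
    if voice not in SUPPORTED_VOICES:
        raise ValueError(
            f"unsupported voice {voice!r}; expected one of {sorted(SUPPORTED_VOICES)}"
        )
    return {"model": model, "input": text, "voice": voice}
```

The returned dictionary can be passed directly as the `json=` argument in the requests example.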

Example Response

The API returns raw binary audio data (MP3 or WAV, depending on the model) rather than JSON, so write the response body directly to a file.
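Because the format depends on the model, you may want to inspect the returned bytes before choosing a file extension instead of always saving as .mp3. The sketch below sniffs standard file signatures (a RIFF/WAVE header for WAV; an ID3 tag or MPEG audio frame header for MP3). `guess_audio_extension` is a hypothetical helper, and any format other than MP3 or WAV is reported as unknown.

```python
from typing import Optional


def guess_audio_extension(audio: bytes) -> Optional[str]:
    """Guess a file extension from the first bytes of an audio payload.

    Returns "wav", "mp3", or None if the signature is not recognized.
    """
    # WAV: "RIFF" chunk ID at offset 0 and "WAVE" form type at offset 8.
    if len(audio) >= 12 and audio[:4] == b"RIFF" and audio[8:12] == b"WAVE":
        return "wav"
    # MP3 that starts with an ID3v2 metadata tag.
    if audio[:3] == b"ID3":
        return "mp3"
    # Bare MPEG audio frame: 11 set sync bits plus non-zero layer bits.
    # The layer check excludes AAC ADTS streams, whose layer field is 0.
    if (len(audio) >= 2 and audio[0] == 0xFF
            and (audio[1] & 0xE0) == 0xE0 and (audio[1] & 0x06) != 0):
        return "mp3"
    return None
```

With the requests example, `ext = guess_audio_extension(response.content) or "bin"` yields an extension to use when naming the output file.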

Visit the Models Directory for the full list of supported text-to-speech engines and their capabilities.