Text to Speech

Siraya AI provides a unified interface for high-quality audio synthesis from text. Integrate lifelike speech into your applications using various state-of-the-art TTS models.

API Overview

Our unified audio API lets you call multiple text-to-speech (TTS) models through a single endpoint.

API Specification

Generate audio by sending a POST request to /v1/audio/speech.

cURL

curl https://audio.siraya.pro/v1/audio/speech \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer <API_KEY>" \
  -d '{
    "model": "gpt-4o-mini-tts",
    "input": "Hello, how can I help you today?",
    "voice": "alloy"
  }' \
  --output speech.mp3

Python

import requests

url = "https://audio.siraya.pro/v1/audio/speech"
headers = {
    "Content-Type": "application/json",
    "Authorization": "Bearer <API_KEY>"
}
data = {
    "model": "gpt-4o-mini-tts",
    "input": "Hello, how can I help you today?",
    "voice": "alloy"
}

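# Fail fast on HTTP errors so an error body is never saved as audio.
response = requests.post(url, headers=headers, json=data, timeout=30)
response.raise_for_status()

# The response body is the raw audio, so write it out as bytes.
with open("speech.mp3", "wb") as f:
    f.write(response.content)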

Request Parameters

Parameter	Type	Default	Description
`model`	string	-	The ID of the TTS model (e.g., `gpt-4o-mini-tts`, `tts-1`).
`input`	string	-	The text to be converted into speech.
`voice`	string	-	The voice to use (e.g., `alloy`, `ash`, `ballad`, `coral`, `echo`, `fable`, `onyx`, `nova`, `sage`, `shimmer`, `verse`).
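
The parameters above can be assembled and checked on the client before a request is sent. The following is a minimal sketch, not part of the Siraya API: the helper name `build_speech_payload`, the client-side validation, and the default argument values are all assumptions made for illustration (the table lists no defaults). The voice names are copied from the table.

```python
# Voices listed in the request parameter table.
SUPPORTED_VOICES = frozenset({
    "alloy", "ash", "ballad", "coral", "echo", "fable",
    "onyx", "nova", "sage", "shimmer", "verse",
})


def build_speech_payload(text: str, voice: str = "alloy",
                         model: str = "gpt-4o-mini-tts") -> dict:
    """Return a JSON body for POST /v1/audio/speech (hypothetical helper).

    The defaults are conveniences of this sketch, not API defaults.
    """
    if not text.strip():
        raise ValueError("input text must not be empty")
    if voice not in SUPPORTED_VOICES:
        raise ValueError(f"unsupported voice: {voice!r}")
    return {"model": model, "input": text, "voice": voice}
```

The returned dictionary can be passed as the `json=` argument of `requests.post`. Checking the voice locally only catches typos early; the server remains the authority on which voices a given model accepts.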

Supported Voices

The following voices are supported: alloy, ash, ballad, coral, echo, fable, onyx, nova, sage, shimmer, verse.

Example Response

The API returns raw binary audio data in the response body rather than JSON. The format is MP3 or WAV, depending on the model.
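
Because the format depends on the model, a client may want to pick the file extension from the bytes it received rather than hard-coding `.mp3`. The sketch below inspects the standard file signatures (a RIFF/WAVE header for WAV, an ID3v2 tag or MPEG frame sync for MP3). The function name `guess_audio_extension` is hypothetical, and this page does not say whether the server also sets a `Content-Type` header that could be used instead.

```python
def guess_audio_extension(data: bytes) -> str:
    """Guess a file extension from the first bytes of an audio payload."""
    # WAV files are RIFF containers: "RIFF" <4-byte size> "WAVE".
    if data[:4] == b"RIFF" and data[8:12] == b"WAVE":
        return ".wav"
    # MP3 data may begin with an ID3v2 metadata tag...
    if data[:3] == b"ID3":
        return ".mp3"
    # ...or directly with an MPEG frame sync (first 11 bits set).
    if len(data) >= 2 and data[0] == 0xFF and (data[1] & 0xE0) == 0xE0:
        return ".mp3"
    # Unknown format: keep the bytes but do not mislabel them.
    return ".bin"
```

With the Python request shown earlier, this could be used as `open("speech" + guess_audio_extension(response.content), "wb")`.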

Visit the Models Directory to see all supported text-to-speech engines and their qualities.