Text to Speech
Siraya AI provides a unified interface for high-quality audio synthesis from text. Integrate lifelike speech into your applications using various state-of-the-art TTS models.
API Overview
Our unified audio API simplifies integration with multiple text-to-speech (TTS) models.
API Specification
Generate audio by sending a POST request to /v1/audio/speech.
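import requests

url = "https://audio.siraya.pro/v1/audio/speech"
headers = {
    "Content-Type": "application/json",
    "Authorization": "Bearer <API_KEY>",  # replace <API_KEY> with your key
}
data = {
    "model": "gpt-4o-mini-tts",
    "input": "Hello, how can I help you today?",
    "voice": "alloy",
}

# Send the synthesis request; the response body is the raw audio.
response = requests.post(url, headers=headers, json=data)
response.raise_for_status()  # fail on an HTTP error instead of saving an error body

# Write the audio bytes to disk.
with open("speech.mp3", "wb") as f:
    f.write(response.content)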
Request Parameters
| Parameter | Type | Default | Description |
|---|---|---|---|
| model | string | - | The ID of the TTS model (e.g., gpt-4o-mini-tts, tts-1). |
| input | string | - | The text to be converted into speech. |
| voice | string | - | The voice to use (e.g., alloy, ash, ballad, coral, echo, fable, onyx, nova, sage, shimmer, verse). |
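The three parameters above can be assembled and checked on the client before any request is sent. The following is a minimal sketch, not part of the API: `build_speech_payload` and `SUPPORTED_VOICES` are hypothetical names, and the validation rules (non-empty strings, voice taken from the list in this document) are assumptions about what a client might reasonably check.

```python
# Voices named in this document. Whether every model accepts every voice
# is not stated here, so treat this set as an assumption.
SUPPORTED_VOICES = frozenset({
    "alloy", "ash", "ballad", "coral", "echo", "fable",
    "onyx", "nova", "sage", "shimmer", "verse",
})


def build_speech_payload(model: str, text: str, voice: str) -> dict:
    """Return a JSON body for POST /v1/audio/speech (hypothetical helper)."""
    if not model:
        raise ValueError("model must be a non-empty string")
    if not text:
        raise ValueError("input text must be a non-empty string")
    if voice not in SUPPORTED_VOICES:
        raise ValueError(f"unknown voice: {voice!r}")
    # The keys match the parameter names in the table above.
    return {"model": model, "input": text, "voice": voice}
```

The returned dictionary can be passed as the `json=` argument of `requests.post`. Checking the voice locally gives a clear error message without spending a request.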
Supported Voices
We support a wide range of expressive voices:
alloy, ash, ballad, coral, echo, fable, onyx, nova, sage, shimmer, verse.
Example Response
The API returns raw binary audio data (MP3 or WAV, depending on the model).
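Because the audio format depends on the model, a client may prefer to pick the file extension at run time rather than hard-coding it. The sketch below assumes the response carries a `Content-Type` header describing the audio, which this document does not confirm. `extension_for`, `save_audio`, and the media-type mapping are hypothetical and only illustrate the idea.

```python
from pathlib import Path

# Assumed mapping from common audio media types to file extensions.
_EXTENSIONS = {
    "audio/mpeg": ".mp3",
    "audio/mp3": ".mp3",
    "audio/wav": ".wav",
    "audio/x-wav": ".wav",
    "audio/wave": ".wav",
}


def extension_for(content_type: str) -> str:
    """Map a Content-Type header value to an extension, or '.bin' if unknown."""
    # Drop any parameters after ';' and normalize case before the lookup.
    media_type = content_type.split(";", 1)[0].strip().lower()
    return _EXTENSIONS.get(media_type, ".bin")


def save_audio(content: bytes, content_type: str, stem: str = "speech") -> Path:
    """Write the raw audio bytes to '<stem><extension>' and return the path."""
    path = Path(stem + extension_for(content_type))
    path.write_bytes(content)
    return path
```

With the `requests` response from the example request, a call would look like `save_audio(response.content, response.headers.get("Content-Type", ""))`. An unrecognized or missing header falls back to `.bin` so the bytes are still saved.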
Visit the Models Directory to see all supported text-to-speech engines and their characteristics.