Embeddings API

Embeddings are numerical vector representations of text that capture semantic meaning. They are essential for tasks such as document search, recommendation systems, and clustering. Siraya AI provides a unified API for accessing state-of-the-art embedding models from multiple providers, including OpenAI, Jina, and Voyage AI.

Endpoint URL

https://llm.siraya.pro/v1/embeddings

How to Generate Embeddings

Basic Request (Python and cURL)

import requests

url = "https://llm.siraya.pro/v1/embeddings"
headers = {
    "Authorization": "Bearer <API_KEY>",
    "Content-Type": "application/json"
}
data = {
    "model": "text-embedding-3-small",
    "input": "The quick brown fox jumps over the lazy dog"
}

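# Set a timeout so a stalled connection cannot hang the caller indefinitely.
response = requests.post(url, headers=headers, json=data, timeout=30)
response.raise_for_status()  # raise on HTTP errors instead of failing on a missing "data" key

# A single input string yields one item in "data"; its "embedding" is a list of floats.
embedding = response.json()["data"][0]["embedding"]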

curl https://llm.siraya.pro/v1/embeddings \
  -H "Authorization: Bearer <API_KEY>" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "text-embedding-3-small",
    "input": "The quick brown fox jumps over the lazy dog"
  }'

Batch Processing

To generate multiple embeddings in a single request, pass an array of strings to the input parameter:

{
  "model": "text-embedding-3-small",
  "input": ["First document text", "Second document text", "Third document text"]
}
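The batch request above can be wrapped in two small helpers, sketched here in Python. The helper names `build_batch_payload` and `extract_embeddings` are hypothetical and not part of the API. The sketch also assumes that each item in the response's "data" list carries an integer "index" field, as in OpenAI-style responses; if that field is absent, the code falls back to the order in which items were returned.

```python
from typing import Any, Dict, List


def build_batch_payload(texts: List[str], model: str = "text-embedding-3-small") -> Dict[str, Any]:
    """Build the JSON body for a batch embeddings request."""
    if not texts:
        raise ValueError("texts must contain at least one string")
    return {"model": model, "input": list(texts)}


def extract_embeddings(response_json: Dict[str, Any]) -> List[List[float]]:
    """Return one embedding per input text, in input order.

    Assumption: items in "data" may carry an integer "index" field
    (OpenAI-style). Items without it keep their position in the list.
    """
    items = response_json["data"]
    ordered = sorted(enumerate(items), key=lambda pair: pair[1].get("index", pair[0]))
    return [item["embedding"] for _, item in ordered]


# Usage with the request shown earlier (requires network access and a valid key):
#   payload = build_batch_payload(["First document text", "Second document text"])
#   response = requests.post(url, headers=headers, json=payload, timeout=30)
#   response.raise_for_status()
#   vectors = extract_embeddings(response.json())
```

Keeping payload construction and response parsing separate from the HTTP call makes both easy to test without contacting the API.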

Best Practices

Model Selection: Use smaller/faster models (like text-embedding-3-small) for low-latency search and larger models (like voyage-large-2) for better semantic accuracy.
Batching: Group multiple texts into a single request to reduce round-trip latency and overhead.
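Normalization: Most models return embeddings normalized to unit length, in which case cosine similarity equals the dot product. Use cosine similarity to compare vectors.
Caching: Embeddings are deterministic for a given model and input, so cache results keyed by model and input text to avoid paying to re-embed the same content.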
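The normalization and caching practices above can be illustrated with a minimal Python sketch. It is one possible approach under stated assumptions, not an official client: `cosine_similarity` and `EmbeddingCache` are hypothetical names, and `embed_fn` stands in for whatever function performs the actual API call. The cache lives in memory only; a production setup would more likely use a persistent store.

```python
import hashlib
import math
from typing import Callable, Dict, List, Sequence


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity; for unit-length vectors this equals the dot product."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same dimension")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_a * norm_b)


class EmbeddingCache:
    """In-memory cache keyed by a hash of (model, text).

    `embed_fn` is a hypothetical callable (model, text) -> embedding that
    performs the real API request.
    """

    def __init__(self, embed_fn: Callable[[str, str], List[float]]) -> None:
        self._embed_fn = embed_fn
        self._store: Dict[str, List[float]] = {}

    @staticmethod
    def _key(model: str, text: str) -> str:
        # The NUL separator keeps ("ab", "c") distinct from ("a", "bc").
        return hashlib.sha256(f"{model}\x00{text}".encode("utf-8")).hexdigest()

    def get(self, model: str, text: str) -> List[float]:
        key = self._key(model, text)
        if key not in self._store:
            # Cache miss: call the API once and remember the result.
            self._store[key] = self._embed_fn(model, text)
        return self._store[key]
```

Including the model name in the cache key matters because the same text produces different vectors under different models.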

Visit the Models Directory to see all supported embedding models and their dimensions.