Multimodal Input
Siraya Model Router supports multiple input modalities beyond text, allowing you to send images and PDFs to compatible models through our unified API. This enables rich multimodal interactions for a wide variety of use cases.
Supported Modalities
Images
Send images to vision-capable models for analysis, description, OCR, and more. Siraya Model Router supports multiple image formats and both URL-based and base64-encoded images.
Learn more about image inputs →
PDFs
Process PDF documents with any model on Siraya Model Router.
Learn more about PDF processing →
Getting Started
All multimodal inputs use the same /v1/chat/completions endpoint with the messages parameter. Different content types are specified in the message content array:
- Images: Use
image_urlcontent type - PDFs: Use
filecontent type with PDF data
You can combine multiple modalities in a single request, and the number of files you can send varies by provider and model.
Model Compatibility
Info
Not all models support every modality.
- Vision models: Required for image processing
- File-compatible models: Can process PDFs natively or through our parsing system
Use our Models page to find models that support your desired input modalities.
Input Format Support
Siraya Model Router supports both direct URLs and base64-encoded data for multimodal inputs:
URLs (Recommended for public content)
- Images:
https://example.com/image.jpg - PDFs:
https://example.com/document.pdf
Base64 Encoding (Required for local files)
- Images:
data:image/jpeg;base64,{base64_data} - PDFs:
data:application/pdf;base64,{base64_data}
URLs are more efficient for large files as they don't require local encoding and reduce request payload size.
Base64 encoding is required for local files or when the content is not publicly accessible.