Messages API

The Messages API is the primary interface for interacting with Claude models. It supports structured multi-turn conversations, system prompts, image inputs, and advanced features like extended thinking and tool calling.

Endpoint

POST https://llm.siraya.pro/v1/messages

Request Parameters

Parameter    Type          Required  Description
model        string        Yes       The model ID (e.g., claude-3-7-sonnet-20250219).
messages     array         Yes       An array of input messages (roles: user, assistant).
max_tokens   integer       Yes       The maximum number of tokens to generate.
system       string/array  No        A system prompt to provide context or instructions.
temperature  number        No        Amount of randomness (0.0 to 1.0).
stream       boolean       No        Whether to use server-sent events for streaming.
thinking     object        No        Configuration for extended thinking (Claude 3.7+).
tools        array         No        Definitions of tools for the model to use.
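
As a rough illustration of how the parameters above fit together, the following Python sketch assembles a request body and applies the constraints listed in the table. The helper name build_request is hypothetical and not part of any SDK, and the range check on temperature simply mirrors the 0.0 to 1.0 range stated above.

```python
import json
from typing import Any, Optional, Union


def build_request(
    model: str,
    messages: list[dict[str, Any]],
    max_tokens: int,
    system: Optional[Union[str, list[dict[str, Any]]]] = None,
    temperature: Optional[float] = None,
    stream: bool = False,
) -> dict[str, Any]:
    """Assemble a request body (hypothetical helper, not an SDK function)."""
    if not messages:
        raise ValueError("messages must contain at least one entry")
    if max_tokens < 1:
        raise ValueError("max_tokens must be a positive integer")
    if temperature is not None and not 0.0 <= temperature <= 1.0:
        raise ValueError("temperature must be between 0.0 and 1.0")

    # Required parameters are always present.
    body: dict[str, Any] = {
        "model": model,
        "messages": messages,
        "max_tokens": max_tokens,
    }
    # Optional parameters are included only when the caller sets them.
    if system is not None:
        body["system"] = system
    if temperature is not None:
        body["temperature"] = temperature
    if stream:
        body["stream"] = True
    return body


body = build_request(
    model="claude-3-7-sonnet-20250219",
    messages=[{"role": "user", "content": "Hello, Claude"}],
    max_tokens=1024,
    system="You are a concise assistant.",
    temperature=0.2,
)
print(json.dumps(body, indent=2))
```

Leaving unset optional fields out of the body, rather than sending nulls, keeps the request close to the minimal examples shown on this page.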

Extended Thinking Example

{
  "model": "claude-3-7-sonnet-20250219",
  "max_tokens": 4096,
  "thinking": {
    "type": "enabled",
    "budget_tokens": 1024
  },
  "messages": [{"role": "user", "content": "Explain quantum entanglement."}]
}
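
The sketch below shows one way a client might validate the thinking configuration and then separate the model's reasoning from its final answer. It assumes the gateway follows Anthropic's documented conventions: budget_tokens is at least 1024 and smaller than max_tokens, and the response carries blocks of type "thinking" alongside the "text" blocks. This page confirms neither assumption, both helper names are hypothetical, and the sample content is illustrative only.

```python
from typing import Any


def thinking_config(budget_tokens: int, max_tokens: int) -> dict[str, Any]:
    """Build the "thinking" object (hypothetical helper)."""
    # Assumed constraints, taken from Anthropic's API rather than this page.
    if budget_tokens < 1024:
        raise ValueError("budget_tokens is assumed to be at least 1024")
    if budget_tokens >= max_tokens:
        raise ValueError("budget_tokens is assumed to be below max_tokens")
    return {"type": "enabled", "budget_tokens": budget_tokens}


def split_blocks(content: list[dict[str, Any]]) -> tuple[str, str]:
    """Return (reasoning, answer) from a response's content blocks."""
    reasoning = "\n".join(
        block.get("thinking", "")
        for block in content
        if block.get("type") == "thinking"
    )
    answer = "".join(
        block.get("text", "")
        for block in content
        if block.get("type") == "text"
    )
    return reasoning, answer


# Illustrative content list, not real model output.
sample_content = [
    {"type": "thinking", "thinking": "Entangled particles share one state."},
    {"type": "text", "text": "Entanglement links the states of two particles."},
]

config = thinking_config(budget_tokens=1024, max_tokens=4096)
reasoning, answer = split_blocks(sample_content)
print(config)
print(answer)
```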

Usage Examples

import anthropic

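# The Anthropic SDK appends /v1/messages to base_url, so the host root is
# passed here to reach the endpoint listed above.
client = anthropic.Anthropic(
    base_url="https://llm.siraya.pro",
    api_key="<API_KEY>"
)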

message = client.messages.create(
    model="claude-3-7-sonnet-20250219",
    max_tokens=1024,
    messages=[{"role": "user", "content": "Hello, Claude"}]
)
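# message.content is a list of content blocks; print the text of the first one.
print(message.content[0].text)

The same request with curl:

curl https://llm.siraya.pro/v1/messages \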
  -H "x-api-key: <API_KEY>" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "claude-3-7-sonnet-20250219",
    "max_tokens": 1024,
    "messages": [{"role": "user", "content": "Hello!"}]
  }'
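
When the stream parameter is set to true, the response arrives as server-sent events rather than a single JSON body. The sketch below accumulates text from such a stream. It assumes the gateway emits Anthropic-style content_block_delta events carrying text_delta payloads, which this page does not confirm. The sample lines are illustrative, and collect_text is a hypothetical helper.

```python
import json
from typing import Iterable


def collect_text(sse_lines: Iterable[str]) -> str:
    """Concatenate text deltas from raw SSE lines (hypothetical helper)."""
    parts: list[str] = []
    for line in sse_lines:
        # Only "data:" lines carry JSON; "event:" lines and blanks are skipped.
        if not line.startswith("data:"):
            continue
        payload = line[len("data:"):].strip()
        if not payload:
            continue
        event = json.loads(payload)
        if event.get("type") != "content_block_delta":
            continue
        delta = event.get("delta", {})
        if delta.get("type") == "text_delta":
            parts.append(delta.get("text", ""))
    return "".join(parts)


# Illustrative stream; the event shapes are an assumption, see above.
sample_stream = [
    "event: content_block_delta",
    'data: {"type": "content_block_delta", "index": 0, '
    '"delta": {"type": "text_delta", "text": "Hello"}}',
    "",
    "event: content_block_delta",
    'data: {"type": "content_block_delta", "index": 0, '
    '"delta": {"type": "text_delta", "text": "!"}}',
    "",
    "event: message_stop",
    'data: {"type": "message_stop"}',
]

streamed_text = collect_text(sample_stream)
print(streamed_text)  # prints "Hello!"
```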

Response Body

{
  "id": "msg_01...",
  "type": "message",
  "role": "assistant",
  "model": "claude-3-7-sonnet-20250219",
  "content": [
    {
      "type": "text",
      "text": "Hello! How can I help you today?"
    }
  ],
  "stop_reason": "end_turn",
  "usage": {
    "input_tokens": 10,
    "output_tokens": 12
  }
}
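
To make the response shape concrete, here is a minimal sketch that parses the sample body above, joins the text blocks, and totals the token usage. The summarize function is a hypothetical helper written for this illustration.

```python
import json
from typing import Any

# The sample response from above, reproduced as a JSON string.
raw = """
{
  "id": "msg_01...",
  "type": "message",
  "role": "assistant",
  "model": "claude-3-7-sonnet-20250219",
  "content": [
    {"type": "text", "text": "Hello! How can I help you today?"}
  ],
  "stop_reason": "end_turn",
  "usage": {"input_tokens": 10, "output_tokens": 12}
}
"""


def summarize(response: dict[str, Any]) -> dict[str, Any]:
    """Extract the reply text, stop reason, and total token count."""
    text = "".join(
        block["text"]
        for block in response["content"]
        if block["type"] == "text"
    )
    usage = response["usage"]
    return {
        "text": text,
        "stop_reason": response["stop_reason"],
        "total_tokens": usage["input_tokens"] + usage["output_tokens"],
    }


summary = summarize(json.loads(raw))
print(summary["text"])
print(summary["stop_reason"], summary["total_tokens"])
```

Filtering on the block type rather than indexing content[0] keeps the helper usable if other block types appear in the list.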

Features

  • Alternating Roles: Messages must alternate between user and assistant.
  • Prompt Caching: Long system prompts or document context can be cached, reducing latency and cost on repeated requests.
  • Multimodal: Pass images as base64 or URLs within the message content.
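
The three features above can be sketched as small Python helpers. The block shapes used here (an "image" block with a base64 source, and a "cache_control" marker of type "ephemeral" on a system text block) follow Anthropic's documented format and are an assumption as far as this gateway is concerned. All function names are hypothetical, and the image bytes are a placeholder rather than a real image.

```python
import base64
from typing import Any


def roles_alternate(messages: list[dict[str, Any]]) -> bool:
    """Return True when no two consecutive messages share a role."""
    roles = [message["role"] for message in messages]
    return all(a != b for a, b in zip(roles, roles[1:]))


def image_block(data: bytes, media_type: str = "image/png") -> dict[str, Any]:
    """Wrap raw image bytes as a base64 image block (assumed shape)."""
    return {
        "type": "image",
        "source": {
            "type": "base64",
            "media_type": media_type,
            "data": base64.b64encode(data).decode("ascii"),
        },
    }


def cached_system(text: str) -> list[dict[str, Any]]:
    """Mark a long system prompt as cacheable (assumed shape)."""
    return [
        {"type": "text", "text": text, "cache_control": {"type": "ephemeral"}}
    ]


placeholder_image = b"not-a-real-image"  # stand-in bytes for illustration
messages = [
    {
        "role": "user",
        "content": [
            image_block(placeholder_image),
            {"type": "text", "text": "Describe this image."},
        ],
    },
    {"role": "assistant", "content": "It shows a placeholder."},
    {"role": "user", "content": "Thanks. Anything else?"},
]

print(roles_alternate(messages))
print(cached_system("Long reference document...")[0]["cache_control"])
```

Running roles_alternate before sending a request is a cheap way to catch a conversation history that would break the alternation rule.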