# Model Serving - Supported AI Models
## Overview
This section outlines the AI models deployed using OpenAI's API and Phoeniqs infrastructure. It is intended for engineers, data scientists, and DevOps teams involved in managing model integrations.
## Supported Models
**Token Throughput Disclaimer:** The token-per-second throughput figures provided are based on controlled testing conditions and are intended for benchmarking and comparison purposes only. Actual performance in production environments may vary significantly depending on workload characteristics, system configuration, model hosting provider, network conditions, and other operational factors. These results should not be interpreted as a guarantee of real-world performance.
**Model Updates and Deprecation Disclaimer:** We reserve the right to modify, upgrade, or replace any AI models used in our services at any time. This may include deprecating older models and introducing newer versions as we deem necessary to maintain performance, security, and service quality. While we aim to provide notice when feasible, changes may occur without prior notification.
**NOTE:** Pricing is subject to change at our discretion.
## Supported Models by Use Case
**Agent Workflows** - Models designed for autonomous agent operations:
- DeepSeek V32
- Qwen3 VL 235B
- GPT OSS 120B
**Conversational, Code Generation & Multilingual Interactions** - Models optimized for chat, coding, and multilingual tasks:
- Llama4 (Maverick, Scout), Llama3 (70B)
- DeepSeek (70B)
- Qwen (8B, 32B)
- Gemma (12B)
- Apertus (8B, 70B)
- Granite (8B)
**RAG (Retrieval-Augmented Generation) Workflows** - Models tailored for memory and document retrieval:
- BGE Models (M3, Re-ranker)
- Granite 278M
## How to Run Inference on an AI Model
To perform inference (i.e., generate responses or predictions) using a deployed AI model, you typically need the following components:
- **Model Base URL**
  This is the API endpoint or base URL where your inference requests are sent.
  **Base API URL:** `https://maas.phoeniqs.com/`
- **Model Name**
  This specifies the exact model you're using. Pass the Model Name value from the Supported Models table in your API calls to select the desired model.
  Examples:
  - `inference-llama4-maverick`
  - `inference-bge-m3`
- **API Key**
  A secure token used to authenticate your requests. Pass it in the HTTP header: `Authorization: Bearer YOUR_API_KEY`
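Putting the three components together, here is a minimal Python sketch that assembles a chat-completion request. The helper name `build_chat_request` is illustrative, not part of the service's API; the base URL and model name come from this page.

```python
import json

BASE_URL = "https://maas.phoeniqs.com"  # Base API URL from this page

def build_chat_request(api_key: str, model: str, prompt: str, temperature: float = 0.7):
    """Assemble the URL, headers, and JSON body for a chat-completion call."""
    url = f"{BASE_URL}/v1/chat/completions"
    headers = {
        "Content-Type": "application/json",
        # API key goes in the Authorization header as a bearer token
        "Authorization": f"Bearer {api_key}",
    }
    body = {
        "model": model,  # Model Name from the Supported Models table
        "messages": [{"role": "user", "content": prompt}],
        "temperature": temperature,
    }
    return url, headers, json.dumps(body)

url, headers, payload = build_chat_request(
    "YOUR_API_KEY", "inference-llama4-maverick", "How do I make sourdough bread?"
)
```

The resulting `url`, `headers`, and `payload` can then be sent with any HTTP client, e.g. `requests.post(url, headers=headers, data=payload)`.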
## Sample Calls
### 1. List available models

```bash
curl --location 'https://maas.phoeniqs.com/v1/models' \
--header 'Authorization: Bearer <API_Key>'
```
### 2. Chat completions

```bash
curl --location 'https://maas.phoeniqs.com/v1/chat/completions' \
--header 'Content-Type: application/json' \
--header 'Authorization: Bearer <API_Key>' \
--data '{
  "model": "inference-llama4-maverick",
  "messages": [
    { "role": "user", "content": "How do I make sourdough bread?" }
  ],
  "temperature": 0.7
}'
```
### 3. Embeddings

Option A:

```bash
curl --location 'https://maas.phoeniqs.com/v1/embeddings' \
--header 'Content-Type: application/json' \
--header 'Authorization: Bearer <API_Key>' \
--data '{
  "model": "inference-bge-m3",
  "input": "OpenAI develops AI models that understand and generate text."
}'
```

Option B:

```bash
curl --location 'https://maas.phoeniqs.com/embeddings' \
--header 'Content-Type: application/json' \
--header 'Authorization: Bearer <API_Key>' \
--data '{
  "model": "inference-bge-m3",
  "input": "OpenAI develops AI models that understand and generate text."
}'
```
**NOTE:** Normally `/v1/embeddings` should work; if it does not, try `/embeddings` in the URL path instead.
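The fallback described in the note can be automated. Below is a sketch: try the versioned path first and fall back on a 404. The `post` argument is an injected HTTP callable (hypothetical, so the logic works without a live endpoint); in practice you would wrap your HTTP client of choice.

```python
def embedding_urls(base_url: str):
    # Try the standard versioned path first, then the unversioned fallback.
    return [f"{base_url}/v1/embeddings", f"{base_url}/embeddings"]

def embed(post, base_url: str, api_key: str, model: str, text: str):
    """post(url, headers, payload) must return (status_code, response_body)."""
    headers = {
        "Content-Type": "application/json",
        "Authorization": f"Bearer {api_key}",
    }
    payload = {"model": model, "input": text}
    for url in embedding_urls(base_url):
        status, body = post(url, headers, payload)
        if status != 404:  # 404 means this path is not served; try the next one
            return body
    raise RuntimeError("No embeddings endpoint responded")
```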
### 4. Multimodal

```bash
curl --location 'https://maas.phoeniqs.com/v1/chat/completions' \
--header 'Content-Type: application/json' \
--header 'Authorization: Bearer <API_Key>' \
--data '{
  "model": "inference-granite-vision-2b",
  "messages": [
    {
      "role": "user",
      "content": [
        {
          "type": "text",
          "text": "What is shown in this image?"
        },
        {
          "type": "image_url",
          "image_url": {
            "url": "https://upload.wikimedia.org/wikipedia/commons/thumb/a/a9/Example.jpg/800px-Example.jpg"
          }
        }
      ]
    }
  ],
  "temperature": 0.7
}'
```
### 5. OCR model

```bash
curl --location 'https://maas.phoeniqs.com/v1/chat/completions' \
--header 'Content-Type: application/json' \
--header 'Authorization: Bearer <API_Key>' \
--data '{
  "model": "inference-deepseek-ocr",
  "messages": [
    {
      "role": "user",
      "content": [
        {
          "type": "image_url",
          "image_url": {
            "url": "https://ofasys-multimodal-wlcb-3-toshanghai.oss-accelerate.aliyuncs.com/wpf272043/keepme/image/receipt.png"
          }
        },
        {
          "type": "text",
          "text": "Free OCR."
        }
      ]
    }
  ],
  "max_tokens": 2048,
  "temperature": 0.0
}'
```
### 6. Kimi-K2 model

Kimi-K2 has a special requirement: calls to it must include an additional argument, `stop_token_ids`, which must be passed with the value `[163586]`.
```bash
curl --location 'https://maas.phoeniqs.com/v1/chat/completions' \
--header 'Content-Type: application/json' \
--header 'Authorization: Bearer <API_Key>' \
--data '{
  "model": "inference-kimi-k2",
  "messages": [
    {"role": "user", "content": "How are you?"}
  ],
  "temperature": 0.7,
  "stop_token_ids": [163586]
}'
```
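The `stop_token_ids` requirement is easy to forget in application code. A small Python helper (illustrative, not part of the service) can enforce it whenever a Kimi-K2 request body is built:

```python
# Required stop token for Kimi-K2, per the requirement above.
KIMI_STOP_TOKEN_IDS = [163586]

def build_kimi_body(prompt: str, temperature: float = 0.7) -> dict:
    """Build a chat-completion body that always includes stop_token_ids."""
    return {
        "model": "inference-kimi-k2",
        "messages": [{"role": "user", "content": prompt}],
        "temperature": temperature,
        "stop_token_ids": KIMI_STOP_TOKEN_IDS,
    }

body = build_kimi_body("How are you?")
```

Serializing `body` with `json.dumps` yields the same payload as the curl example above.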