# How to run inference on an AI model
The Phoeniqs Model Service exposes an OpenAI-compatible API for every Active Model. To send an inference request you need three pieces of information: a Base URL, a Model Name, and an API Key.
## Make your first call
### Get the Base API URL

All inference requests go to the Phoeniqs Model Service endpoint. The base URL is `https://maas.phoeniqs.com/v1`.
### Pick a Model Name

Choose a model from the Active Models table. The value in the **Model Name** column is what you pass as `"model"` in your request body. Examples:

- `inference-llama4-maverick`
- `inference-bge-m3`
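If you prefer to discover model names programmatically, OpenAI-compatible services typically also expose a `/v1/models` endpoint. The sketch below assumes the Phoeniqs Model Service implements it; verify against your deployment before relying on it.

```python
from openai import OpenAI

client = OpenAI(
    base_url="https://maas.phoeniqs.com/v1",
    api_key="YOUR_API_KEY",
)

# List the model names the service reports via /v1/models
# (assumed to be available on this OpenAI-compatible API).
for model in client.models.list():
    print(model.id)  # e.g. "inference-llama4-maverick"
```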
### Get your API Key

Your API Key authenticates every request. Pass it in the `Authorization` HTTP header:

```
Authorization: Bearer YOUR_API_KEY
```
> **Keep your API Key secret.** Never commit an API Key to source control or expose it in client-side code. Rotate the key immediately if you suspect it has been leaked.
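A common way to keep the key out of your code is to read it from an environment variable. A minimal sketch, assuming you export the key as `PHOENIQS_API_KEY` (the variable name is arbitrary and chosen for this example):

```python
import os

from openai import OpenAI

# Read the key from the environment instead of hardcoding it.
# PHOENIQS_API_KEY is a hypothetical name; use whatever your setup defines.
api_key = os.environ["PHOENIQS_API_KEY"]

client = OpenAI(
    base_url="https://maas.phoeniqs.com/v1",
    api_key=api_key,
)
```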
### Send the request

With the three pieces above you can call any of the OpenAI-compatible endpoints. A minimal chat completion request looks like this:
**cURL**

```bash
curl --location 'https://maas.phoeniqs.com/v1/chat/completions' \
  --header 'Authorization: Bearer YOUR_API_KEY' \
  --header 'Content-Type: application/json' \
  --data '{
    "model": "inference-llama4-maverick",
    "messages": [{"role": "user", "content": "Hello!"}]
  }'
```
**Python**

```python
from openai import OpenAI

client = OpenAI(
    base_url="https://maas.phoeniqs.com/v1",
    api_key="YOUR_API_KEY",
)

response = client.chat.completions.create(
    model="inference-llama4-maverick",
    messages=[{"role": "user", "content": "Hello!"}],
)
print(response.choices[0].message.content)
```
**JavaScript**

```javascript
import OpenAI from "openai";

const client = new OpenAI({
  baseURL: "https://maas.phoeniqs.com/v1",
  apiKey: "YOUR_API_KEY",
});

const response = await client.chat.completions.create({
  model: "inference-llama4-maverick",
  messages: [{ role: "user", content: "Hello!" }],
});
console.log(response.choices[0].message.content);
```
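The same pattern extends to the other OpenAI-compatible endpoints. For instance, the name `inference-bge-m3` (listed above) suggests an embedding model; assuming it is served through the `/v1/embeddings` endpoint, a sketch of an embedding request would look like this:

```python
from openai import OpenAI

client = OpenAI(
    base_url="https://maas.phoeniqs.com/v1",
    api_key="YOUR_API_KEY",
)

# Assumes inference-bge-m3 is an embedding model exposed via /v1/embeddings;
# check the Active Models table to confirm which endpoint each model serves.
response = client.embeddings.create(
    model="inference-bge-m3",
    input="Hello!",
)
print(len(response.data[0].embedding))  # dimensionality of the returned vector
```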
## See also