# Model Serving - Active Models
This is the list of AI models currently live and available on the Phoeniqs Model Service. All models are served on Phoeniqs infrastructure through an OpenAI-compatible API and are ready for production use.
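Because the API surface is OpenAI-compatible, any OpenAI-style HTTP client can call it. Below is a minimal sketch of how a chat-completion request is assembled; the base URL, model identifier, and API key are placeholders (substitute the values from your Model Service account), and the helper function is purely illustrative:

```python
import json

# Placeholder values -- substitute your own from the Model Service dashboard.
BASE_URL = "https://api.example.com/v1"  # hypothetical base URL
API_KEY = "YOUR_API_KEY"
MODEL = "gpt-oss-120b"                   # hypothetical model identifier

def build_chat_request(prompt: str):
    """Assemble the URL, headers, and JSON body for an OpenAI-compatible
    /chat/completions call; send the result with any HTTP client."""
    url = f"{BASE_URL}/chat/completions"
    headers = {
        "Authorization": f"Bearer {API_KEY}",
        "Content-Type": "application/json",
    }
    body = {
        "model": MODEL,
        "messages": [{"role": "user", "content": prompt}],
    }
    return url, headers, body

url, headers, body = build_chat_request("Hello!")
print(json.dumps(body, indent=2))
```

The same request shape works for every model listed on this page; only the `model` field changes.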
## Active Models
**Token Throughput Disclaimer:** The token-per-second throughput figures provided are based on controlled testing conditions and are intended for benchmarking and comparison purposes only. Actual performance in production environments may vary significantly depending on workload characteristics, system configuration, model hosting provider, network conditions, and other operational factors. These results should not be interpreted as a guarantee of real-world performance.
**Model Updates and Deprecation Disclaimer:** We reserve the right to modify, upgrade, or replace any AI models used in our services at any time. This may include deprecating older models and introducing newer versions as we deem necessary to maintain performance, security, and service quality. While we aim to provide notice when feasible, changes may occur without prior notification.
**NOTE:** Pricing is subject to change at our discretion.
## Active Models by Use Case
**Agent Workflows** - Models designed for autonomous agent operations:
- DeepSeek V3.2
- Qwen3 VL 235B
- GPT OSS 120B
**Conversational, Code Generation & Multilingual Interactions** - Models optimized for chat, coding, and multilingual tasks:
- Llama4 (Maverick, Scout)
- Qwen (8B, 32B)
- Gemma (12B)
- Apertus (70B)
- Granite (8B)
**RAG (Retrieval-Augmented Generation) Workflows** - Models tailored for memory and document retrieval:
- BGE Models (M3, Re-ranker)
- Granite 278M
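For RAG pipelines, the same OpenAI-compatible surface exposes an embeddings endpoint for the retrieval models above. A minimal sketch of an embeddings request, assuming a placeholder base URL and a hypothetical identifier for the BGE M3 model (check the Model Service for the exact model name):

```python
import json

BASE_URL = "https://api.example.com/v1"  # hypothetical base URL
EMBED_MODEL = "bge-m3"                   # hypothetical identifier for BGE M3

def build_embeddings_request(texts: list):
    """Assemble the URL and JSON body for an OpenAI-compatible
    /embeddings call; each input string gets one embedding vector back."""
    url = f"{BASE_URL}/embeddings"
    body = {"model": EMBED_MODEL, "input": texts}
    return url, body

url, body = build_embeddings_request(["first document chunk", "second document chunk"])
print(url)
print(json.dumps(body, indent=2))
```

The returned vectors can then be stored in any vector index; re-ranking with the BGE re-ranker is a separate call against its own model name.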
## Using the models
Looking for ready-to-run examples? See the Model Service Guides:
- How to inference an AI model — what you need to make a call (Base URL, Model Name, API Key).
- Sample API calls — cURL examples for chat, embeddings, multimodal, OCR, and more.