# Model Serving - Scheduled for Retirement

The models listed here are scheduled for decommissioning. Phoeniqs gives approximately 20 business days' advance notice of each decommissioning date by publishing an announcement in Cloud Docs and notifying impacted customers directly. These communications include the expected timeline, recommended replacement models, and migration guidance where applicable. They are also included in the monthly AI Platform newsletter sent to customers who have opted in.


# Scheduled for Retirement

| Model Name | Decommissioning Date | Model | Type | Input (Credits/M Tokens) | Output (Credits/M Tokens) | Suggested Replacement | Description |
| --- | --- | --- | --- | --- | --- | --- | --- |
| inference-apertus-8b | 11.05.2026 | swiss-ai/Apertus-8B-Instruct-2509 | Chat | 0,1324 | 0,1431 | inference-apertus-70b | Optimized for multilingual dialogue use cases. |
| inference-deepseekr1-70b | 11.05.2026 | RedHatAI/DeepSeek-R1-Distill-Llama-70B-quantized.w4a16 | Chat | 0,4617 | 0,4617 | inference-qwq-32b | Optimized for reasoning chat completions. |
| inference-deepseekr1-670b | 11.05.2026 | RedHatAI/DeepSeek-R1-0528-quantized.w4a16 | Chat | 1,96 | 4,57 | inference-deepseek-v32 | Optimized for reasoning chat completions. |
| inference-kimi-k2 | 11.05.2026 | RedHatAI/Kimi-K2-Instruct-quantized.w4a16 | Chat only | 0,77 | 2,31 | kimi-K2.6 | Optimized for multilingual dialogue use cases. |
| inference-llama33-70b | 11.05.2026 | RedHatAI/Llama-3.3-70B-Instruct-quantized.w4a16 | Chat only | 0,5464 | 0,5464 | inference-llama4-maverick | Optimized for multilingual dialogue use cases. |
| inference-qwq25-vl-72b | 11.05.2026 | RedHatAI/Qwen2.5-VL-72B-Instruct-quantized.w4a16 | Multimodal | 0,8465 | 0,8465 | inference-qwen3-vl-235b | Compact and efficient vision-language model. |

**Note:** Once a model's decommissioning date has passed, it is permanently removed from the service and cannot be restored.
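
When planning a migration, it can help to check which of the models you currently call are on the retirement list and how much time remains. The following is a minimal sketch, not an official Phoeniqs tool: the `RETIREMENTS` mapping is copied by hand from the table above, the helper names `parse_date` and `migration_plan` are hypothetical, and the code assumes the dates are written as DD.MM.YYYY.

```python
from datetime import date, datetime

# Retiring model -> (decommissioning date, suggested replacement).
# Copied by hand from the table above; not fetched from any API.
RETIREMENTS = {
    "inference-apertus-8b": ("11.05.2026", "inference-apertus-70b"),
    "inference-deepseekr1-70b": ("11.05.2026", "inference-qwq-32b"),
    "inference-deepseekr1-670b": ("11.05.2026", "inference-deepseek-v32"),
    "inference-kimi-k2": ("11.05.2026", "kimi-K2.6"),
    "inference-llama33-70b": ("11.05.2026", "inference-llama4-maverick"),
    "inference-qwq25-vl-72b": ("11.05.2026", "inference-qwen3-vl-235b"),
}


def parse_date(text: str) -> date:
    """Parse a date from the table. Assumption: the format is DD.MM.YYYY."""
    return datetime.strptime(text, "%d.%m.%Y").date()


def migration_plan(models_in_use, today):
    """Return (model, replacement, days_left) for each retiring model in use.

    Models that are not on the retirement list are skipped.
    """
    plan = []
    for name in models_in_use:
        if name not in RETIREMENTS:
            continue
        raw_date, replacement = RETIREMENTS[name]
        days_left = (parse_date(raw_date) - today).days
        plan.append((name, replacement, days_left))
    return plan


if __name__ == "__main__":
    # Hypothetical list of models an application calls today.
    in_use = ["inference-kimi-k2", "inference-llama33-70b", "some-other-model"]
    for model, replacement, days_left in migration_plan(in_use, date.today()):
        print(f"{model}: switch to {replacement} ({days_left} days left)")
```

A negative `days_left` means the decommissioning date has already passed, in which case the model can no longer be restored and the replacement should already be in use.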