Backend.AI Blog - Inference

Posts tagged with 'Inference'

Lablup Releases 'mlxcel,' an Open-Source AI Inference Engine Optimized for Apple Silicon
By Lablup
Lablup open-sources mlxcel, an AI inference engine optimized for Apple Silicon (M1 to M5) and NVIDIA CUDA. Built in pure Rust with no Python runtime, it delivers 119% of mlx-lm's decode throughput and supports 80+ model architectures.
18 May 2026
How to save GPU memory in LLM serving: Principles and operating conditions of KV cache offloading
By Kyujin Cho, Jinho Heo
How KV cache offloading works in LLM serving for agentic AI: the architecture, data paths, and when offloading actually helps inference performance.
27 April 2026
- KV cache
- Inference
Backend.AI FastTrack 2 for Airflow Users: From Training to Serving
By Jeongseok Kang
How Airflow users can rebuild their MLOps pipelines more effectively in Backend.AI FastTrack 2, from training to serving.
29 June 2025
Sneak Peek: Backend.AI Model Service Preview
By Kyujin Cho
A first look at Backend.AI Model Service, which runs AI models on a single infrastructure from training through serving, at a time when large models are flooding the market.
30 May 2023
- Backend.AI
- Inference
