Tag: Inference
Posts tagged with 'Inference'

Lablup Releases 'mlxcel,' an Open-Source AI Inference Engine Optimized for Apple Silicon
By Lablup
Lablup open-sources mlxcel, an AI inference engine optimized for Apple Silicon (M1 to M5) and NVIDIA CUDA. Built in pure Rust with no Python runtime, it delivers 119% of mlx-lm's decode throughput and supports 80+ model architectures.
18 May 2026

How to save GPU memory in LLM serving: Principles and operating conditions of KV cache offloading
By Kyujin Cho, Jinho Heo
How KV cache offloading works in LLM serving for agentic AI: the architecture, data paths, and when offloading actually helps inference performance.
27 April 2026

Backend.AI FastTrack 2 for Airflow Users: From Training to Serving
By Jeongseok Kang
How Airflow users can rebuild their MLOps pipelines more effectively in Backend.AI FastTrack 2, from training to serving.
29 June 2025

Sneak Peek: Backend.AI Model Service Preview
By Kyujin Cho
With large models flooding the market, a first look at Backend.AI Model Service, which runs AI models from training through serving on a single infrastructure.
30 May 2023