Backend.AI Blog

Top Stories

How to save GPU memory in LLM serving: Principles and operating conditions of KV cache offloading
By Kyujin Cho, Jinho Heo
Learn how KV cache offloading works in LLM serving for Agentic AI—covering architecture, data movement paths, and when offloading helps or hurts inference performance.
27 April 2026
- KV cache
- Inference
Read more
Building Production RAG Systems: Lessons from Tariff Support
By Sergey Leksikov
Lablup's research team shares what they learned building two production RAG systems over the past year: HSense, a multi-agent tariff classification system achieving 92.4% Top-1 accuracy on 10-digit HS codes, and a Backend.AI support assistant handling queries across seven documentation projects. The post also covers what didn't work and why retrieval quality matters more than model choice.
23 April 2026
- RAG
- LLM
Read more
Inside NVIDIA DGX Spark: Is DGX Spark Actually Blackwell?
By Jeongkyu Shin, Kyujin Cho
DGX Spark packs 1 PFLOP GB10 performance in a desktop form factor—but its SM12x GPU creates hidden compatibility gaps with the latest LLM kernels built for data center Blackwell (SM100).
19 February 2026
- Architecture
- Backend.AI:GO
Read more


News
See all News
- Lablup at AI EXPO KOREA 2026: Booth Highlights
  By Lablup
  Lablup wrapped up AI EXPO KOREA 2026 at booth F04. A three-day recap, from Backend.AI's AI infrastructure orchestration to AI:GO running models and autonomous agents on a laptop.
  15 May 2026
  - Event
  - Lablup
  Read more
- Lablup Joins the Python Software Foundation as a Participating Sponsor
  By Lablup
  Lablup is now a Participating Sponsor of the Python Software Foundation (PSF).
  13 February 2026
  Read more
- Behind the Success: Lablup x Upstage Pass Phase 1 Evaluation for Sovereign AI Foundation Model Project
  By Lablup
  In January 2026, the Upstage consortium that Lablup is part of successfully passed the Phase 1 evaluation for the Korean government's Sovereign AI Foundation Model project. This initiative aims to protect national AI sovereignty by having the government provide support for GPUs, data, and talent development, while the private sector actively leverages these resources to develop frontier-grade AI foundation models. We sat down with team members from Upstage and Lablup to hear the behind-the-scenes story of our Phase 1 journey.
  6 February 2026
  Read more
Releases
See all Releases
- Lablup Releases 'mlxcel,' an Open-Source AI Inference Engine Optimized for Apple Silicon
  By Lablup
  Lablup open-sources mlxcel, a high-performance AI inference engine optimized for Apple Silicon (M1–M5) and NVIDIA CUDA. Built in pure Rust without a Python runtime, mlxcel delivers 119% average decode throughput versus mlx-lm, supports 80+ model architectures including LLMs and VLMs, and provides an OpenAI-compatible server for drop-in deployment.
  18 May 2026
  Read more
- Release: Backend.AI FastTrack 3 25.18
  By Lablup
  This article covers the major changes in Backend.AI FastTrack 3 25.18.
  5 January 2026
  Read more
- Release: Backend.AI 25.15 (LTS)
  By Lablup
  Backend.AI 25.15 LTS is now officially available. This release brings comprehensive system-level optimization and user experience improvements, reinforcing the platform’s reliability and scalability for large-scale AI model training, deployment, and research.
  2 October 2025
  Read more
Engineering
See all Engineering
- How to save GPU memory in LLM serving: Principles and operating conditions of KV cache offloading
  By Kyujin Cho, Jinho Heo
  Learn how KV cache offloading works in LLM serving for Agentic AI—covering architecture, data movement paths, and when offloading helps or hurts inference performance.
  27 April 2026
  - KV cache
  - Inference
  Read more
- Building Production RAG Systems: Lessons from Tariff Support
  By Sergey Leksikov
  Lablup's research team shares what they learned building two production RAG systems over the past year: HSense, a multi-agent tariff classification system achieving 92.4% Top-1 accuracy on 10-digit HS codes, and a Backend.AI support assistant handling queries across seven documentation projects. The post also covers what didn't work and why retrieval quality matters more than model choice.
  23 April 2026
  - RAG
  - LLM
  Read more
- Writing Stories for 50 Components: Foundation, Automation, and AI
  By Seunghyun Lim
  To write Storybook stories for 50+ BAI components in the Backend.AI WebUI, I started by setting up the infrastructure (i18n, theming, and branding), then upgraded to Storybook and merged two instances into one. An automation pipeline combining a 1,000-line guideline, Claude-based story generation, and GitHub Actions CI checks kept quality consistent from PR creation through deployment.
  5 March 2026
  - Frontend
  - Guide
  Read more
