Top Stories

How to save GPU memory in LLM serving: Principles and operating conditions of KV cache offloading
By Kyujin Cho, Jinho Heo
Learn how KV cache offloading works in LLM serving for agentic AI, covering its architecture, data movement paths, and when offloading helps or hurts inference performance.
27 April 2026

Building Production RAG Systems: Lessons from Tariff Support
By Sergey Leksikov
Lablup's research team shares what they learned building two production RAG systems over the past year. The first is HSense, a multi-agent tariff classification system that achieves 92.4% Top-1 accuracy on 10-digit HS codes. The second is a Backend.AI support assistant that handles queries across seven documentation projects. The post also covers what didn't work and why retrieval quality matters more than model choice.
23 April 2026

Inside NVIDIA DGX Spark: Is DGX Spark Actually Blackwell?
By Jeongkyu Shin, Kyujin Cho
DGX Spark packs 1 PFLOP of GB10 performance into a desktop form factor, but its SM12x GPU creates hidden compatibility gaps with the latest LLM kernels built for data center Blackwell (SM100).
19 February 2026

News

Lablup at AI EXPO KOREA 2026: Booth Highlights
By Lablup
Lablup wrapped up AI EXPO KOREA 2026 at booth F04. This three-day recap covers everything from Backend.AI's AI infrastructure orchestration to AI:GO running models and autonomous agents on a laptop.
15 May 2026

Lablup Joins the Python Software Foundation as a Participating Sponsor
By Lablup
Lablup is now a Participating Sponsor of the Python Software Foundation (PSF).
13 February 2026

Behind the Success: Lablup x Upstage Pass Phase 1 Evaluation for Sovereign AI Foundation Model Project
By Lablup
In January 2026, the Upstage consortium, of which Lablup is a member, passed the Phase 1 evaluation for the Korean government's Sovereign AI Foundation Model project. The initiative aims to protect national AI sovereignty. The government provides support for GPUs, data, and talent development, and the private sector uses these resources to develop frontier-grade AI foundation models. We sat down with team members from Upstage and Lablup to hear the behind-the-scenes story of our Phase 1 journey.
6 February 2026

Releases

Lablup Releases 'mlxcel,' an Open-Source AI Inference Engine Optimized for Apple Silicon
By Lablup
Lablup open-sources mlxcel, a high-performance AI inference engine optimized for Apple Silicon (M1–M5) and NVIDIA CUDA. Built in pure Rust with no Python runtime, mlxcel delivers an average decode throughput of 119% relative to mlx-lm. It supports 80+ model architectures, including LLMs and VLMs, and provides an OpenAI-compatible server for drop-in deployment.
18 May 2026

Release: Backend.AI FastTrack 3 25.18
By Lablup
This article covers the major changes in Backend.AI FastTrack 3 25.18.
5 January 2026

Release: Backend.AI 25.15 (LTS)
By Lablup
Backend.AI 25.15 LTS is now officially available. This release brings comprehensive system-level optimization and user experience improvements, reinforcing the platform’s reliability and scalability for large-scale AI model training, deployment, and research.
2 October 2025

Engineering

How to save GPU memory in LLM serving: Principles and operating conditions of KV cache offloading
By Kyujin Cho, Jinho Heo
Learn how KV cache offloading works in LLM serving for agentic AI, covering its architecture, data movement paths, and when offloading helps or hurts inference performance.
27 April 2026

Building Production RAG Systems: Lessons from Tariff Support
By Sergey Leksikov
Lablup's research team shares what they learned building two production RAG systems over the past year. The first is HSense, a multi-agent tariff classification system that achieves 92.4% Top-1 accuracy on 10-digit HS codes. The second is a Backend.AI support assistant that handles queries across seven documentation projects. The post also covers what didn't work and why retrieval quality matters more than model choice.
23 April 2026

Writing Stories for 50 Components: Foundation, Automation, and AI
By Seunghyun Lim
To write Storybook stories for 50+ BAI components in the Backend.AI WebUI, I first set up the infrastructure (i18n, theming, and branding), then upgraded to Storybook and merged the two instances into one. An automation pipeline kept quality consistent from PR creation through deployment. It combined a 1,000-line guideline, Claude-based story generation, and GitHub Actions CI checks.
5 March 2026