Engineering

VLM Safety Failures: Why safe scenes get flagged as dangerous
By Youngsook Song, Dasol Choi
Vision-language models detect real emergencies reasonably well, but they often overreact by misclassifying safe scenes as dangerous. This post shares results from the VERI benchmark, which measures this limitation using visually similar emergency and safe scenes with different meanings.
18 June 2026

Agent coding at long context: What KV cache offloading on VAST Data & Backend.AI buys you
By Jinho Heo and 2 others
A joint benchmark by Lablup and VAST Data shows that KV cache offloading improves TTFT by up to 3.3x and reduces overall latency by more than half in agent-based coding workloads.
16 June 2026

Intel Arc meets Backend.AI: What the Arc Pro B70's 32GB memory buys for agentic AI
By Jinho Heo and 2 others
Backend.AI now officially supports Intel Arc Pro B70, expanding its Intel lineup beyond Gaudi 2/3 AI accelerators to Arc graphics. From datacenter Gaudi to workstation Arc Pro, teams can manage Intel AI hardware in one platform.
12 June 2026

eBPF security audit for GPU clusters with Backend.AI
By Kyujin Cho, Jinho Heo
This post introduces eBPF as a way to implement security audits without sacrificing the performance of GPU training clusters. It covers how eBPF processes events directly within the kernel, audit use cases, and security precautions.
27 May 2026

How to save GPU memory in LLM serving: Principles and operating conditions of KV cache offloading
By Kyujin Cho, Jinho Heo
How KV cache offloading works in LLM serving for agentic AI: the architecture, data paths, and when offloading actually helps inference performance.
27 April 2026

Building Production RAG Systems: Lessons from Tariff Support
By Sergey Leksikov
Lablup's research team shares lessons from two production RAG systems, the HSense tariff classifier (92.4% Top-1) and a Backend.AI support assistant, including why retrieval quality matters more than model choice.
23 April 2026

Writing Stories for 50 Components: Foundation, Automation, and AI
By Seunghyun Lim
Writing Storybook stories for 50+ Backend.AI WebUI components: setting up i18n, theming, and branding, then automating with a 1,000-line guideline, Claude-based generation, and GitHub Actions CI.
5 March 2026

The Pulse of 500+ GPUs: Monitoring Large-Scale AI Training Clusters
By Hanjeong Lee
How Lablup built proactive fault detection for a 500+ NVIDIA B200 GPU cluster while supporting Solar Open 102B pretraining, including its data collection and real anomaly cases.
20 February 2026

Inside NVIDIA DGX Spark: Is DGX Spark Actually Blackwell?
By Jeongkyu Shin, Kyujin Cho
NVIDIA DGX Spark brings 1 PFLOP GB10 performance to the desktop, but its SM12x GPU opens hidden compatibility gaps with the latest LLM kernels built for data center Blackwell (SM100).
19 February 2026

Sokovan Orchestrator: Reliable session scheduling for Backend.AI
By HyeokJin Kim
How Lablup's CoreDev team reworked the Sokovan Orchestrator so Backend.AI schedules sessions more efficiently as AI workloads change.
8 December 2025

AAA: Agentic, Autonomous, Adaptive Intelligence - lab | up > /conf/5 Keynote
By Jeongkyu Shin
Highlights from the lab | up > /conf/5 keynote on Lablup's 10th anniversary: a vision to become an intelligence supplier that measures and affordably delivers intelligence, beyond Make AI Accessible and Make AI Scalable.
2 December 2025

Training an MCP Sidecar Model to Boost LLM Versatility
By Junbum Lee
How we tackled open-source LLMs' weak support for MCP (Model Context Protocol): moving from supervised learning to reinforcement learning, and solving 128K-token context handling and JSON parsing.
12 November 2025