News

News about Lablup Inc and Backend.AI

Jun 12, 2026

News

Lablup adds Intel Arc Pro B70 support to Backend.AI

  • Lablup

Lablup expands coverage to the Intel Arc lineup

Backend.AI now officially supports the Intel Arc Pro B70 workstation GPU. With this addition, Lablup extends its Intel coverage beyond the Gaudi 2/3 AI accelerators it has supported to date and into the Arc graphics lineup. From Gaudi in the datacenter to Arc Pro on the workstation, the full range of Intel AI silicon can now be connected and managed through Backend.AI alone.

From Desktop to Datacenter, on a Single Platform

Backend.AI supports a wide range of GPUs and AI accelerators and pairs them with an intuitive user interface. Customers can build, train, and serve AI models of any size, from small language models to large ones, which reduces the cost and complexity of developing and operating AI services. By exposing the full capability of the underlying accelerators, Backend.AI helps businesses put generative AI and accelerated computing to work.

Lablup builds its software so that scientists, researchers, DevOps teams, enterprises, and AI enthusiasts can run AI services efficiently and at scale. Lablup works closely with Intel to support the generative AI and deep learning services in wide use today. By adding Intel XPUs to its list of officially supported devices, Lablup has extended Backend.AI's management coverage to workstation cards. A single B70 in a workstation, a shared cluster built from several of them, and Gaudi in the datacenter can all be operated on the same platform.

To learn more about Backend.AI®, visit backend.ai.

About the Intel Arc Pro B70

In agentic AI workloads, sustained throughput across many concurrently running agents is becoming more important than the latency of a single request. In scenarios that hold long contexts, such as coding agents, the KV cache quickly takes over GPU memory, and in production environments dozens or more of these agents run at the same time. When memory runs short, the engine falls into repeated cache eviction and recomputation, and throughput drops sharply.

The Intel Arc Pro B70 is a workstation GPU built on the Xe2 architecture (codename Battlemage) and designed to relieve this bottleneck. It offers 32 GB of GDDR6 ECC memory, 608 GB/s of memory bandwidth, and up to 367 TOPS of INT8 compute. The 32 GB capacity in particular provides more headroom than comparable workstation GPUs, such as the 24 GB card used in the benchmark comparison, so the KV cache can be retained longer and higher concurrency can be absorbed reliably.
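As a rough illustration of why memory capacity bounds how much KV cache an engine can hold, the sketch below estimates the per-token KV cache size for a transformer that uses grouped-query attention, then the number of tokens that fit once the weights are loaded. The model shape, weight size, and overhead figures are hypothetical placeholders, not values taken from the benchmark, and the function names are illustrative. Real inference engines allocate the cache in blocks and reserve memory differently, so treat the result as an order-of-magnitude estimate only.

```python
GIB = 1024 ** 3


def kv_bytes_per_token(num_layers, num_kv_heads, head_dim, bytes_per_elem=2):
    """Bytes of KV cache needed for one token.

    The leading factor of 2 accounts for storing both a key and a value
    vector per layer and per KV head. bytes_per_elem=2 assumes a 16-bit cache.
    """
    return 2 * num_layers * num_kv_heads * head_dim * bytes_per_elem


def max_cached_tokens(total_mem_bytes, weight_bytes, overhead_bytes, per_token_bytes):
    """Tokens that fit in the memory left after weights and runtime overhead."""
    free = total_mem_bytes - weight_bytes - overhead_bytes
    return max(free, 0) // per_token_bytes


if __name__ == "__main__":
    # Hypothetical model shape (assumption, not a measured configuration).
    per_token = kv_bytes_per_token(num_layers=36, num_kv_heads=8, head_dim=128)

    # Hypothetical weight footprint and runtime overhead (assumptions).
    weights = 16 * GIB
    overhead = 2 * GIB

    for label, capacity in (("32 GB card", 32 * GIB), ("24 GB card", 24 * GIB)):
        tokens = max_cached_tokens(capacity, weights, overhead, per_token)
        print(f"{label}: about {tokens} cacheable tokens")
```

Under these assumptions, every extra gigabyte left after the weights goes to the cache, so a larger card holds more context before the engine has to evict and recompute.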

These characteristics pay off even more when combined with Backend.AI's session-based isolation. Each agent runs in its own independent session while GPU memory is distributed efficiently across them, so even a single workstation node can operate AI workloads at high density. The B70 is a practical option for bringing datacenter-grade operating practices to a personal workstation or a small team environment.


Full per-model figures, including throughput trends across concurrency levels, KV cache capacity comparisons, and the measurement methodology, are available in the solution brief "Intel Arc Meets Backend.AI: What the Arc Pro B70's 32 GB of Memory Buys for Agentic AI."

Benchmark environment

  • Cards: Intel Arc Pro B70 (32 GB GDDR6) vs NVIDIA RTX PRO 4000 Blackwell (24 GB GDDR7 ECC).
  • Node: Intel Xeon w9-3475X, Ubuntu 25.10. B70 node: 96 GB RAM; RTX PRO 4000 Blackwell node: 256 GB RAM. One GPU per run, tensor parallelism of 1.
  • Tooling: vLLM bench sweep with the request rate set to inf (all prompts are submitted at once, so concurrency equals the number of prompts).
  • Models: GPT-OSS 20B, Qwen3 4B Instruct 2507, Qwen3 8B. The KV cache comparison also covers Gemma-4-E4B-it, Qwen3.5-9B, and Gemma-3n-E4B-it.
  • Metrics: output throughput (tokens/s); per-user throughput is output throughput divided by the number of concurrent requests.
  • Benchmark data: Intel and Lablup, April 2026.
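The per-user throughput metric defined above is a simple ratio. A minimal sketch follows, with a hypothetical function name and placeholder numbers that are not benchmark results:

```python
def per_user_throughput(output_tokens_per_s, concurrent_requests):
    """Output throughput (tokens/s) divided by the number of concurrent requests."""
    if concurrent_requests <= 0:
        raise ValueError("concurrent_requests must be positive")
    return output_tokens_per_s / concurrent_requests


if __name__ == "__main__":
    # Placeholder values for illustration only, not measured figures.
    aggregate = 1200.0  # tokens/s across all requests
    users = 48          # prompts submitted at once
    print(f"{per_user_throughput(aggregate, users):.1f} tokens/s per user")
```

Because the request rate is inf, the number of concurrent requests equals the number of prompts in the run, so the divisor is the prompt count for that sweep point.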

Headquarters & HPC Lab

KR Office: 8F, 577, Seolleung-ro, Gangnam-gu, Seoul, 06143, Republic of Korea
US Office: 3003 N First St, Suite 221, San Jose, CA 95134

© Lablup Inc. All rights reserved.
