Storage-aware Platform
Any storage, one interface
Backend.AI's Storage Proxy optimizes I/O between containers and storage, abstracting diverse storage backends behind a single interface and exposing them as virtual folders (VFolders). With NVIDIA GPUDirect Storage, data transfers directly from storage to GPU memory, removing the host-memory detour that creates I/O bottlenecks.
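To make the abstraction concrete, here is a minimal, purely illustrative Python sketch (not Backend.AI's actual Storage Proxy API) of how heterogeneous backends can sit behind one VFolder-style interface:

```python
from abc import ABC, abstractmethod
from pathlib import PurePosixPath


class VFolderBackend(ABC):
    """What a storage backend must provide to be exposed as a virtual folder."""

    @abstractmethod
    def mount(self, vfolder_id: str) -> PurePosixPath:
        """Return the path at which the virtual folder becomes available."""


class NFSBackend(VFolderBackend):
    """One concrete backend; CephFS, WEKA, or vendor APIs would be others."""

    def __init__(self, export_root: str) -> None:
        self.export_root = PurePosixPath(export_root)

    def mount(self, vfolder_id: str) -> PurePosixPath:
        # Each virtual folder maps to a subdirectory of the NFS export.
        return self.export_root / vfolder_id


# The container only ever sees the mounted path; which vendor or protocol
# sits behind it is invisible to the workload.
backend: VFolderBackend = NFSBackend("/mnt/exports")
print(backend.mount("team-a-dataset"))
```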
The problem
Is GPU performance bottlenecked by storage?
GPU compute performance keeps climbing with every hardware generation, but storage I/O bandwidth can't keep pace.
GPU idle cost
GPUs keep accruing hourly cost even while they sit idle waiting for data behind an I/O bottleneck. As storage delays add up, the actual compute delivered relative to investment can deteriorate.
CPU bottleneck
When data flows are routed inefficiently through the CPU, the CPU itself becomes a bottleneck that degrades the throughput of high-performance AI workloads.
Storage fragmentation
Organizations operate diverse commercial and open-source storage: VAST, PureStorage, WEKA, NetApp, and more. If an AI platform only supports specific storage, data silos emerge.
Storage, by design
A dedicated lane for your data
Backend.AI is designed around a 3-Plane Architecture that cleanly separates the Control Plane (management), Compute Plane (computation), and Storage Plane (data I/O). The Storage Proxy transparently abstracts the type and location of physical storage and optimizes I/O between containers and storage. It also isolates tenant data through random UUID-based namespaces.
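As a rough sketch of the isolation idea (the volume root and directory layout below are assumptions for illustration, not Backend.AI internals), each virtual folder can live under a path derived from a random UUID rather than from tenant or folder names:

```python
import uuid
from pathlib import PurePosixPath

STORAGE_ROOT = PurePosixPath("/vfroot")  # hypothetical volume root


def allocate_vfolder_path() -> PurePosixPath:
    """Place a new virtual folder under a random, unguessable UUID.

    Because the on-disk location is derived from a random UUID rather than
    a tenant or folder name, one tenant cannot infer or traverse into
    another tenant's data even when they share the same physical volume.
    """
    folder_id = uuid.uuid4().hex
    # Two prefix levels keep any single directory from growing unbounded.
    return STORAGE_ROOT / folder_id[:2] / folder_id[2:4] / folder_id


print(allocate_vfolder_path())  # e.g. /vfroot/3f/9a/3f9a6c1e...
```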
NVIDIA GPUDirect Storage
Skip the CPU, direct from storage to GPU.
The world's first containerized support for NVIDIA Magnum IO GPUDirect Storage.
[Diagram: Legacy I/O path vs. NVIDIA GPUDirect Storage path]
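For example, a GPUDirect Storage read can be driven from Python with NVIDIA's kvikio bindings to cuFile. This is a minimal sketch of one possible path; the file path is hypothetical, and Backend.AI's own GDS integration happens at the platform level rather than in user code:

```python
import cupy
import kvikio

# Destination buffer allocated directly in GPU memory.
gpu_buffer = cupy.empty(128 * 1024 * 1024, dtype=cupy.uint8)  # 128 MiB

# With GDS available, the transfer is DMA'd from NVMe/NIC into GPU memory,
# skipping the host-memory bounce buffer; otherwise kvikio falls back to a
# conventional POSIX read plus a host-to-device copy.
f = kvikio.CuFile("/mnt/dataset/shard-000.bin", "r")  # hypothetical path
nbytes = f.read(gpu_buffer)
f.close()

print(f"read {nbytes} bytes directly into GPU memory")
```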
Benchmark
NVIDIA GPUDirect Storage + Dell PowerScale
Pairing a storage system that supports NVIDIA GPUDirect Storage with management software that can exploit it accelerates storage I/O performance. Dell PowerScale supports NVIDIA GPUDirect Storage and delivers its best performance when used together with Backend.AI.
View Dell PowerScale Case Study →
[Chart: I/O performance, without vs. with GDS]
Closer means faster
NUMA-Aware Scheduling
In multi-socket servers, each CPU socket has its own local memory region (NUMA). When a GPU accesses memory attached to a remote socket, latency increases significantly compared to accessing local memory.
Backend.AI's Sokovan scheduler is NUMA topology-aware, placing workloads to use memory and NICs on the same NUMA node as the GPU. RDMA and InfiniBand paths are also optimized for NUMA topology, eliminating unnecessary remote memory access across the entire data path from storage to GPU.
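As a simplified, single-node sketch of the same idea (the PCI address is a placeholder and the pinning logic is an illustrative assumption, not Sokovan's implementation), GPU-local CPUs can be discovered from sysfs and the worker process pinned to them:

```python
import os
from pathlib import Path

GPU_PCI_ADDR = "0000:3b:00.0"  # hypothetical PCI address of the target GPU


def gpu_numa_node(pci_addr: str) -> int:
    """Read the NUMA node that the GPU's PCIe slot is attached to."""
    node = int(Path(f"/sys/bus/pci/devices/{pci_addr}/numa_node").read_text())
    return max(node, 0)  # -1 means "unknown"; fall back to node 0


def cpus_of_node(node: int) -> set[int]:
    """Expand the kernel's cpulist (e.g. '0-15,32-47') into CPU ids."""
    cpus: set[int] = set()
    text = Path(f"/sys/devices/system/node/node{node}/cpulist").read_text().strip()
    for part in text.split(","):
        lo, _, hi = part.partition("-")
        cpus.update(range(int(lo), int(hi or lo) + 1))
    return cpus


node = gpu_numa_node(GPU_PCI_ADDR)
os.sched_setaffinity(0, cpus_of_node(node))  # keep this process on GPU-local cores
print(f"pinned to NUMA node {node}")
```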
Partnership
Faster paths, built together
Backend.AI partners with storage and accelerator vendors to optimize the data path for AI. Together, we co-engineer solutions that maximize throughput and minimize latency.
VAST Data
Since 2025, as a Technology Partner in the VAST COSMOS program, we collaborate on KV cache offloading, VAST AI OS integration, and other technologies to support customer inference workloads.
Dell Technologies
Since 2024, as a Dell Telecom AI self-certified partner, we integrate Dell products, including Dell PowerScale, with Backend.AI to deliver solutions to customers.
PureStorage
Since 2021, as a Technology Alliance Partner, we integrate FlashArray and FlashBlade to deliver high-performance storage for customers' AI training and inference workloads.
Other Supported Storage
Experience AI infrastructure with integrated storage
Request a demo to see how Backend.AI resolves I/O bottlenecks in your existing storage environment.