How Upstage, Lablup, and their consortium are powering a national frontier AI model with Backend.AI
Keeping large-scale training resilient: An end-to-end approach
“We spent our time on development—not infrastructure—and Backend.AI handled the rest”
Make training efficient with Backend.AI anywhere, anytime, at any scale.
The Upstage consortium used Backend.AI as its infrastructure backbone to tackle the operational demands of a large-scale GPU cluster. Backend.AI orchestrated hundreds of GPUs across distributed nodes while preserving visibility and control over the entire training environment. It enabled the consortium to maintain continuous model-development cycles, significantly reducing the effort required for infrastructure maintenance.
Related Services
Backend.AI is a vendor-agnostic platform for hosting accelerated workloads, built on our home-grown orchestration engine and job scheduler, and it runs on either cloud or on-premises (air-gapped) clusters.
Explore service →
An MLOps pipeline platform for LLM fine-tuning and serving that simplifies the entire lifecycle of large language model customization. Prepare data, train models, validate performance, and deploy as a REST API—all managed within a single pipeline.
Explore service →