How Upstage, Lablup, and their consortium are powering a national frontier AI model with Backend.AI
Keeping large-scale training resilient: An end-to-end approach
“We spent our time on development—not infrastructure—and Backend.AI handled the rest”
— Kyle Yi, Consortium Lead, Upstage
Make training efficient with Backend.AI anywhere, anytime, at any scale
The Upstage consortium used Backend.AI as its infrastructure backbone to tackle the operational demands of a large-scale GPU cluster. Backend.AI orchestrated hundreds of GPUs across distributed nodes while preserving visibility and control over the entire training environment. It enabled the consortium to maintain continuous model-development cycles, significantly reducing the effort required for infrastructure maintenance.
Related Services
Backend.AI is a vendor-agnostic accelerated workload hosting platform based on our own home-grown orchestration and job scheduler, running on top of either cloud or on-premises (air-gapped) clusters.
Explore service →
An MLOps pipeline platform for LLM fine-tuning and serving that simplifies the entire lifecycle of large language model customization. Prepare data, train models, validate performance, and deploy as a REST API—all managed within a single pipeline.
Explore service →