How Upstage, Lablup, and their consortium are powering a national frontier AI model with Backend.AI
Keeping large-scale training resilient: An end-to-end approach
“We spent our time on development—not infrastructure—and Backend.AI handled the rest”
— Kyle Yi, Consortium Lead, Upstage
Make training efficient with Backend.AI anywhere, anytime, at any scale
The Upstage consortium used Backend.AI as its infrastructure backbone to tackle the operational demands of a large-scale GPU cluster. Backend.AI orchestrated hundreds of GPUs across distributed nodes while preserving visibility and control over the entire training environment. It enabled the consortium to maintain continuous model-development cycles, significantly reducing the effort required for infrastructure maintenance.
Related Services
Backend.AI is a vendor-agnostic accelerated workload hosting platform based on our own home-grown orchestration and job scheduler, running on top of either cloud or on-premises (air-gapped) clusters.
Explore service →
An MLOps pipeline platform for LLM fine-tuning and serving that simplifies the entire lifecycle of large language model customization. Prepare data, train models, validate performance, and deploy as a REST API—all managed within a single pipeline.
Explore service →