Engineering

Dec 8, 2025

Sokovan Orchestrator: Reliable session scheduling for Backend.AI

  • HyeokJin Kim

    CoreDev Lead

Here's how Backend.AI's orchestration layer, Sokovan, solved the session scheduling problem.

Hello, I’m HyeokJin Kim, lead of the CoreDev team and a software engineer at Lablup.

The CoreDev team develops the core backend components that power Backend.AI, including its orchestration engine, Sokovan. Our goal is to build a more stable and trustworthy Backend.AI to help our customers operate AI workloads reliably.

Backend.AI is an orchestration platform that manages AI computation sessions across multiple nodes. When a user requests a session, the system selects a suitable server, assigns resources, and creates a container. On top of this, the Model Serving layer distributes serving frameworks such as vLLM or TGI across nodes, scaling replicas up or down to handle incoming traffic efficiently.

Originally, the AgentRegistry was a monolithic component responsible for all of these tasks. While the early implementation worked, as AI workloads became more complex and session and model-serving scenarios diversified, several limitations began to surface. Because model services and regular sessions created and managed sessions differently, similar failures often occurred with inconsistent root causes and reproduction paths.

To address these architectural weaknesses, the CoreDev team has spent the past few months on a substantial refactoring of the Sokovan orchestrator.

The Problem: Monolithic Scheduler Bottlenecks

In the monolithic design, the call stack kept growing deeper. The model service directly invoked session creation, and that call serially triggered other services. Because layer boundaries were unclear, it became increasingly difficult to trace where a failure occurred.

Each session request executed heavy scheduling logic immediately, and during traffic spikes, this caused sharp surges in server load. When a network or agent error occurred, sessions frequently remained stuck in the pending state.

Separating Responsibilities and Batch Processing with Sokovan

To solve these bottlenecks and instability issues, Sokovan divides responsibilities into smaller components and processes heavy workloads periodically in batches. The core of this design lies in the separation of roles between the Controller and the Coordinator. The Controller validates incoming API requests and marks their state in the database. The Coordinator and its underlying Scheduler/Engine then periodically collect and process pending jobs such as scheduling or resource allocation.

Sokovan also uses both short-term and long-term cycles—a dual-loop system—to react quickly to new events while guaranteeing that every job is eventually processed even if Redis or network issues occur.

Architecture

The system is composed of three layers: the top-level SokovanOrchestrator, the three Coordinators it manages, and the Scheduler and Engines that run the actual logic.

┌─────────────────────────────────────────────────────────┐
│                  SokovanOrchestrator                    │
│                  (Top-level Coordinator)                │
│                                                         │
│     - Creates and manages 3 Coordinators                │
│     - Schedules periodic tasks                          │
│     - Provides common infrastructure (events, locks)    │
└─────────────────────┬───────────────────────────────────┘
                      │
        ┌─────────────┼─────────────┐
        ▼             ▼             ▼
┌─────────────┐ ┌─────────────┐ ┌─────────────┐
│  Schedule   │ │ Deployment  │ │    Route    │
│ Coordinator │ │ Coordinator │ │ Coordinator │
│             │ │             │ │             │
│   Session   │ │ Deployment  │ │    Route    │
│  lifecycle  │ │  lifecycle  │ │ management  │
└──────┬──────┘ └──────┬──────┘ └──────┬──────┘
       │               │               │
       ▼               ▼               ▼
┌─────────────┐ ┌─────────────┐ ┌─────────────┐
│  Scheduler  │ │ Deployment  │ │    Route    │
│             │ │   Engine    │ │   Engine    │
│             │ │             │ │             │
│   Actual    │ │   Actual    │ │   Actual    │
│ scheduling  │ │ deployment  │ │  routing    │
│   logic     │ │   logic     │ │   logic     │
└─────────────┘ └─────────────┘ └─────────────┘

Data Flow

Sokovan separates the data flow into Controller and Coordinator paths.

Controller flow (immediate response):

Service Layer
    ↓
SessionSchedulingController.enqueue() / mark_terminate()
    ↓
1. The Controller validates parameters and session state.
    ↓
2. It marks the session state in the database (as PENDING or TERMINATING).
    ↓
3. It sets a Redis hint (SCHEDULE/TERMINATE).
    ↓
4. It returns an immediate response.
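
The four Controller steps above can be sketched roughly as follows. This is a minimal illustration, not Backend.AI's actual code: only the names SessionSchedulingController, enqueue() and mark_terminate() come from this article, while FakeStore, the method signatures, and the validation rules are assumptions made for the example. An in-memory object stands in for both the database and Redis.

```python
from dataclasses import dataclass, field
from enum import Enum


class SessionStatus(Enum):
    PENDING = "PENDING"
    TERMINATING = "TERMINATING"


@dataclass
class FakeStore:
    """Hypothetical stand-in for the database (states) and Redis (hints)."""
    sessions: dict[str, SessionStatus] = field(default_factory=dict)
    hints: set[str] = field(default_factory=set)


class SessionSchedulingController:
    """Sketch only: validate, mark, hint, return. No scheduling happens here."""

    def __init__(self, store: FakeStore) -> None:
        self._store = store

    def enqueue(self, session_id: str, cpu: int) -> str:
        # 1. Validate parameters and session state (rules are assumptions).
        if cpu <= 0:
            raise ValueError("cpu must be positive")
        if session_id in self._store.sessions:
            raise ValueError(f"session {session_id} already exists")
        # 2. Mark the state in the database.
        self._store.sessions[session_id] = SessionStatus.PENDING
        # 3. Leave a hint so the short cycle picks the work up soon.
        self._store.hints.add("SCHEDULE")
        # 4. Return immediately; the Coordinator does the heavy work later.
        return session_id

    def mark_terminate(self, session_id: str) -> None:
        if session_id not in self._store.sessions:
            raise KeyError(session_id)
        self._store.sessions[session_id] = SessionStatus.TERMINATING
        self._store.hints.add("TERMINATE")


store = FakeStore()
controller = SessionSchedulingController(store)
controller.enqueue("sess-1", cpu=2)
controller.mark_terminate("sess-1")
```

Neither method talks to an agent or runs an allocation algorithm, which is why the API call can return quickly regardless of cluster state.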

Coordinator flow (periodic batch processing):

A periodic trigger runs (every 2 or 60 seconds).
    ↓
The Coordinator checks Redis hints (processes them if present, otherwise skips or runs forced checks).
    ↓
It queries the database for sessions marked as PENDING or TERMINATING.
    ↓
It executes the scheduling or termination handlers.
    ↓
It sets post-process hints for the next step.

The responsibilities are divided as follows:

  • The Orchestrator creates three Coordinators and registers periodic tasks.
  • Each Coordinator decides when and what to process.
  • Controllers handle validation and marking only, while Schedulers and Engines perform the actual algorithms and resource allocations.

Model Service reuses the Session Scheduling Controller by calling only enqueue() and mark_terminate(). It doesn’t need to manage internal session flow directly. This unified approach simplifies maintenance and ensures consistent handling of both sessions and model services.

Core Concept: The Mark–Execute Pattern

The separation between Controller and Coordinator is built on the Mark–Execute pattern.

In the old design, AgentRegistry ran scheduling immediately for each session request. Even though the request was marked as pending in the DB, the scheduling function was still invoked right away. If 10 scheduling triggers arrived, the system ran scheduling 10 separate times. Because scheduling involves collecting resource data from all agents and running expensive allocation computations, system load grew in proportion to the number of incoming requests. Network or agent failures easily led to “pending hell,” where sessions stayed stuck until the next scheduling attempt.

In Sokovan, scheduling runs periodically in batches. The Controller only validates input, marks sessions as PENDING, and sets hints in Redis. The Coordinator then processes all pending sessions at once every 2 or 60 seconds. When 10 requests arrive at once, the scheduler runs only once, and failed attempts automatically retry in the next cycle.
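
To make the difference concrete, here is a small toy comparison of the two approaches. Both classes are hypothetical and only count how many scheduling passes run. They do not model real resource allocation, and the agents_reachable flag is an invented stand-in for a network or agent failure.

```python
class ImmediateScheduler:
    """Old style: every request runs a full (expensive) scheduling pass."""

    def __init__(self) -> None:
        self.passes = 0
        self.scheduled: list[str] = []

    def request(self, session_id: str) -> None:
        self.passes += 1  # one full pass per request
        self.scheduled.append(session_id)


class BatchScheduler:
    """Mark-Execute style: requests only mark; a periodic cycle executes."""

    def __init__(self) -> None:
        self.passes = 0
        self.pending: list[str] = []
        self.scheduled: list[str] = []

    def mark(self, session_id: str) -> None:
        self.pending.append(session_id)  # cheap: no scheduling here

    def run_cycle(self, agents_reachable: bool = True) -> None:
        if not self.pending:
            return  # nothing marked, nothing to do
        self.passes += 1
        if not agents_reachable:
            return  # sessions stay marked and are retried next cycle
        self.scheduled.extend(self.pending)
        self.pending.clear()


immediate, batch = ImmediateScheduler(), BatchScheduler()
for i in range(10):
    immediate.request(f"sess-{i}")
    batch.mark(f"sess-{i}")
batch.run_cycle()
print(immediate.passes, batch.passes)  # prints "10 1"

retry = BatchScheduler()
retry.mark("sess-x")
retry.run_cycle(agents_reachable=False)  # fails, session stays pending
retry.run_cycle()                        # succeeds on the next cycle
```

The retry case needs no special error handling: because the mark persists, the next cycle simply sees the same pending session again.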

The same logic applies to termination requests. Instead of communicating directly with the agent during the API call, Sokovan marks the termination and responds immediately. The Coordinator later processes these marked sessions. This method improves resilience, as temporary network issues or failures no longer block request responses, and server load remains stable even under heavy traffic.

Design Considerations

  • Use the database as the single source of truth: The DB stores all session states, and Sokovan operates on a state-machine basis. As long as the DB remains intact, the scheduler can safely retry any operation.
  • Use Redis hints for responsiveness: Redis provides lightweight hints to trigger quick processing. Even if Redis misses an event, the long-cycle execution ensures recovery.

Dual Loop Structure Based on Hints

How frequently should the Coordinator check for tasks? If it checks too often, responsiveness improves but resource usage rises. If it checks too rarely, system load decreases but response latency increases. To balance these trade-offs, Sokovan uses a dual-loop design with two cycles: a short cycle (2 seconds) and a long cycle (60 seconds).

The short cycle checks Redis hints every 2 seconds. When a Controller sets a hint after a new session or termination request, the Coordinator picks it up almost instantly. If no hint exists, it skips execution without DB queries, keeping the system lightweight.

The long cycle runs every 60 seconds regardless of hints, ensuring that all pending or terminating sessions are processed. Even if Redis fails and a hint is lost, the marked work is picked up within a minute.

This dual-loop mechanism lets the Coordinator pick up new events within about 2 seconds in the normal case while keeping CPU and DB usage minimal. If a hint is missed, the long-cycle run ensures eventual progress within the next minute.

Example timeline:

0s    [Short] No hint → Skip
2s    [Short] No hint → Skip
3s    User requests session → Controller sets Redis hint
4s    [Short] Hint found → Process scheduling
6s    [Short] No hint → Skip
60s   [Long] Always runs → Ensures scheduling
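
The timeline above can be reproduced with a small simulation on a fake clock. This is a sketch under simplifying assumptions: a single boolean stands in for the Redis hint, the function name simulate and its parameters are invented for the example, and at the 60-second mark only the long cycle is logged, as in the timeline.

```python
SHORT_INTERVAL = 2   # seconds, as described in the article
LONG_INTERVAL = 60   # seconds, as described in the article


def simulate(hint_times: set[int], until: int) -> list[str]:
    """Replay the dual loop on a simulated clock and return a log of what ran."""
    log: list[str] = []
    hint_pending = False
    for t in range(until + 1):
        if t in hint_times:
            hint_pending = True  # a Controller set a hint
            log.append(f"{t}s request -> hint set")
        if t > 0 and t % LONG_INTERVAL == 0:
            # Long cycle: runs regardless of hints.
            hint_pending = False
            log.append(f"{t}s [Long] run")
        elif t % SHORT_INTERVAL == 0:
            # Short cycle: does real work only when a hint exists.
            if hint_pending:
                hint_pending = False
                log.append(f"{t}s [Short] run")
            else:
                log.append(f"{t}s [Short] skip")
    return log


log = simulate(hint_times={3}, until=60)
for line in log[:5] + log[-1:]:
    print(line)
```

Calling simulate(set(), 60) models the case where a hint is lost entirely: every short cycle skips, and the long cycle at 60 seconds still runs.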

Handler Registry

Sessions and model services transition through multiple states:

Session: PENDING → SCHEDULED → PREPARING → CREATING → RUNNING → TERMINATING → TERMINATED
Model Serving: PENDING → PROVISIONING → RUNNING → DESTROYING → TERMINATED
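
As a rough illustration of operating "on a state-machine basis", the session flow above could be encoded as a transition check like the one below. The state names come from the list above; the helper names and the rule that a termination can be marked from any earlier state are assumptions for this sketch, not documented Sokovan behavior.

```python
SESSION_FLOW = (
    "PENDING", "SCHEDULED", "PREPARING", "CREATING",
    "RUNNING", "TERMINATING", "TERMINATED",
)


def allowed_targets(state: str) -> set[str]:
    """Return the states a session may move to from `state` (sketch only)."""
    idx = SESSION_FLOW.index(state)
    targets: set[str] = set()
    if idx + 1 < len(SESSION_FLOW):
        targets.add(SESSION_FLOW[idx + 1])  # next step on the normal path
    # Assumption: termination can be marked from any state before TERMINATING.
    if idx < SESSION_FLOW.index("TERMINATING"):
        targets.add("TERMINATING")
    return targets


def transition(state: str, target: str) -> str:
    """Validate a transition and return the new state."""
    if target not in allowed_targets(state):
        raise ValueError(f"illegal transition {state} -> {target}")
    return target


state = transition("PENDING", "SCHEDULED")
state = transition(state, "TERMINATING")
```

Because every step is a checked transition from a stored state, a retried operation either repeats a legal step or is rejected, which is what makes retries safe as long as the stored state is intact.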

Sokovan creates independent Handlers for each state and task type, and the Coordinator selects the proper one to execute.

Each Handler has four responsibilities:

  • name(): returns the handler name for metrics.
  • lock_id: defines the distributed lock key, if one is needed.
  • execute(): performs the main logic.
  • post_process(): triggers the next step.

The Coordinator picks a handler based on the job type, locks if required, executes it, and calls post_process() to emit new hints or broadcast events.

Each handler performs one specific job. For instance, ScheduleSessionsHandler runs only scheduling logic, while CreateSessionsHandler handles session creation. Handlers are linked through Redis hints: for example, after scheduling finishes, the handler sets a new hint for the precondition check step, forming a clear and reactive workflow chain.
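
A handler registry along these lines might look like the following sketch. The four members (name(), lock_id, execute(), post_process()) and the names ScheduleSessionsHandler and LOCKID_SOKOVAN_TARGET_PENDING appear in this article. Everything else is an assumption for illustration, including the method signatures, the Coordinator class shape, the pairing of that lock ID with this handler, and the hint name CHECK_PRECONDITION.

```python
from abc import ABC, abstractmethod
from typing import Optional


class Handler(ABC):
    """The four handler responsibilities; signatures are assumptions."""

    @abstractmethod
    def name(self) -> str: ...

    @property
    def lock_id(self) -> Optional[str]:
        return None  # no distributed lock unless a subclass asks for one

    @abstractmethod
    def execute(self, db: dict[str, str]) -> list[str]: ...

    @abstractmethod
    def post_process(self, processed: list[str], hints: set[str]) -> None: ...


class ScheduleSessionsHandler(Handler):
    def name(self) -> str:
        return "schedule-sessions"

    @property
    def lock_id(self) -> Optional[str]:
        return "LOCKID_SOKOVAN_TARGET_PENDING"

    def execute(self, db: dict[str, str]) -> list[str]:
        moved = [sid for sid, status in db.items() if status == "PENDING"]
        for sid in moved:
            db[sid] = "SCHEDULED"
        return moved

    def post_process(self, processed: list[str], hints: set[str]) -> None:
        if processed:
            hints.add("CHECK_PRECONDITION")  # hypothetical hint name


class Coordinator:
    def __init__(self, db: dict[str, str], hints: set[str]) -> None:
        self.db, self.hints = db, hints
        self.registry: dict[str, Handler] = {}
        self.lock_log: list[str] = []

    def register(self, hint: str, handler: Handler) -> None:
        self.registry[hint] = handler

    def process(self, hint: str) -> list[str]:
        handler = self.registry[hint]
        if handler.lock_id is not None:
            # A real system would acquire the distributed lock here.
            self.lock_log.append(handler.lock_id)
        processed = handler.execute(self.db)
        handler.post_process(processed, self.hints)
        return processed


db = {"s1": "PENDING", "s2": "RUNNING"}
hints: set[str] = set()
coordinator = Coordinator(db, hints)
coordinator.register("SCHEDULE", ScheduleSessionsHandler())
processed = coordinator.process("SCHEDULE")
```

Adding a new step in this shape means writing one more Handler subclass and registering it; the Coordinator loop itself does not change.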

Stability and Resilience

Sokovan aims above all to ensure that the server remains stable. Previously, heavy traffic often caused scheduling delays or failures. With the latest updates, Sokovan maintains consistent system load even under spikes by batch processing all tasks. It groups broadcast events and cache invalidations to minimize random fluctuations and make performance predictable.

Distributed locks solve concurrency issues. Before executing, the Coordinator acquires the designated lock (e.g., LOCKID_SOKOVAN_TARGET_PENDING) so that multiple manager processes cannot schedule the same session concurrently.
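
The effect of the lock can be demonstrated with two simulated managers. In this sketch an in-process asyncio.Lock stands in for the distributed lock, so it only illustrates the mutual-exclusion idea and says nothing about how Backend.AI implements its locks. The function and manager names are invented for the example.

```python
import asyncio


async def run_managers() -> list[str]:
    db = {"s1": "PENDING", "s2": "PENDING"}
    scheduled_by: list[str] = []
    # Stand-in for a distributed lock keyed by lock ID (in-process only).
    locks = {"LOCKID_SOKOVAN_TARGET_PENDING": asyncio.Lock()}

    async def schedule_pass(manager: str) -> None:
        async with locks["LOCKID_SOKOVAN_TARGET_PENDING"]:
            pending = [sid for sid, status in db.items() if status == "PENDING"]
            await asyncio.sleep(0)  # yield control, as real I/O would
            for sid in pending:
                db[sid] = "SCHEDULED"
                scheduled_by.append(f"{manager}:{sid}")

    # Two manager processes attempt the same scheduling pass concurrently.
    await asyncio.gather(schedule_pass("manager-a"), schedule_pass("manager-b"))
    return scheduled_by


result = asyncio.run(run_managers())
print(result)
```

Without the lock, both managers would read the same pending list before either wrote back, and each session would be scheduled twice. With it, the second manager finds nothing left to do.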

Because Sokovan checks only Redis when idle, system load remains extremely low, while the dual-loop mechanism guarantees responsive and fault-tolerant operation when events occur.

Conclusion

Sokovan strengthens stability and fault tolerance through the Mark–Execute pattern, dual loops, and a Handler Registry. Using batch processing and distributed locks, it safely coordinates concurrent operations so no two managers handle the same session simultaneously. When no events occur, it performs only lightweight checks, keeping CPU and DB utilization steady.

As a system’s scale increases, enforcing clear boundaries—Controller as trigger, Coordinator as orchestrator, and Handler as executor—prevents errors and simplifies testing. Developers can test each component independently and confirm workflow logic more easily.

While small-scale systems may benefit from simpler designs, adopting these structural constraints becomes valuable as repeating patterns emerge. Though splitting logic into multiple layers adds complexity, it prevents mistakes such as putting heavy logic into the wrong layer. This structural separation improves code safety, maintainability, and testability.

The same pattern applies beyond Sokovan. Any system facing complex traffic and state management challenges can use these architectural ideas to achieve stability and scalability.


Sources

  • Source code: Backend.AI GitHub
  • Sokovan architecture documentation: src/ai/backend/manager/sokovan/README.md
