Unified GPU/NPU Monitoring for Enterprise AI Infrastructure

All-SMI is a system-visibility tool that supports diverse AI accelerator hardware, including NVIDIA, AMD, Apple Silicon, Intel Gaudi, and Google TPU. Monitor heterogeneous data centers precisely with per-node and per-chassis thermal monitoring.

[Screenshot: all-smi Cluster Overview — 200/200 nodes, 1600 GPU cores, 320.00 TB total RAM, 220.31 TB total VRAM, 68°C average temperature, 683.5 kW total power; cluster-wide GPU utilization 50.3% and GPU memory 49.8%. Per-node tabs (node-0001 … node-0200) list each H200 GPU's utilization, VRAM, temperature, and power draw.]

Key features

Control your entire data center from a single interface

Integrated solution for efficient data center monitoring

Monitor GPU, CPU, memory, and chassis thermal data from each node for precise control. View total nodes, GPU cores, total VRAM, average temperature, and total power consumption at a glance with Cluster Overview.

Unified metrics for simultaneous multi-platform operation

Manage 9+ platforms, including NVIDIA, AMD, Apple Silicon, Intel Gaudi, and Google TPU, through a single UI. No need to run a separate monitoring tool for each vendor.

Resource visibility across your system

Simultaneously monitor 256+ remote systems, providing real-time GPU utilization, CPU load, system memory, disk usage, temperature, and power consumption. Exports 100+ Prometheus metrics for observability stack integration.

Enterprise-grade stable remote monitoring

Connection pooling, concurrent connection limits, automatic retry, and TCP Keep-alive provide stable monitoring even in large distributed environments.
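The automatic-retry behavior described above can be sketched in Rust. This is a minimal illustration of retry with exponential backoff, assuming a generic fallible operation; `retry_with_backoff` is a hypothetical helper, not All-SMI's actual implementation.

```rust
use std::thread::sleep;
use std::time::Duration;

/// Retry a fallible operation up to `max_attempts` times,
/// doubling the delay after each failure (exponential backoff).
/// Illustrative sketch only; all-smi's real retry logic may differ.
fn retry_with_backoff<T, E, F>(mut op: F, max_attempts: u32, base: Duration) -> Result<T, E>
where
    F: FnMut() -> Result<T, E>,
{
    let mut attempt = 0;
    loop {
        match op() {
            Ok(v) => return Ok(v),
            Err(e) => {
                attempt += 1;
                if attempt >= max_attempts {
                    return Err(e);
                }
                // Wait base * 2^(attempt - 1) before the next try.
                sleep(base * 2u32.pow(attempt - 1));
            }
        }
    }
}

fn main() {
    // Simulated flaky scrape: fails twice, then succeeds.
    let mut calls = 0;
    let result = retry_with_backoff(
        || {
            calls += 1;
            if calls < 3 { Err("connection refused") } else { Ok(42) }
        },
        5,
        Duration::from_millis(1),
    );
    assert_eq!(result, Ok(42));
    assert_eq!(calls, 3);
    println!("succeeded after {} attempts", calls);
}
```

In a real collector this pattern would wrap each node's scrape call, bounded by a concurrent-connection limit so a few slow nodes cannot stall the whole cluster sweep.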

Detailed process tracking to identify resource waste

Track GPU memory usage, CPU utilization, and process status to quickly diagnose performance bottlenecks. Verify per-PID which process occupies which GPU.

Intuitive interactive UI for rapid response

Intuitive color-coded accelerator and chassis status indicators with real-time graphs help operations teams make rapid decisions.

Why All-SMI

All-SMI sees every AI accelerator

Eliminates the hassle of running nvidia-smi, rocm-smi, and hl-smi separately for different accelerators, comparing outputs in different formats, and managing nodes individually.

Feature                              | nvidia-smi    | rocm-smi   | hl-smi | All-SMI
NVIDIA GPU                           | ✓             | –          | –      | ✓
AMD GPU                              | –             | ✓          | –      | ✓
Intel Gaudi                          | –             | –          | ✓      | ✓
Google TPU                           | –             | –          | –      | ✓
Domestic NPU (Rebellions, FuriosaAI) | –             | –          | –      | ✓
Remote cluster monitoring            | –             | –          | –      | ✓ 256+ nodes
Built-in Prometheus metrics          | –             | –          | –      | ✓ 100+ metrics
CPU / Memory / Chassis integration   | –             | –          | –      | ✓
Per-process GPU tracking             | ✓ NVIDIA only | ✓ AMD only | –      | ✓ All accelerators
Color-coded interactive UI           | –             | –          | –      | ✓

Supported accelerators

Broad AI accelerator hardware support

Supports GPU, NPU, and TPU alike. As new AI semiconductors arrive, All-SMI coverage grows with them.

NVIDIA
GPU
B200 · H200 · H100 · A100 · V100 · Jetson
AMD
GPU
MI300X · MI325X · Radeon Instinct
Intel
NPU
Gaudi 1 · Gaudi 2 · Gaudi 3 · PCIe · OAM · UBB
Google
TPU
v2 · v3 · v4 · v5e · v5p · v6 · v7 (Ironwood)
Apple
Silicon
M1 · M2 · M3 · M4 · M5
Tenstorrent
NPU
Grayskull · Wormhole · Blackhole
Rebellions
NPU
ATOM · ATOM+ · ATOM Max
FuriosaAI
NPU
Warboy · RNGD
and more
POSIX · OpenCL · SYCL …

Operating modes

Three operating modes

From single-node inspection to large-scale cluster monitoring to observability stack integration

MODE 01

Local

Terminal-based real-time monitoring. Instantly check GPU, CPU, and memory status of your local system. Think of it as nvidia-smi for all accelerators.

$ all-smi
MODE 02

API

Provides Prometheus-compatible metric endpoints. Connect to existing observability stacks like Grafana and Alertmanager.

$ all-smi api --port 9100
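To wire API mode into an existing observability stack, a minimal Prometheus scrape configuration might look like this. It assumes metrics are served on the port given above; the job name and target hostnames are placeholders:

```yaml
scrape_configs:
  - job_name: "all-smi"        # placeholder job name
    scrape_interval: 15s
    static_configs:
      - targets:               # nodes running `all-smi api --port 9100`
          - "node1:9100"
          - "node2:9100"
```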
MODE 03

View

Remote cluster dashboard. View nodes running in API mode from a single screen.

$ all-smi view node1,node2,...

Quick Start

One-line install

Supports macOS, Linux, and Windows. Choose Homebrew, pip, APT, Cargo, or binary download.

Also available as a Rust crate for building custom monitoring applications.

Installation
# macOS / Linux (Homebrew)
$ brew install lablup/tap/all-smi

# pip (PyPI)
$ pip install all-smi

# Ubuntu (APT)
$ sudo add-apt-repository ppa:lablup/all-smi
$ sudo apt install all-smi

# Cargo (Rust)
$ cargo install all-smi

# Run
$ all-smi

Enterprise Products

Lablup enterprise products using All-SMI

All-SMI's open-source monitoring capabilities are embedded in Lablup's commercial products, providing consistent accelerator visibility from desktop to data center.

DESKTOP

Backend.AI:GO

Lightweight AI platform for desktop and AI PCs. Built on All-SMI's local monitoring engine, it lets you check the GPU/NPU status of your personal workstation in real time.

  • All-SMI based local accelerator monitoring
  • Real-time NVIDIA, AMD, Apple Silicon status
  • GPU utilization, temperature, memory dashboard
  • Per-process GPU tracking
  • Free distribution
DATA CENTER

Backend.AI Monitoring Dashboard

Backend.AI's web-based integrated monitoring dashboard. Built on All-SMI's metric-collection engine and Prometheus integration, it visualizes accelerator status across the entire cluster.

  • All-SMI + Prometheus metric pipeline
  • Heterogeneous accelerator cluster dashboard
  • Per-node / per-GPU utilization, temp, power
  • Grafana custom dashboard integration
  • Anomaly alerts and history tracking

All AI accelerators in one interface

All-SMI is enough for monitoring. If you need operations too, there's Backend.AI.

Install from GitHub


Contact Us

Headquarters & HPC Lab

KR Office: 8F, 577, Seolleung-ro, Gangnam-gu, Seoul, Republic of Korea
US Office: 3003 N First St, Suite 221, San Jose, CA 95134

© Lablup Inc. All rights reserved.
