Unified GPU/NPU Monitoring for Enterprise AI Infrastructure
All-SMI is a system visibility tool for diverse AI accelerator hardware, including NVIDIA, AMD, Apple Silicon, Intel Gaudi, Google TPU, and more. It monitors heterogeneous data centers precisely, down to the thermals of each node and chassis.
| GPU | Model | Util | VRAM | Temp | Power |
|---|---|---|---|---|---|
| 0 | H200 141GB | 79.1% | 84.2/141GB | 78°C | 511/700W |
| 1 | H200 141GB | 28.8% | 90.0/141GB | 63°C | 420/700W |
| 2 | H200 141GB | 36.1% | 61.6/141GB | 63°C | 385/700W |
| 3 | H200 141GB | 66.2% | 69.3/141GB | 75°C | 502/700W |
| 4 | H200 141GB | 91.3% | 120.4/141GB | 82°C | 645/700W |
| 5 | H200 141GB | 45.6% | 72.1/141GB | 59°C | 378/700W |
| 6 | H200 141GB | 53.2% | 95.8/141GB | 67°C | 462/700W |
| 7 | H200 141GB | 12.4% | 38.5/141GB | 51°C | 285/700W |
Key features
Control your entire data center from a single interface
Integrated solution for efficient data center monitoring
Monitor GPU, CPU, memory, and chassis thermal data from each node for precise control. The Cluster Overview shows total nodes, total GPUs, total VRAM, average temperature, and total power consumption at a glance.
Unified metrics for simultaneous multi-platform operation
Manage 9+ platforms, including NVIDIA, AMD, Apple Silicon, Intel Gaudi, and Google TPU, through a single UI. No need to run separate monitoring tools for each vendor.
Resource visibility across your system
Monitor 256+ remote systems simultaneously, with real-time GPU utilization, CPU load, system memory, disk usage, temperature, and power consumption. Export 100+ Prometheus metrics for observability-stack integration.
Enterprise-grade stable remote monitoring
Connection pooling, concurrent connection limits, automatic retry, and TCP Keep-alive provide stable monitoring even in large distributed environments.
Detailed process tracking to identify resource waste
Track GPU memory usage, CPU utilization, and process status to quickly diagnose performance bottlenecks. Verify per-PID which process occupies which GPU.
Intuitive interactive UI for rapid response
Intuitive color-coded accelerator and chassis status indicators with real-time graphs help operations teams make rapid decisions.
Why All-SMI
All-SMI sees every AI accelerator
Eliminates the hassle of running nvidia-smi, rocm-smi, and hl-smi separately for different accelerators, comparing outputs in different formats, and managing nodes individually.
| Feature | nvidia-smi | rocm-smi | hl-smi | All-SMI |
|---|---|---|---|---|
| NVIDIA GPU | ✓ | — | — | ✓ |
| AMD GPU | — | ✓ | — | ✓ |
| Intel Gaudi | — | — | ✓ | ✓ |
| Google TPU | — | — | — | ✓ |
| Korean NPUs (Rebellions, FuriosaAI) | — | — | — | ✓ |
| Remote cluster monitoring | — | — | — | ✓ 256+ nodes |
| Built-in Prometheus metrics | — | — | — | ✓ 100+ metrics |
| CPU / Memory / Chassis integration | — | — | — | ✓ |
| Per-process GPU tracking | ✓ NVIDIA only | ✓ AMD only | — | ✓ All accelerators |
| Color-coded interactive UI | — | — | — | ✓ |
Supported accelerators
Broad AI accelerator hardware support
Supports GPU, NPU, and TPU alike. As new AI semiconductors arrive, All-SMI coverage grows with them.
Operating modes
Three operating modes
From single-node inspection to large-scale cluster monitoring to observability stack integration
Local
Terminal-based real-time monitoring. Instantly check GPU, CPU, and memory status of your local system. Think of it as nvidia-smi for all accelerators.
API
Provides Prometheus-compatible metric endpoints. Connect to existing observability stacks like Grafana and Alertmanager.
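A Prometheus scrape configuration for nodes running in API mode might look like the following sketch. The port number, `/metrics` path, and hostnames are assumptions for illustration; check your actual deployment.

```yaml
# prometheus.yml fragment: scrape All-SMI API-mode nodes.
# Port 9090 and the /metrics path are assumptions; adjust to your setup.
scrape_configs:
  - job_name: "all-smi"
    scrape_interval: 15s
    metrics_path: /metrics
    static_configs:
      - targets:
          - "gpu-node-01:9090"   # hypothetical hostnames
          - "gpu-node-02:9090"
```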
View
Remote cluster dashboard. View nodes running in API mode from a single screen.
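The three modes above map naturally to CLI subcommands. The command shapes below are illustrative assumptions based on the mode names; consult `all-smi --help` for the exact flags.

```shell
# Local: interactive terminal monitoring of this machine
all-smi local

# API: expose a Prometheus-compatible metrics endpoint
# (port number is an illustrative assumption)
all-smi api --port 9090

# View: aggregate dashboard over nodes running in API mode
# (hostnames are hypothetical)
all-smi view --hosts http://gpu-node-01:9090 http://gpu-node-02:9090
```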
Quick Start
One-line install
Supports macOS, Linux, and Windows. Choose Homebrew, pip, APT, Cargo, or binary download.
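Assuming package names match the project name (a convention-based assumption for the Homebrew, pip, and Cargo channels listed above), installation looks like:

```shell
# Pick one channel; package names are assumed to match the project name.
brew install all-smi       # macOS / Linux via Homebrew
pip install all-smi        # Python environments
cargo install all-smi      # build from crates.io
# APT packages and prebuilt binaries are also available; see the project docs.
```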
Also available as a Rust crate for building custom monitoring applications.
Enterprise Products
Lablup enterprise products using All-SMI
All-SMI's open-source monitoring capabilities are embedded in Lablup's commercial products, providing consistent accelerator visibility from desktop to data center.
Backend.AI:GO
Lightweight AI platform for desktop and AI PCs. Built on All-SMI's local monitoring engine, it lets you check your personal workstation's GPU/NPU status in real time.
- All-SMI based local accelerator monitoring
- Real-time NVIDIA, AMD, Apple Silicon status
- GPU utilization, temperature, memory dashboard
- Per-process GPU tracking
- Free distribution
Backend.AI Monitoring Dashboard
Backend.AI's web-based integrated monitoring dashboard. Built on All-SMI's metric collection engine and Prometheus integration, it visualizes accelerator status across the entire cluster.
- All-SMI + Prometheus metric pipeline
- Heterogeneous accelerator cluster dashboard
- Per-node / per-GPU utilization, temp, power
- Grafana custom dashboard integration
- Anomaly alerts and history tracking
All AI accelerators in one interface
All-SMI is enough for monitoring. If you need operations too, there's Backend.AI.