Watching the GPU — DCGM, Prometheus, and a Local Grafana for the Spark

What this article will answer

The Spark’s unified memory is shared CPU+GPU, which means nvidia-smi alone doesn’t tell the whole story. This piece wires up a proper telemetry stack so the next OOM landmine gets caught by a dashboard, not a hard-hung box.

NVIDIA technologies to be covered

DCGM Exporter — the NVIDIA-maintained Prometheus exporter; which metrics are worth scraping on GB10, which are noise.
nvidia-smi dmon — fast sanity-check loop when Grafana is down.
Prometheus — local scrape config, retention sized for a personal rig.
Grafana dashboards — one pane for memory (unified budget + per-container attribution), one for SM occupancy, one for power and thermals.
Kernel-level hooks — where nvidia-smi stops being enough and Nsight Systems takes over (forward reference to the Nsight article).

What I expect to find

The interesting metric is not peak GPU utilization — it’s the shape of memory over time as NIMs warm up and pgvector indexes churn. The article closes with a screenshot of the dashboard the day the Spark hung on 2026-04-22, so the visual becomes a diagnostic template for next time.

Where it sits in the arc

Cross-cutting. Useful for all three application tracks (Second Brain, LLM Wiki, Autoresearch) — whichever is running at the moment, the dashboard is the same.