Stage

Observability

Knowing what is actually happening on your GPUs. Latency, memory, throughput — instrumentation for a rig you cannot stare at all day.

Article №13 observability NeMo Evaluator ~60 minutes end-to-end — 40 s to ingest the blog into pgvector, 2 min for retrieval, 4 min for generation across three 8B variants, 90 s for the LoRA variant, 9 min for grading

Ragas, Reranked — What 44 Held-Out Questions Say About the Second Brain Stack

A Ragas-style harness written in 200 lines of stdlib Python, run locally on the DGX Spark, against four variants of the Second Brain RAG chain. Naive RAG scores 3.30 / 5. Rerank RAG scores 4.27. LoRA+RAG is a surprise — it does not beat naive. Retrieval is where the points come from.

Upcoming observability NVIDIA DCGM + Prometheus + Grafana planned ~3 hours, mostly dashboard tuning

Watching the GPU — DCGM, Prometheus, and a Local Grafana for the Spark

A planned setup of DCGM Exporter → Prometheus → Grafana entirely on the Spark itself. The goal is a single dashboard that tells the truth about GPU memory, SM occupancy, and per-container utilization for a rig that's running NIMs, pgvector, and an occasional training job at the same time.