Stage
Observability
Knowing what is actually happening on your GPUs. Latency, memory, throughput — instrumentation for a rig you cannot stare at all day.
Ragas, Reranked — What 44 Held-Out Questions Say About the Second Brain Stack
A Ragas-style harness written in 200 lines of stdlib Python, run locally on the DGX Spark, against four variants of the Second Brain RAG chain. Naive RAG scores 3.30 / 5. Rerank RAG scores 4.27. LoRA+RAG is a surprise — it does not beat naive. Retrieval is where the points come from.
Watching the GPU — DCGM, Prometheus, and a Local Grafana for the Spark
A planned setup of DCGM Exporter → Prometheus → Grafana entirely on the Spark itself. The goal is a single dashboard that tells the truth about GPU memory, SM occupancy, and per-container utilization for a rig that's running NIMs, pgvector, and an occasional training job at the same time.