Stage
Dev-tools
Nsight, CUDA tooling, editor integrations. The things that shorten the distance between an idea and its first running version.
Access First, Models Second — How I Set Up My DGX Spark for Solo AI Work
Most DGX Spark walkthroughs open with CUDA and tokens/sec. This one opens with the access layer: streaming, AI pair programming, sandboxed agents, and browser automation. For a solo edge builder, that interaction stack is more load-bearing than the model stack.
Tracing a NIM Request with Nsight Systems — What the 24.8 tok/s Number Hides
A planned kernel-level trace of a single NIM inference request on GB10. Where does the wall-clock time actually go: tokenization, KV-cache attention, the sampling loop, or memcpy? The article turns 24.8 tokens per second into a timeline you can point at and say 'that line is the bottleneck'.