Stage: Fine-tuning

Adapting open-weight models to your own data on a single machine. NeMo, LoRA, full-parameter — what each is for and what each demands.

Article №14 foundations ~25 minute read

Looking Beyond Spark — Fine-Tuning a 100B Nemotron

A working answer to: how many GPUs does it take to fine-tune a 100B Nemotron? Three methods, three memory footprints — full FT ≈ 1.6 TB needs 24× H100; LoRA ≈ 250 GB fits 8× H100; QLoRA ≈ 65 GB fits 1× H200. The Spark's 3B LoRA teaches the math.
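
The footprints fall out of a bytes-per-parameter rule of thumb. A minimal sketch of that arithmetic, with the per-parameter and overhead constants as assumptions rather than the article's exact accounting:

# Back-of-the-envelope memory for fine-tuning a 100B-parameter model.
# The bytes-per-parameter figures are common rules of thumb, assumed here:
#   full FT : bf16 weights (2) + grads (2) + fp32 Adam states and master weights (12) = 16 B/param
#   LoRA    : frozen bf16 weights (2 B/param) + assumed ~50 GB of adapter/activation slack
#   QLoRA   : 4-bit weights (0.5 B/param) + assumed ~15 GB of adapter/activation slack
PARAMS = 100e9
GB = 1e9

full_ft = PARAMS * 16 / GB        # ≈ 1600 GB ≈ 1.6 TB
lora    = PARAMS * 2 / GB + 50    # ≈ 250 GB
qlora   = PARAMS * 0.5 / GB + 15  # ≈ 65 GB

for name, gb in [("full FT", full_ft), ("LoRA", lora), ("QLoRA", qlora)]:
    # Dividing by 80 GB gives a memory-only floor; real runs need headroom
    # and whole nodes, which is why the article's GPU counts sit higher.
    print(f"{name:8s} ≈ {gb:6.0f} GB  (memory floor: {gb / 80:.1f} × H100-80GB)")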

Article №13 observability NeMo Evaluator ~60 minutes end-to-end — 40 s to ingest the blog into pgvector, 2 min for retrieval, 4 min for generation across three 8B variants, 90 s for the LoRA variant, 9 min for grading

Ragas, Reranked — What 44 Held-Out Questions Say About the Second Brain Stack

A Ragas-style harness written in 200 lines of stdlib Python, run locally on the DGX Spark, against four variants of the Second Brain RAG chain. Naive RAG scores 3.30 / 5. Rerank RAG scores 4.27. LoRA+RAG is a surprise — it does not beat naive. Retrieval is where the points come from.
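
For flavor, a minimal sketch of what an LLM-as-judge loop in stdlib Python can look like, not the article's actual 200-line harness. The endpoint URL, judge model id, and grading prompt are assumptions; the shape (post a question/answer/reference triple, parse a 1–5 grade, average) is the point.

import json, re, urllib.request

JUDGE_URL = "http://localhost:8000/v1/chat/completions"  # assumed local OpenAI-compatible endpoint

def grade(question, answer, reference):
    prompt = ("Grade the answer against the reference on a 1-5 scale. "
              "Reply with a single digit.\n"
              f"Question: {question}\nAnswer: {answer}\nReference: {reference}")
    body = json.dumps({"model": "judge-model",  # assumed judge model id
                       "messages": [{"role": "user", "content": prompt}],
                       "temperature": 0}).encode()
    req = urllib.request.Request(JUDGE_URL, data=body,
                                 headers={"Content-Type": "application/json"})
    with urllib.request.urlopen(req) as resp:
        text = json.load(resp)["choices"][0]["message"]["content"]
    match = re.search(r"[1-5]", text)
    return float(match.group()) if match else 0.0

def mean_score(triples):
    # triples: (question, generated answer, reference answer) for each held-out question
    scores = [grade(q, a, ref) for q, a, ref in triples]
    return sum(scores) / len(scores)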

Article №12 fine-tuning Hugging Face PEFT + Qwen2.5-3B-Instruct ~45 minutes end-to-end — 5 min corpus via NIM 8B, 69 s training, 3 min benchmark, plus a 6 GB base-model download

LoRA on Your Own Q&A — What 231 Pairs Actually Teach a 3B Model

231 own-voice Q&A pairs, a rank-16 LoRA, 69 s of training on a GB10 Spark. The adapter won't memorize your exact numbers, but it will take a model that refuses 61% of questions about your work and turn it into one that answers all of them in your voice. For facts you still need RAG.
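
A rank-16 setup of the kind described would look roughly like the sketch below. Only the rank and the base model come from the article; alpha, dropout, and the target modules are assumptions.

import torch
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import LoraConfig, get_peft_model

base = "Qwen/Qwen2.5-3B-Instruct"
tokenizer = AutoTokenizer.from_pretrained(base)
model = AutoModelForCausalLM.from_pretrained(base, torch_dtype=torch.bfloat16)

config = LoraConfig(
    r=16,                     # rank-16, as in the article
    lora_alpha=32,            # assumed
    lora_dropout=0.05,        # assumed
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],  # assumed attention projections
    task_type="CAUSAL_LM",
)
model = get_peft_model(model, config)
model.print_trainable_parameters()  # a few million trainable params against the frozen 3B base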

Upcoming fine-tuning NeMo Customizer + Nemotron Nano 9B v2 planned ~4 hours per sweep

LoRA on Nemotron Nano — Fine-tuning a 9B Without Blowing Unified Memory

A planned walk through LoRA fine-tuning on Nemotron Nano 9B with NeMo Customizer: rank and alpha sweeps, a tiny domain corpus, and the memory accounting that keeps a PEFT run from tripping the Spark's 128 GB unified-memory wall.
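
A rough sketch of the kind of memory accounting involved, where every constant (model shape, activation and headroom figures) is an assumption for illustration, not a NeMo Customizer measurement:

# Rough memory accounting for a rank-16 LoRA run on a 9B model inside 128 GB of unified memory.
GB = 1e9
params = 9e9

frozen_weights = params * 2 / GB                 # bf16 base weights ≈ 18 GB
rank, layers, hidden = 16, 36, 4096              # assumed sweep point and model shape
lora_params = 2 * rank * hidden * 4 * layers     # A and B matrices on 4 projections per layer
adapter_and_optim = lora_params * 16 / GB        # adapter + grads + Adam states, well under 1 GB
activations = 20                                 # GB, assumed for a modest batch and sequence length
caches_and_os = 10                               # GB, assumed headroom for KV caches and the OS share

total = frozen_weights + adapter_and_optim + activations + caches_and_os
print(f"≈ {total:.0f} GB of the Spark's 128 GB unified memory")  # ≈ 48 GB, under the wall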