Stage: Training

When the Spark can train from scratch or continue pre-training, and when it should not. The economics of on-device training for an individual.
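For a feel of those economics, the standard ≈6·N·D FLOPs rule puts one weekend on one GB10 in the low hundreds of millions of tokens. A minimal sketch, assuming an unverified dense BF16 peak for the GB10 and a plausible utilization figure; neither number is a measured result.

```python
# Back-of-envelope token budget for a weekend of continued pre-training
# on one GB10. PEAK_BF16_TFLOPS and MFU are assumptions, not benchmarks.

PARAMS = 8.0e9            # Llama 3.1 8B parameter count
PEAK_BF16_TFLOPS = 125.0  # assumed dense BF16 peak (unverified)
MFU = 0.30                # assumed model-FLOPs utilization for a tuned run
HOURS = 48                # ~2 days of wall-clock

flops_per_token = 6 * PARAMS               # standard ~6ND estimate (fwd + bwd)
sustained = PEAK_BF16_TFLOPS * 1e12 * MFU  # effective FLOP/s
tokens = sustained * HOURS * 3600 / flops_per_token

print(f"~{tokens / 1e6:.0f}M tokens in {HOURS} h")  # ~135M tokens under these assumptions
```

At roughly 135M tokens under these assumptions, a small domain corpus is exactly the right target; anything near full pre-training token counts is off by orders of magnitude.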

Upcoming training run: NeMo Framework + Llama 3.1 8B, planned for ~2 days of wall-clock time (one long weekend).

Continued Pre-training on a DGX Spark — NeMo Framework Without a Cluster

When does it make sense to continue pre-training on a single GB10 box, and when is it a category error? A planned run that pushes NeMo Framework, Megatron-LM parallelism, and BF16 mixed precision against the 128 GB unified-memory wall with a small domain corpus.
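To see why 128 GB is a wall rather than a comfort, here is the textbook mixed-precision accounting for Adam on an 8B model, before any activations. The per-state byte counts are the standard figures; the actual layout depends on how the NeMo/Megatron optimizer is configured.

```python
# Optimizer-state memory for BF16 mixed-precision training of an 8B model
# with Adam, excluding activations. Byte counts follow the usual
# mixed-precision accounting; the real layout depends on NeMo/Megatron
# optimizer settings (e.g. a distributed or offloaded optimizer).

PARAMS = 8.0e9

bytes_per_param = {
    "bf16 weights":        2,
    "bf16 gradients":      2,
    "fp32 master weights": 4,
    "fp32 Adam momentum":  4,
    "fp32 Adam variance":  4,
}

for name, nbytes in bytes_per_param.items():
    print(f"{name:22s} {nbytes * PARAMS / 2**30:6.1f} GiB")

total = sum(bytes_per_param.values()) * PARAMS
print(f"{'total, no activations':22s} {total / 2**30:6.1f} GiB")
# ~119 GiB of a 128 GB (~119 GiB) budget: activations must squeeze into
# the remainder, which is why activation checkpointing and optimizer
# offload decide whether this run fits at all.
```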
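And a hedged launch sketch, assuming NeMo 2.0's Python recipe API is in play: the `llama31_8b` recipe name, the `pretrain_recipe` signature, the override attribute paths, and the checkpoint directory are all assumptions to verify against the installed NeMo version.

```python
# Hedged sketch: single-GPU continued pre-training via NeMo 2.0 recipes.
# Module and attribute names are assumptions; check them against the
# installed NeMo version before running.
import nemo_run as run
from nemo.collections import llm

# Start from the Llama 3.1 8B pre-training recipe, scaled to one GB10.
recipe = llm.llama31_8b.pretrain_recipe(
    dir="/checkpoints/llama31_8b_cpt",   # hypothetical checkpoint dir
    name="spark_continued_pretraining",
    num_nodes=1,
    num_gpus_per_node=1,
)

# One device: no tensor/pipeline parallelism, small micro-batches
# against the unified-memory budget.
recipe.trainer.strategy.tensor_model_parallel_size = 1
recipe.trainer.strategy.pipeline_model_parallel_size = 1
recipe.data.micro_batch_size = 1
recipe.data.global_batch_size = 32

# Continued pre-training means restoring Llama 3.1 weights rather than
# random init; NeMo exposes this through the recipe's resume/restore
# configuration (exact field names are version-dependent).

run.run(recipe, executor=run.LocalExecutor())
```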