I Built a 3-Node Home LLM Lab. Here's What It Actually Takes.

I run a 3-node local LLM inference cluster at home. Two NVIDIA DGX Sparks (128GB unified memory each) and one RTX 5090 desktop (32GB VRAM). All three serve Qwen 3.5/3.6 35B MoE models 24/7 over my local network. This isn’t a weekend experiment — it’s my daily development infrastructure. Every code review, every research query, every benchmark runs against these nodes. Here’s what the setup looks like, what it costs, and what I learned that no spec sheet tells you. ...
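A setup like the one described above usually gets a small health-check script so you notice when a node drops. The sketch below is an illustration, not the author's actual tooling: the hostnames, port, and endpoint layout are assumptions, based on the common pattern where each node runs an OpenAI-compatible HTTP server (as llama.cpp's server or vLLM would expose).

```python
# Minimal health check for a multi-node local LLM cluster.
# Hostnames and port are hypothetical; assumes each node exposes an
# OpenAI-compatible /v1/models endpoint over the local network.
import json
import urllib.request

NODES = {
    "spark-1": "http://spark-1.local:8000",
    "spark-2": "http://spark-2.local:8000",
    "rtx5090": "http://rtx5090.local:8000",
}

def check_node(name: str, base_url: str, timeout: float = 3.0) -> str:
    """Return a status string: the model IDs a node serves, or the error."""
    try:
        with urllib.request.urlopen(f"{base_url}/v1/models", timeout=timeout) as resp:
            data = json.load(resp)
        models = [m["id"] for m in data.get("data", [])]
        return f"up ({', '.join(models) or 'no models loaded'})"
    except OSError as exc:
        return f"down ({exc})"

if __name__ == "__main__":
    for name, url in NODES.items():
        print(f"{name}: {check_node(name, url)}")
```

Cron this every few minutes and you get a cheap uptime log for all three nodes without any extra monitoring stack.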

April 26, 2026 · 5 min · ArkNill