Testing Claude Code Against Local 35B Models: Building a Cross-Check Harness

I run Claude Code (Opus 4.6) as my primary coding tool and pay $200/month for it. I also run Qwen 3.5/3.6 35B locally on two DGX Sparks and an RTX 5090. Natural question: how does a local 35B model compare to the commercial tool I’m paying for? To find out, I built three separate benchmark harnesses over 10 days. The journey taught me more about evaluation methodology than about the models themselves, because the harnesses had more bugs than the models did. ...
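
A minimal sketch of the shape such a cross-check harness can take, assuming OpenAI-compatible chat endpoints; the URLs, model names, sample task, and substring grader below are placeholders for illustration, not the harness from the article:

```python
"""Cross-check sketch: the same task, the same grader, two backends."""
import requests

BACKENDS = {
    # Endpoint URLs and model names are placeholders.
    "local-qwen": ("http://localhost:8000/v1/chat/completions", "qwen-35b"),
    "reference":  ("http://localhost:8001/v1/chat/completions", "reference-model"),
}

TASKS = [
    {
        "prompt": "Write a Python function slugify(s) that lowercases s and "
                  "replaces runs of non-alphanumeric characters with single hyphens.",
        "must_contain": ["def slugify"],  # deliberately simple grading criterion
    },
]

def complete(url: str, model: str, prompt: str) -> str:
    # One non-streaming chat completion; temperature 0 to reduce run-to-run noise.
    resp = requests.post(url, json={
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
        "temperature": 0,
    }, timeout=600)
    resp.raise_for_status()
    return resp.json()["choices"][0]["message"]["content"]

def grade(answer: str, must_contain: list[str]) -> bool:
    # Both backends go through the identical check, so the grader
    # cannot silently favor one of them.
    return all(s in answer for s in must_contain)

if __name__ == "__main__":
    for name, (url, model) in BACKENDS.items():
        for task in TASKS:
            answer = complete(url, model, task["prompt"])
            print(f"{name}: {'PASS' if grade(answer, task['must_contain']) else 'FAIL'}")
```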

April 27, 2026 · 6 min · ArkNill

What tok/s Doesn't Tell You: Measuring LLM Speed That Matters

I run Qwen 3.6 35B on three machines. The RTX 5090 generates at 204 tok/s. The DGX Spark pair generates at 65 tok/s. By every benchmark leaderboard metric, the 5090 is 3x faster. But for multi-step coding tasks with thinking enabled, the DGX pair completes the job faster. And for single-turn questions, the 5090 delivers the answer in under 2 seconds while the DGX pair takes 8–12 seconds. tok/s alone told me nothing useful about actual user experience. Here’s what I learned building benchmarks for all three nodes. ...
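
To make the distinction concrete, here is a sketch that records both the raw decode rate and the end-to-end wall clock for one multi-step task; it assumes an OpenAI-compatible endpoint that reports the standard `usage.completion_tokens` field, and the URL, model name, and steps are illustrative, not the article's benchmark code:

```python
"""End-to-end timing sketch: decode tok/s vs. wall clock for a whole task."""
import time
import requests

URL = "http://localhost:8000/v1/chat/completions"   # placeholder endpoint
MODEL = "qwen-35b"                                   # placeholder model name

def timed_turn(messages: list[dict]) -> tuple[str, float, int]:
    # One chat turn; returns the reply, elapsed seconds, and completion tokens
    # as reported by the server's OpenAI-style "usage" block.
    start = time.perf_counter()
    resp = requests.post(URL, json={"model": MODEL, "messages": messages}, timeout=600)
    resp.raise_for_status()
    elapsed = time.perf_counter() - start
    data = resp.json()
    return (data["choices"][0]["message"]["content"],
            elapsed,
            data["usage"]["completion_tokens"])

def run_task(steps: list[str]) -> None:
    # Feed the steps of one multi-step task as consecutive turns, keeping the
    # full conversation in context the way an agentic coding session would.
    messages: list[dict] = []
    total_time, total_tokens = 0.0, 0
    for step in steps:
        messages.append({"role": "user", "content": step})
        reply, elapsed, tokens = timed_turn(messages)
        messages.append({"role": "assistant", "content": reply})
        total_time += elapsed
        total_tokens += tokens
    # The first number is what leaderboards quote; the second is what you feel.
    print(f"~{total_tokens / total_time:.0f} tok/s decode, "
          f"{total_time:.1f} s end-to-end for {len(steps)} steps")
```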

April 24, 2026 · 5 min · ArkNill