<?xml version="1.0" encoding="utf-8" standalone="yes"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:content="http://purl.org/rss/1.0/modules/content/">
  <channel>
    <title>Local-Llm on ArkNill</title>
    <link>https://arknill.github.io/tags/local-llm/</link>
    <description>Recent content in Local-Llm on ArkNill</description>
    <generator>Hugo</generator>
    <language>en</language>
    <lastBuildDate>Mon, 27 Apr 2026 00:00:00 +0000</lastBuildDate>
    <atom:link href="https://arknill.github.io/tags/local-llm/index.xml" rel="self" type="application/rss+xml"/>
    <item>
      <title>Testing Claude Code Against Local 35B Models: Building a Cross-Check Harness</title>
      <link>https://arknill.github.io/blog/testing-claude-code-against-local-35b/</link>
      <pubDate>Mon, 27 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://arknill.github.io/blog/testing-claude-code-against-local-35b/</guid>
      <description>I built three benchmark harnesses to compare Claude Code and Codex against local Qwen 35B models. The harness had more bugs than the models. Here's the v1→v7 journey — 55 tasks, 290 tests, and what 'ALL_FAIL 7→0' taught me about evaluation.</description>
    </item>
    <item>
      <title>I Built a 3-Node Home LLM Lab. Here's What It Actually Takes.</title>
      <link>https://arknill.github.io/blog/3-node-home-llm-lab/</link>
      <pubDate>Sun, 26 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://arknill.github.io/blog/3-node-home-llm-lab/</guid>
      <description>Two DGX Sparks (128GB each) and an RTX 5090 desktop — running Qwen 3.5/3.6 35B in production. Hardware choices, real costs, operational lessons, and why three nodes beat one big one.</description>
    </item>
    <item>
      <title>Quantization, Determinism, and Thinking Tokens: Running Open-Source LLMs in Production</title>
      <link>https://arknill.github.io/blog/quantization-determinism-thinking-production/</link>
      <pubDate>Sat, 25 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://arknill.github.io/blog/quantization-determinism-thinking-production/</guid>
      <description>FP8 is the production floor. Q4 MoE loses 16% on CJK. vLLM is non-deterministic under MTP. Thinking tokens eat 90% of your budget on the wrong tasks. Hard lessons from operating Qwen 3.5/3.6 35B across 3 nodes.</description>
    </item>
    <item>
      <title>What tok/s Doesn't Tell You: Measuring LLM Speed That Matters</title>
      <link>https://arknill.github.io/blog/what-tok-s-doesnt-tell-you/</link>
      <pubDate>Fri, 24 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://arknill.github.io/blog/what-tok-s-doesnt-tell-you/</guid>
      <description>My 204 tok/s GPU feels slower than a 65 tok/s one for some tasks. tok/s alone is a misleading metric — here's a framework (TTR, Effective tok/s, TCT) that measures what users actually experience.</description>
    </item>
  </channel>
</rss>