Blogs on ArkNill

https://arknill.github.io/blog/
Recent content in Blogs on ArkNill
Generator: Hugo
Language: en
Last updated: Mon, 27 Apr 2026 00:00:00 +0000

Testing Claude Code Against Local 35B Models: Building a Cross-Check Harness
https://arknill.github.io/blog/testing-claude-code-against-local-35b/
Mon, 27 Apr 2026 00:00:00 +0000
I built three benchmark harnesses to compare Claude Code and Codex against local Qwen 35B models. The harness had more bugs than the models. Here's the v1→v7 journey: 55 tasks, 290 tests, and what 'ALL_FAIL 7→0' taught me about evaluation.

I Built a 3-Node Home LLM Lab. Here's What It Actually Takes.
https://arknill.github.io/blog/3-node-home-llm-lab/
Sun, 26 Apr 2026 00:00:00 +0000
Two DGX Sparks (128GB each) and an RTX 5090 desktop, running Qwen 3.5/3.6 35B in production. Hardware choices, real costs, operational lessons, and why three nodes beat one big one.

Quantization, Determinism, and Thinking Tokens: Running Open-Source LLMs in Production
https://arknill.github.io/blog/quantization-determinism-thinking-production/
Sat, 25 Apr 2026 00:00:00 +0000
FP8 is the production floor. Q4 MoE loses 16% on CJK. vLLM is non-deterministic under MTP. Thinking tokens eat 90% of your budget on the wrong tasks. Hard lessons from operating Qwen 3.5/3.6 35B across 3 nodes.

What tok/s Doesn't Tell You: Measuring LLM Speed That Matters
https://arknill.github.io/blog/what-tok-s-doesnt-tell-you/
Fri, 24 Apr 2026 00:00:00 +0000
My 204 tok/s GPU feels slower than a 65 tok/s one for some tasks. tok/s alone is a misleading metric; here's a framework (TTR, Effective tok/s, TCT) that measures what users actually experience.

Anthropic's Postmortem Told Half the Truth
https://arknill.github.io/blog/anthropic-postmortem-half-truth/
Thu, 23 Apr 2026 00:00:00 +0000
Anthropic admitted 3 harness bugs caused Claude Code degradation. The bugs are real. But the postmortem strategically scoped out model-level regressions, ignored 9+ open issues, and all three 'bugs' happen to reduce Anthropic's serving costs. Here's what the other half looks like.

Opus 4.7 Postmortem: What the Changelog Didn't Say
https://arknill.github.io/blog/opus-47-postmortem-what-changelog-didnt-say/
Wed, 22 Apr 2026 00:00:00 +0000
Anthropic admitted three product-layer bugs that degraded Claude Code for 48 days. Cross-checking the postmortem against the CHANGELOG reveals a structural transparency gap: 2 of 3 bugs had zero documentation. And 5 new issues persist beyond the postmortem's scope.

I Tracked 42,363 Claude Code API Calls. Here's Where Your Quota Actually Goes.
https://arknill.github.io/blog/claude-code-thinking-token-blind-spot/
Mon, 06 Apr 2026 00:00:00 +0000
19 days of transparent proxy data on Claude Code Max 20: token breakdown, 11 bugs found, Opus 4.7 impact, and Anthropic's April 23 postmortem. Independent datasets from other researchers confirmed the pattern and corrected my original hypothesis.