Testing Claude Code Against Local 35B Models: Building a Cross-Check Harness

I run Claude Code (Opus 4.6) as my primary coding tool and pay $200/month for it. I also run Qwen 3.5/3.6 35B locally on two DGX Sparks and an RTX 5090. Natural question: how does a local 35B model compare to the commercial tool I’m paying for? To find out, I built three separate benchmark harnesses over 10 days. The journey taught me more about evaluation methodology than about the models themselves — because the harness had more bugs than the models did. ...

April 27, 2026 · 6 min · ArkNill

Anthropic's Postmortem Told Half the Truth

On April 23, Anthropic published a postmortem acknowledging three product-layer bugs that degraded Claude Code from March 4 through April 20. They frame it as: model weights unchanged, harness bugs fixed, problem solved. The three bugs are real. Their impact was real. The fixes were real. But the postmortem is a carefully scoped document that tells half the truth. Here’s the other half. What They Admitted (Correctly) Bug Introduced Fixed Duration Effort high → medium March 4 (v2.1.68) April 21 (v2.1.117) 48 days Thinking cache clear every turn March 26 (v2.1.85) April 10 (v2.1.101) 15 days “≤25 words” system prompt April 16 (v2.1.111) April 20 (v2.1.116) 4 days All three are product-layer issues — API parameters and system prompts, not model weights. The combined effect: effort reduced thinking depth, thinking cache destroyed session memory, word limit truncated output. Together, these made Claude appear significantly dumber. ...

April 23, 2026 · 8 min · ArkNill

Opus 4.7 Postmortem: What the Changelog Didn't Say

On April 23, Anthropic published a postmortem acknowledging three product-layer bugs that degraded Claude Code from March 4 through April 20. No model weights were changed — all three issues were in the harness/product layer. I cross-checked every claim against the public CHANGELOG (3,285 lines, v2.1.68–v2.1.119), 8 GitHub issues via gh issue view, and 10 external sources. 36 claims checked — 28 confirmed, 5 partially confirmed, 3 not relied upon. Here’s what the postmortem says, what the CHANGELOG actually shows, and what still isn’t fixed. ...

April 22, 2026 · 5 min · ArkNill

I Tracked 42,363 Claude Code API Calls. Here's Where Your Quota Actually Goes.

I pay $200/month for Claude Code Max 20. On April 1, my quota hit 100% in 70 minutes during normal coding. That turned out to be two cache bugs — Anthropic fixed them in v2.1.90–91, and it made a real difference. But even after the fix, I wanted to understand where the quota actually goes. So I filed issues, dug into community threads, and built a transparent proxy to measure every API call. ...

April 6, 2026 · 8 min · ArkNill