<?xml version="1.0" encoding="utf-8" standalone="yes"?><rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:content="http://purl.org/rss/1.0/modules/content/"><channel><title>Blogs on ArkNill</title><link>https://arknill.github.io/blog/</link><description>Recent content in Blogs on ArkNill</description><generator>Hugo</generator><language>en</language><lastBuildDate>Mon, 27 Apr 2026 00:00:00 +0000</lastBuildDate><atom:link href="https://arknill.github.io/blog/index.xml" rel="self" type="application/rss+xml"/><item><title>Testing Claude Code Against Local 35B Models: Building a Cross-Check Harness</title><link>https://arknill.github.io/blog/testing-claude-code-against-local-35b/</link><pubDate>Mon, 27 Apr 2026 00:00:00 +0000</pubDate><guid>https://arknill.github.io/blog/testing-claude-code-against-local-35b/</guid><description>I built three benchmark harnesses to compare Claude Code and Codex against local Qwen 35B models. The harness had more bugs than the models. Here&amp;#39;s the v1→v7 journey — 55 tasks, 290 tests, and what &amp;#39;ALL_FAIL 7→0&amp;#39; taught me about evaluation.</description></item><item><title>I Built a 3-Node Home LLM Lab. Here's What It Actually Takes.</title><link>https://arknill.github.io/blog/3-node-home-llm-lab/</link><pubDate>Sun, 26 Apr 2026 00:00:00 +0000</pubDate><guid>https://arknill.github.io/blog/3-node-home-llm-lab/</guid><description>Two DGX Sparks (128GB each) and an RTX 5090 desktop — running Qwen 3.5/3.6 35B in production. Hardware choices, real costs, operational lessons, and why three nodes beat one big one.</description></item><item><title>Quantization, Determinism, and Thinking Tokens: Running Open-Source LLMs in Production</title><link>https://arknill.github.io/blog/quantization-determinism-thinking-production/</link><pubDate>Sat, 25 Apr 2026 00:00:00 +0000</pubDate><guid>https://arknill.github.io/blog/quantization-determinism-thinking-production/</guid><description>FP8 is the production floor. Q4 MoE loses 16% on CJK. vLLM is non-deterministic under MTP. Thinking tokens eat 90% of your budget on the wrong tasks. Hard lessons from operating Qwen 3.5/3.6 35B across 3 nodes.</description></item><item><title>What tok/s Doesn't Tell You: Measuring LLM Speed That Matters</title><link>https://arknill.github.io/blog/what-tok-s-doesnt-tell-you/</link><pubDate>Fri, 24 Apr 2026 00:00:00 +0000</pubDate><guid>https://arknill.github.io/blog/what-tok-s-doesnt-tell-you/</guid><description>My 204 tok/s GPU feels slower than a 65 tok/s one for some tasks. tok/s alone is a misleading metric — here&amp;#39;s a framework (TTR, Effective tok/s, TCT) that measures what users actually experience.</description></item><item><title>Anthropic's Postmortem Told Half the Truth</title><link>https://arknill.github.io/blog/anthropic-postmortem-half-truth/</link><pubDate>Thu, 23 Apr 2026 00:00:00 +0000</pubDate><guid>https://arknill.github.io/blog/anthropic-postmortem-half-truth/</guid><description>Anthropic admitted 3 harness bugs caused Claude Code degradation. The bugs are real. But the postmortem strategically scoped out model-level regressions, ignored 9+ open issues, and all three &amp;#39;bugs&amp;#39; happen to reduce Anthropic&amp;#39;s serving costs. Here&amp;#39;s what the other half looks like.</description></item><item><title>Opus 4.7 Postmortem: What the Changelog Didn't Say</title><link>https://arknill.github.io/blog/opus-47-postmortem-what-changelog-didnt-say/</link><pubDate>Wed, 22 Apr 2026 00:00:00 +0000</pubDate><guid>https://arknill.github.io/blog/opus-47-postmortem-what-changelog-didnt-say/</guid><description>Anthropic admitted three product-layer bugs that degraded Claude Code for 48 days. Cross-checking the postmortem against the CHANGELOG reveals a structural transparency gap — 2 of 3 bugs had zero documentation. And 5 new issues persist beyond the postmortem&amp;#39;s scope.</description></item><item><title>I Tracked 42,363 Claude Code API Calls. Here's Where Your Quota Actually Goes.</title><link>https://arknill.github.io/blog/claude-code-thinking-token-blind-spot/</link><pubDate>Mon, 06 Apr 2026 00:00:00 +0000</pubDate><guid>https://arknill.github.io/blog/claude-code-thinking-token-blind-spot/</guid><description>19 days of transparent proxy data on Claude Code Max 20 — token breakdown, 11 bugs found, Opus 4.7 impact, and Anthropic&amp;#39;s April 23 postmortem. Independent datasets from other researchers confirmed the pattern and corrected my original hypothesis.</description></item></channel></rss>