Vllm on ArkNill

https://arknill.github.io/tags/vllm/

Recent content in Vllm on ArkNill

Quantization, Determinism, and Thinking Tokens: Running Open-Source LLMs in Production
https://arknill.github.io/blog/quantization-determinism-thinking-production/
Sat, 25 Apr 2026 00:00:00 +0000

FP8 is the production floor. Q4 MoE loses 16% on CJK. vLLM is non-deterministic under MTP. Thinking tokens eat 90% of your budget on the wrong tasks. Hard lessons from operating Qwen 3.5/3.6 35B across 3 nodes.