08:18 Benchmarks 3 days ago Qwopus 35B + MTP: The Coder That Fixes Its Own Bugs at 160 tok/s Fahd Mirza tests Qwopus Coder, a 35-billion-parameter mixture-of-experts coding model built on the Qwen 3.6 architecture (3B paramete... 0 comments 1.7K views
25:57 Benchmarks 4 days ago I benchmarked the NEW Sonnet 5. The results shocked me. How I AI introduces the Howi AI Bench (a repeatable, multi-dimensional evaluation framework built with Claude Code) and runs Claude... 0 comments 2.9K views
30:52 Benchmarks 5 days ago Frontier results, on device – RL Nabors, Arize Rachel Lee Nabors (formerly at Mozilla on Firefox DevTools, the W3C, Microsoft Edge, and the React team, now at Arize) presents a p... 0 comments 2K views
13:57 Benchmarks 5 days ago Can Krea 2 Turbo Really Make Great Images in 8 Steps? ComfyUI Test Veteran AI runs a structured eight-category evaluation of Krea 2 Turbo, the eight-step distilled image generation model released by... 0 comments 1K views
14:08 Benchmarks 7 days ago Qwythos 9B: When You Train a Small Model on Claude Traces: Run Locally Fahd Mirza introduces and benchmarks Qwythos 9B, a reasoning-focused open-source model fine-tuned on over 500 million tokens of Claud... 0 comments 2.9K views
09:36 Benchmarks 2 weeks ago Qwen3.6 (REAP 90pct GGUF): The Brain-Damaged Model Fahd Mirza takes a deep look at an aggressively pruned variant of Qwen 3.6 (a 35-billion-parameter mixture-of-experts model) compre... 0 comments 2.8K views
18:17 Benchmarks 2 weeks ago VibeThinker 3B – Taking on Giant Models Sam Witteveen digs into VibeThinker 3B, a small language model from Weibo AI Lab, the AI research arm of the Chinese social network... 0 comments 4K views
08:20 Benchmarks 2 weeks ago LoopCoder – The 7B Model That Thinks Twice – Does it Beat Others? LoopCoder V2 is a 7-billion-parameter open-source code model built on an unusual architectural idea: instead of stacking more transfo... 0 comments 2.2K views
09:40 Benchmarks 3 weeks ago DFlash Just Got Faster: 4x Speed with 160 tok/s Locally Fahd Mirza benchmarks DFlash with SGLang's new SpecV2 overlapping scheduler on an NVIDIA H100 80GB GPU, demonstrating a 4.3x throughp... 0 comments 2K views
31:25 Benchmarks 3 weeks ago Claude Fable 5 BANNED: The First Model Agentic Engineers DON’T NEED IndyDevDan covers two intertwined stories in this video: the sudden federal suspension of Claude Fable 5 and Mythos 5, and a detailed... 0 comments 7.6K views
09:03 Benchmarks 3 weeks ago I tried to prove AI trading is BS and it backfired The Algovibes channel set out to definitively disprove AI-powered crypto trading, and ended up with results more interesting than ex... 0 comments 2.4K views
09:13 Benchmarks 4 weeks ago I Tested 100,000 Trading Strategies on 1,000 Stocks Algovibes presents a large-scale systematic backtesting study covering the full Russell 1000 universe: 1,014 stocks, 66 technical tr... 0 comments 1.1K views
14:35 Benchmarks 4 weeks ago Google QAT vs Unsloth Q4_0 – Which Gemma 4 12B Quantization Is Better? Fahd Mirza runs a controlled comparison between two 4-bit quantized versions of Google's Gemma 4 12B model: Google's own QAT (quantiz... 0 comments 3.2K views
12:30 Benchmarks 4 weeks ago Ideogram 4: World’s Best Text-to-Image Model? Let’s Test Locally Fahd Mirza installs and tests Ideogram 4 locally, providing a candid assessment of its real-world hardware requirements and architect... 0 comments 769 views
14:55 Benchmarks 4 weeks ago Gemma 4 12B on a 16GB Mac Mini Is Surprisingly Capable Bart Slodyczka puts Google's newly released Gemma 4 12B model through its paces on a 16GB M4 Mac Mini, a practical test of what entr... 0 comments 4.9K views
16:08 Benchmarks 1 month ago Benchmarking semantic code retrieval on Claude Code – Kuba Rogut, Turbopuffer Kuba Rogut of Turbopuffer presents original benchmark results comparing three code retrieval strategies for Claude Code: the default... 0 comments 1.1K views