Compute Improves Compute + Europe 2031
Interviews · 02:01:45 · 2 weeks ago · 148 views
The Cognitive Revolution's daily AI briefing for June 23, 2026 covers a wide sweep of the AI industry, from semiconductor market turb...

DwarfStar: Run DeepSeek V4 Locally with DS4 at 34 tok/s
Tutorials · 09:07 · 1 month ago · 2.6K views
Fahd Mirza covers DwarfStar, a brand-new inference engine built specifically for DeepSeek V4 Flash (DS4) by the creator of Radius. Un...

DFlash Leaves Qwen Territory – Gemma 4 31B Now Runs 5x Faster with Speculative Decoding
Coding & Dev Tools · 10:06 · 1 month ago · 3.4K views
Fahd Mirza demonstrates the first end-to-end deployment of Llama Box DFlash with Google's Gemma 4 31B model, following the merge of P...

$400 Chinese GPU That Wants to Dethrone NVIDIA
Research & Benchmarks · 08:53 · 1 month ago · 2.9K views
Fahd Mirza takes a close look at the Lision LX7G 100, a roughly $485 consumer GPU developed entirely in China without CUDA, AMD archi...

we are NOT PREPARED for the end of 2026
Interviews · 01:47:27 · 1 month ago · 17.4K views
Wes Roth and co-host Dylan deliver a wide-ranging AI industry podcast covering the most significant developments from the week of Goo...

Your Coding Agent Should Do AI System Engineering — Ben Burtenshaw, Hugging Face
Coding & Dev Tools · 18:25 · 1 month ago · 3.2K views
Ben Burtenshaw, an engineer at Hugging Face, makes the case that coding agents have crossed a capability threshold where they can now...

Luce Megakernel — 25x Faster Than PyTorch on a Single GPU – Test Locally
Benchmarks · 09:52 · 2 months ago · 3K views
A new open-source project called Luce Megakernel is challenging long-held assumptions about GPU inference efficiency by fusing all 24...

Running a 27B model at 130 tokens sec on a single GPU Locally with Luce DFlash
Coding & Dev Tools · 09:01 · 2 months ago · 7.7K views
LlamaDeFlash is a custom inference engine built from scratch in C++ and CUDA — no vLLM, no llama.cpp, no Python in the critical path...

The Hidden Engine Behind DeepSeek V4 – DeepEP V2 and TileKernels Explained
Foundation Models · 10:01 · 2 months ago · 520 views
While most coverage of DeepSeek V4 focuses on benchmark scores, Fahd Mirza goes a level deeper to explain the two open-sourced infras...

Kimi FlashKDA: 2x Faster AI Prefill — Installed, Explained and Tested Locally
Coding & Dev Tools · 10:33 · 2 months ago · 1.4K views
Fahd Mirza walks through the live installation of Flash KDA, Moonshot AI's open-source CUDA kernel that accelerates the prefill phase...