43:11 Agents & Automation 2 days ago Local Hermes & Openclaw on Beelink in 43 mins Keith AI delivers a detailed, framework-driven evaluation of running Hermes and OpenClaw locally on a Beelink S10 Max mini PC, the d... 0 comments 1.6K views
08:28 Coding & Dev Tools 4 days ago Qwen3-8B at 74 tok/s with RedHat DFlash Speculator on vLLM Locally Fahd Mirza walks through running Red Hat's DFlash speculative decoding implementation on Qwen3-8B using vLLM, achieving 74 tokens per... 0 comments 1.6K views
13:37 Research & Benchmarks 1 week ago Zaya1 8B – Intelligence Efficiency by Zyphra – Run Locally Zyphra, a San Francisco AI lab known for earlier releases like Zonos and ZR1, has returned with Zaya 1 (Zia) 8B, an open-source mixt... 0 comments 2.6K views
08:43 Tutorials 1 week ago DFlash Drafter for Gemma 4 26B – Official Speculative Decoding is Here: Run Locally ZLab, the UC San Diego research team that invented DFlash speculative decoding, has released the first official drafter model paired... 0 comments 506 views
08:41 Tutorials 1 week ago Gemma 4 31B at 196 tok/s with RedHat DFlash Speculator Locally This hands-on tutorial from the Fahd Mirza channel demonstrates running Google's Gemma 4 31B model locally at 196 tokens per second u... 0 comments 2.2K views
32:36 Research & Benchmarks 2 weeks ago RTX 5090, Mac Studio, or DGX Spark? I tried all three. Nate B Jones tests the RTX 5090, Apple Mac Studio, and NVIDIA DGX Spark as personal AI computing platforms, but the video is as much... 0 comments 47.1K views
09:01 Coding & Dev Tools 2 weeks ago Running a 27B model at 130 tokens sec on a single GPU Locally with Luce DFlash LlamaDeFlash is a custom inference engine built from scratch in C++ and CUDA: no vLLM, no llama.cpp, no Python in the critical path... 0 comments 7.7K views
11:39 Coding & Dev Tools 2 weeks ago Poolside Laguna XS.2: New Open Weight Coding Model Tested Locally with vLLM Poolside AI has released two new open-weight coding models: Laguna M.1 (2–5 billion parameters) and Laguna XS.2, a 33-billion-paramet... 0 comments 1.1K views
13:58 Tutorials 2 weeks ago NVIDIA’s NEW Open Multimodal Intelligence – Nemotron 3 Nano Omni NVIDIA has released the Nemotron 3 Nano Omni, a unified open multimodal model that fuses three of the company's strongest components... 0 comments 2.7K views
18:10 Coding & Dev Tools 2 weeks ago NVIDIA Nemotron 3 Nano Omni – See, Hear & Read Everything Locally NVIDIA's Nemotron 3 Nano Omni is a newly released multimodal model capable of processing video, audio, images, and long-form text sim... 0 comments 1.5K views