13:26 Tutorials 4 weeks ago Under 5 minutes to a deployed LLM endpoint — Audry Hsu, RunPod At the AI Engineer summit, Audrey Hsu, developer advocate at RunPod, delivers a live demo showing how to deploy a production-ready LL... 0 comments 3.5K views
10:33 Coding & Dev Tools 4 weeks ago Mellum2: JetBrains’ New Coding Model – vLLM + MCP Tool Use Locally JetBrains has released Mellum 2, a 12-billion-parameter mixture-of-experts coding model that runs at the compute cost of a 2.5-billio... 0 comments 2.1K views
09:08 Tutorials 1 month ago Hermes Desktop + Ollama: Run a Self-Improving AI Agent on Your Own Server Fahd Mirza walks through the installation and configuration of Hermes agent desktop, the newly released GUI for the Hermes agent fram... 0 comments 2.1K views
12:40 Foundation Models 1 month ago What Lies Beneath the API — Benjamin Cowen, Modal Benjamin Cowen, a forward-deployed machine learning engineer at Modal, delivers a conference talk examining one of the most consequen... 0 comments 346 views
11:06 Tutorials 1 month ago Best Qwen3.6 Quant You Can Run Right Now Locally Fahd Mirza examines Nvidia's official FP4 quantization of Qwen3 35B A22B — a release validated by Nvidia's own model optimizer tool a... 0 comments 4.1K views
10:07 Tutorials 1 month ago Dolphin X1 Trinity Nano: The Model That Never Says No: Run and Test Locally Fahd Mirza walks through the local deployment and testing of Dolphin X1 Trinity Nano, the first model trained entirely within a custo... 0 comments 493 views
09:39 Coding & Dev Tools 1 month ago Step 3.7 Flash – 198B Open Source Model That Does Everything; Does it Really? Step 3.7 Flash is a 198-billion-parameter sparse mixture-of-experts model from Step One, activating only 11 billion parameters per to... 0 comments 1.8K views
08:53 Coding & Dev Tools 1 month ago LFM2.5-8B-A1B: Local Agentic AI with Multilingual Support Tested LFM2.5-8B-A1B is Liquid AI's latest open-weight model — an 8.3-billion-parameter mixture-of-experts architecture that activates only... 0 comments 1.6K views
10:06 Coding & Dev Tools 1 month ago DFlash Leaves Qwen Territory – Gemma 4 31B Now Runs 5x Faster with Speculative Decoding Fahd Mirza demonstrates the first end-to-end deployment of Llama Box DFlash with Google's Gemma 4 31B model, following the merge of P... 0 comments 3.4K views
11:11 Tutorials 1 month ago MiniCPM5-1B: New 1B King for Local AI – Full Demo Fahd Mirza walks through a complete local installation and live evaluation of MiniCPM 5 in its 1-billion-parameter variant, released... 0 comments 2.8K views