llama.cpp - Frontier Models

There are 33 items on this page

11:00

Coding & Dev Tools · 5 days ago

Ornith 1.0 35B in GGUF – Beats Models 10x Its Size – Run Locally

Fahd Mirza puts Ornith 1.0 35B through its paces in this hands-on local deployment walkthrough. Ornith is a mixture-of-experts model...

14:48

Business & Strategy · 6 days ago

Turbocharge Your Agent’s Retrieval with TurboQuant – Shashi Jagtap, Superagentic AI

Shashi Jagtap, founder of SuperAgentic AI, presents at the AI Engineer conference on TurboQuant — a vector embedding compression algo...

14:08

Benchmarks · 7 days ago

Qwythos 9B: When You Train a Small Model on Claude Traces: Run Locally

Fahd Mirza introduces and benchmarks Qwythos 9B, a reasoning-focused open-source model fine-tuned on over 500 million tokens of Claud...

08:51

Tutorials · 1 week ago

OpenJarvis + Ollama: Local AI Agent That Tracks Every Watt

Fahd Mirza walks through the installation and hands-on testing of OpenJarvis, a newly released local-first personal AI framework dev...

14:42

Tutorials · 2 weeks ago

Qwen3.6 27B (Pi-Reasoning GGUF) – Fine-Tuned for Local Heavy AI Agent

Fahd Mirza tests Pi-Reasoning, a community fine-tune of Qwen 3.6 27B built specifically for agentic coding — tasks like reading files...

08:12

Research & Benchmarks · 2 weeks ago

GLM 5.2 – Why Everyone is Loving It? And How to Run It Locally

Fahd Mirza covers GLM-5.2, the 744 billion parameter mixture-of-experts model from Chinese AI lab Zhipu AI that has become one of the...

16:47

Research & Benchmarks · 3 weeks ago

Google QAT vs Unsloth QAT + MTP – Which Gemma 4 12B Is Actually Better?

This video pits two quantized versions of Google's Gemma 4 12B against each other in a practical, locally-run benchmark: Google's own...

13:08

Tutorials · 4 weeks ago

Gemma 4 12B QAT + MTP on llama.cpp Locally – Twice the Speed, Same Quality?

This video by Fahd Mirza walks through running Google's newly released Gemma 4 12B QAT (Quantization-Aware Training) model alongside...

06:01

Coding & Dev Tools · 4 weeks ago

Run Google’s newest 12B AI on a phone? Yes, it’s possible!

The Alphastack channel walks through a custom cross-platform app that runs Google's Gemma 4 12B multimodal model entirely on-device —...

09:07

Tutorials · 1 month ago

DwarfStar: Run DeepSeek V4 Locally with DS4 at 34 tok/s

Fahd Mirza covers DwarfStar, a brand-new inference engine built specifically for DeepSeek V4 Flash (DS4) by the creator of Radius. Un...