19:11 · Business & Strategy · 2 days ago · Your Agent Can Now Train Models — Merve Noyan, Hugging Face · Merve Noyan from the Hugging Face open-source team delivers a broad survey of the current open-model landscape alongside several firs... · 0 comments · 1.9K views
22:54 · Tutorials · 4 days ago · This 100% uncensored AI model is insane… let’s run it · David Ondrej walks through the rationale, setup, and practical use of uncensored large language models running locally in 2026. The v... · 0 comments · 25.7K views
11:12 · Benchmarks · 5 days ago · Qwen3.6 27B Gets 20% Faster with MTP and llama.cpp Locally · Fahd Mirza demonstrates how to enable multi-token prediction (MTP) on Qwen3.6 27B using ik_llama.cpp — a community fork of the popula... · 0 comments · 3.3K views
09:01 · Coding & Dev Tools · 2 weeks ago · Running a 27B Model at 130 tokens/sec on a Single GPU Locally with LlamaDeFlash · LlamaDeFlash is a custom inference engine built from scratch in C++ and CUDA — no vLLM, no llama.cpp, no Python in the critical path... · 0 comments · 7.7K views
14:53 · Coding & Dev Tools · 3 weeks ago · This Mutant AI Model Should Not Exist: Qwopus-GLM-18B-Merged Locally · Fahd Mirza walks through the creation and live testing of Qwopus-GLM-18B-Merged, a community-built model that stitches together two s... · 0 comments · 1.4K views
09:08 · Tutorials · 3 weeks ago · Open WebUI Desktop App – Install on Linux, Windows & Mac · Open WebUI has shipped its first native desktop application for Windows, macOS, and Linux, and Fahd Mirza walks through the complete... · 0 comments · 1.2K views
15:26 · Business & Strategy · 4 weeks ago · Gemma, DeepMind’s Family of Open Models — Omar Sanseviero, Google DeepMind · Omar Sanseviero, a researcher at Google DeepMind, delivers the first public conference talk on Gemma 4 just one week after its releas... · 0 comments · 2.1K views
14:56 · Coding & Dev Tools · 1 month ago · MiniMax M2.7 Running Locally on CPU + GPU – Everyone Can Do It · Fahd Mirza walks through the complete process of running MiniMax M2.7 — a newly open-sourced 229-billion-parameter mixture-of-experts... · 0 comments · 2.7K views
11:54 · Tutorials · 1 month ago · Run GLM-5.1 Locally on CPU + GPU Easily: Step-by-Step Tutorial · Fahd Mirza demonstrates how to run GLM-5.1 — the newly open-sourced flagship agentic model from Zhipu AI’s GLM team — locally on a si... · 0 comments · 4.8K views