vLLM - Frontier Models

There are 63 items on this page

23:13

Research & Benchmarks · 3 months ago

Gemma-4 31B vs Qwen3.5 27B: Hands-on Local Comparison of Two Top Dense Models

Fahd Mirza pits two of the strongest open-weight dense models against each other in a live local benchmark: Google DeepMind's Gemma 4...

16:57

Tutorials · 3 months ago

Gemma-4 26B A4B + vLLM: Best MoE Model of 2026: Running Locally

Fahd Mirza puts Google's Gemma-4 26B A4B through its paces locally, starting with a clear explanation of what the model name actually...

13:21

Tutorials · 3 months ago

Gemma 4 E2B + Hermes Agent + vLLM: Multimodal AI Stack Locally for Free

Fahd Mirza demonstrates a full local stack for running Google's Gemma 4 E2B instruction-tuned model through the Hermes agentic framew...

08:33

Coding & Dev Tools · 3 months ago

Qwen3 Speculator Eagle: Red Hat Made Qwen3-8B 6x Faster: Full Hands-on Guide

Red Hat has quietly entered the AI inference space with a significant technical contribution: a speculative decoding model that makes...

09:20

Tutorials · 4 months ago

Run Dots.mOCR Locally — OCR, LaTeX, SVG From Any Image

Dots.mOCR is a 1.7-billion-parameter vision-language model from Red Note, the Chinese lifestyle platform also known as Little Red B...
