vLLM - Frontier Models

There are 63 items on this page

29:59

Interviews · 1 month ago

⚡️ Google’s Open AI Strategy — Omar Sanseviero, Google DeepMind

In this Latent Space podcast interview, Omar Sanseviero from Google DeepMind walks through the technical decisions and strategic thin...

10:10

Tutorials · 1 month ago

Intern-S2-Preview FP8: 35B Scientific Multimodal Model Running Locally

InternLM's latest release, Intern-S2-Preview, is a 35-billion-parameter scientific multimodal model that takes a different approach t...

09:16

Research & Benchmarks · 2 months ago

Command A+: Cohere’s “Best Model Ever” Is Kind of Disappointing

Fahd Mirza puts Cohere's newly released Command A Plus through its paces in this hands-on review, offering a skeptical take on a mode...

33:45

Coding & Dev Tools · 2 months ago

AI Dev 26 x SF | Eda Zhou & Mahdi Ghodsi: Building Personal AI Agents with Open Source Models

At the AI Dev 26 conference in San Francisco, AMD engineers Eda Zhou and Mahdi Ghodsi lead a hands-on workshop teaching attendees how...

15:56

Research & Benchmarks · 2 months ago

MiniCPM-V 4.6: The Agent Vision Model

Sam Witteveen examines MiniCPM-V 4.6, a 1.3 billion parameter vision-language model released by OpenBMB—a joint initiative between AI...

08:06

Research & Benchmarks · 2 months ago

MTP vs DFlash — Speculative Decoding Explained Simply

This video by Fahd Mirza offers a clear, structured comparison of two speculative decoding techniques — Multi-Token Prediction (MTP)...

43:11

Agents & Automation · 2 months ago

Local Hermes & Openclaw on Beelink in 43 mins

Keith AI delivers a detailed, framework-driven evaluation of running Hermes and OpenClaw locally on a Beelink S10 Max mini PC — the d...

08:28

Coding & Dev Tools · 2 months ago

Qwen3-8B at 74 tok/s with RedHat DFlash Speculator on vLLM Locally

Fahd Mirza walks through running Red Hat's DFlash speculative decoding implementation on Qwen3-8B using vLLM, achieving 74 tokens per...

11:00

Tutorials · 2 months ago

NVIDIA Nemotron Elastic: 3-in-1 Elastic LLM Like Russian Dolls in One File

NVIDIA's Nemotron Elastic model family packs three reasoning models — 30B, 23B, and 12B parameters — into a single checkpoint file us...

13:37

Research & Benchmarks · 2 months ago

Zaya1 8B – Intelligence Efficiency by Zyphra – Run Locally

Zyphra, a San Francisco AI lab known for earlier releases like Zonos and ZR1, has returned with Zaya 1 (Zia) 8B — an open-source mixt...