Description:
Maxime Labonne, Head of Pre-training at Liquid AI, delivers a technically rigorous conference talk on the distinct challenges of training frontier small language models for on-device deployment. Drawing from hands-on work building the LFM2 model family — which spans 350 million to 24 billion parameters for use cases ranging from smartphones to in-car systems — Labonne identifies three defining constraints: memory bounds, task-specific rather than general-purpose design, and extreme latency sensitivity.
A central insight is the inefficiency of large embedding layers in popular small models. Gemma 3 270M allocates 63% of its total parameters to embeddings, and Qwen 3.5 0.8B allocates 29%, leaving comparatively few parameters for actual reasoning and knowledge capacity. Liquid AI’s LFM2 architecture addresses this through a hybrid design featuring gated short convolutions and grouped query attention (GQA), derived from on-device profiling on real target hardware rather than theoretical optimization. Profiling results show short convolutions deliver substantially better latency than sliding window attention, gated linear attention, and GQA alternatives on CPU-class hardware.
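To make the architecture discussion concrete, below is a minimal PyTorch sketch of a gated short-convolution block of the kind described for LFM2. The kernel size, gating arrangement, and projection layout here are illustrative assumptions rather than the exact LFM2 operator, which the talk summary does not spell out.

```python
import torch
import torch.nn as nn

class GatedShortConvBlock(nn.Module):
    """Illustrative gated short (depthwise) convolution block.

    Assumption: the block projects its input into a value stream plus two
    multiplicative gates, applies a short causal depthwise convolution, and
    projects back out. Details differ from the actual LFM2 operator.
    """

    def __init__(self, d_model: int, kernel_size: int = 3):
        super().__init__()
        # Project the input into a value stream and two multiplicative gates.
        self.in_proj = nn.Linear(d_model, 3 * d_model)
        # Depthwise ("short") causal convolution over the sequence dimension.
        self.conv = nn.Conv1d(
            d_model, d_model, kernel_size,
            groups=d_model, padding=kernel_size - 1,
        )
        self.out_proj = nn.Linear(d_model, d_model)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, seq_len, d_model)
        gate_in, gate_out, v = self.in_proj(x).chunk(3, dim=-1)
        v = gate_in * v                              # input gate
        v = self.conv(v.transpose(1, 2))             # (batch, d_model, seq_len + pad)
        v = v[..., : x.shape[1]].transpose(1, 2)     # trim padding to stay causal
        v = gate_out * v                             # output gate
        return self.out_proj(v)

# Toy usage on a small batch.
x = torch.randn(2, 16, 64)               # (batch, seq_len, d_model)
print(GatedShortConvBlock(64)(x).shape)  # torch.Size([2, 16, 64])
```

The latency argument for CPU-class hardware follows from the structure: each convolutional layer only ever looks at a fixed, small window of recent tokens, so there is no growing key-value cache or quadratic attention cost for those layers.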
Labonne also covers the full post-training pipeline for small models: supervised fine-tuning with narrow task focus, preference alignment using an on-policy length-normalized DPO algorithm, and reinforcement learning, which he argues is highly effective even at very small scales. A dedicated section addresses doom looping (repetitive token generation), a failure mode especially common in small and reasoning models on complex tasks, with guidance on diagnosing it and on using cold-start SFT data to resolve it. All LFM2 models are available on Hugging Face.
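As a pointer for the preference-alignment step, here is a hedged sketch of a length-normalized DPO loss. "Length-normalized" is assumed to mean dividing each response's summed log-probability ratio by its token count before the usual DPO sigmoid term; the exact formulation used for LFM2 is not given in this summary.

```python
import torch
import torch.nn.functional as F

def length_normalized_dpo_loss(
    policy_chosen_logps: torch.Tensor,    # sum of log p(y_chosen | x) under the policy
    policy_rejected_logps: torch.Tensor,  # sum of log p(y_rejected | x) under the policy
    ref_chosen_logps: torch.Tensor,       # same sums under the frozen reference model
    ref_rejected_logps: torch.Tensor,
    chosen_lengths: torch.Tensor,         # token counts of the chosen responses
    rejected_lengths: torch.Tensor,       # token counts of the rejected responses
    beta: float = 0.1,
) -> torch.Tensor:
    # Per-token (length-normalized) log-ratios for each response.
    chosen_ratio = (policy_chosen_logps - ref_chosen_logps) / chosen_lengths
    rejected_ratio = (policy_rejected_logps - ref_rejected_logps) / rejected_lengths
    # Standard DPO objective applied to the normalized margin.
    margins = beta * (chosen_ratio - rejected_ratio)
    return -F.logsigmoid(margins).mean()

# Toy usage with made-up summed log-probabilities.
loss = length_normalized_dpo_loss(
    policy_chosen_logps=torch.tensor([-40.0]),
    policy_rejected_logps=torch.tensor([-55.0]),
    ref_chosen_logps=torch.tensor([-45.0]),
    ref_rejected_logps=torch.tensor([-50.0]),
    chosen_lengths=torch.tensor([20.0]),
    rejected_lengths=torch.tensor([25.0]),
)
print(loss)
```

Normalizing by response length removes the incentive to win the preference comparison simply by generating longer answers, which is one reason length-aware variants are attractive for small, latency-sensitive models.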
📺 Source: AI Engineer · Published April 29, 2026
🏷️ Format: Deep Dive
