Description:
Maxime Labonne, Head of Pre-training at Liquid AI, delivers a technically rigorous conference talk on the distinct challenges of training frontier small language models for on-device deployment. Drawing from hands-on work building the LFM2 model family — which spans 350 million to 24 billion parameters for use cases ranging from smartphones to in-car systems — Labonne identifies three defining constraints: memory bounds, task-specific rather than general-purpose design, and extreme latency sensitivity.
A central insight is the inefficiency of large embedding layers in popular small models. Gemma 3 270M allocates 63% of its total parameters to embeddings, and Qwen 3.5 0.8B allocates 29%, leaving comparatively few parameters for actual reasoning and knowledge capacity. Liquid AI’s LFM2 architecture addresses this through a hybrid design featuring gated short convolutions and grouped query attention (GQA), derived from on-device profiling on real target hardware rather than theoretical optimization. Profiling results show short convolutions deliver substantially better latency than sliding window attention, gated linear attention, and GQA alternatives on CPU-class hardware.
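To make the architecture discussion concrete, below is a minimal PyTorch sketch of a gated short-convolution block of the kind described for LFM2. The kernel size, gating arrangement, and projection layout here are illustrative assumptions rather than the exact LFM2 operator, which the talk summary does not spell out.

```python
import torch
import torch.nn as nn

class GatedShortConvBlock(nn.Module):
    """Illustrative gated short (depthwise) convolution block.

    Assumption: the block projects its input into a value stream plus two
    multiplicative gates, applies a short causal depthwise convolution, and
    projects back out. Details differ from the actual LFM2 operator.
    """

    def __init__(self, d_model: int, kernel_size: int = 3):
        super().__init__()
        # Project the input into a value stream and two multiplicative gates.
        self.in_proj = nn.Linear(d_model, 3 * d_model)
        # Depthwise ("short") causal convolution over the sequence dimension.
        self.conv = nn.Conv1d(
            d_model, d_model, kernel_size,
            groups=d_model, padding=kernel_size - 1,
        )
        self.out_proj = nn.Linear(d_model, d_model)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, seq_len, d_model)
        gate_in, gate_out, v = self.in_proj(x).chunk(3, dim=-1)
        v = gate_in * v                              # input gate
        v = self.conv(v.transpose(1, 2))             # (batch, d_model, seq_len + pad)
        v = v[..., : x.shape[1]].transpose(1, 2)     # trim padding to stay causal
        v = gate_out * v                             # output gate
        return self.out_proj(v)

# Toy usage on a small batch.
x = torch.randn(2, 16, 64)               # (batch, seq_len, d_model)
print(GatedShortConvBlock(64)(x).shape)  # torch.Size([2, 16, 64])
```

The latency argument for CPU-class hardware follows from the structure: each convolutional layer only ever looks at a fixed, small window of recent tokens, so there is no growing key-value cache or quadratic attention cost for those layers.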
Labonne also covers the full post-training pipeline for small models: supervised fine-tuning with narrow task focus, preference alignment using an on-policy length-normalized DPO algorithm, and reinforcement learning, which he argues is highly effective even at very small scales. A dedicated section addresses doom looping (repetitive token generation), a failure mode especially common in small and reasoning models on complex tasks, with guidance on diagnosing it and on using cold-start SFT data to resolve it. All LFM2 models are available on Hugging Face.
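As a pointer for the preference-alignment step, here is a hedged sketch of a length-normalized DPO loss. "Length-normalized" is assumed to mean dividing each response's summed log-probability ratio by its token count before the usual DPO sigmoid term; the exact formulation used for LFM2 is not given in this summary.

```python
import torch
import torch.nn.functional as F

def length_normalized_dpo_loss(
    policy_chosen_logps: torch.Tensor,    # sum of log p(y_chosen | x) under the policy
    policy_rejected_logps: torch.Tensor,  # sum of log p(y_rejected | x) under the policy
    ref_chosen_logps: torch.Tensor,       # same sums under the frozen reference model
    ref_rejected_logps: torch.Tensor,
    chosen_lengths: torch.Tensor,         # token counts of the chosen responses
    rejected_lengths: torch.Tensor,       # token counts of the rejected responses
    beta: float = 0.1,
) -> torch.Tensor:
    # Per-token (length-normalized) log-ratios for each response.
    chosen_ratio = (policy_chosen_logps - ref_chosen_logps) / chosen_lengths
    rejected_ratio = (policy_rejected_logps - ref_rejected_logps) / rejected_lengths
    # Standard DPO objective applied to the normalized margin.
    margins = beta * (chosen_ratio - rejected_ratio)
    return -F.logsigmoid(margins).mean()

# Toy usage with made-up summed log-probabilities.
loss = length_normalized_dpo_loss(
    policy_chosen_logps=torch.tensor([-40.0]),
    policy_rejected_logps=torch.tensor([-55.0]),
    ref_chosen_logps=torch.tensor([-45.0]),
    ref_rejected_logps=torch.tensor([-50.0]),
    chosen_lengths=torch.tensor([20.0]),
    rejected_lengths=torch.tensor([25.0]),
)
print(loss)
```

Normalizing by response length removes the incentive to win the preference comparison simply by generating longer answers, which is one reason length-aware variants are attractive for small, latency-sensitive models.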
📺 Source: AI Engineer · Published April 29, 2026
🏷️ Format: Deep Dive
