Description:
Fahd Mirza installs and tests Liquid AI's newly released LFM2-24B mixture-of-experts model locally using vLLM on an Nvidia H100 with 80GB of VRAM. The model features an unusual hybrid architecture of 30 convolutional layers and 10 attention layers, a deliberate departure from standard transformer designs, and activates only 2.3 billion of its 24 billion total parameters per token for fast, efficient inference.
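That sparse activation ratio is worth making concrete. The back-of-envelope sketch below uses the parameter counts quoted in the video; the bf16 (2 bytes per parameter) weight format is our assumption, not something the video states.

```python
# Back-of-envelope arithmetic on LFM2-24B's sparse activation.
# Parameter counts are from the video; bf16 weights are an assumption.

TOTAL_PARAMS = 24e9     # total parameters
ACTIVE_PARAMS = 2.3e9   # parameters activated per token
BYTES_PER_PARAM = 2     # assumed bf16 storage

weights_gb = TOTAL_PARAMS * BYTES_PER_PARAM / 1e9
active_fraction = ACTIVE_PARAMS / TOTAL_PARAMS

# ~48 GB of weights must sit in VRAM even though each token's forward
# pass touches only ~9.6% of them; that gap is where the speed comes from.
print(f"Weight footprint at bf16: ~{weights_gb:.0f} GB")
print(f"Active fraction per token: {active_fraction:.1%}")
```

Note that sparsity saves compute, not resident memory: all 24 billion parameters still have to fit on the GPU, which is why the H100's 80GB matters.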
Mirza walks through the full setup: installing vLLM, downloading the 47.7GB model via Hugging Face Hub, and serving it through Open WebUI. Fully loaded, the model occupies approximately 75GB of VRAM. On benchmarks, accuracy scales from 31.77% at the 350-million-parameter size to 71.59% at 24 billion, a result the Liquid AI team presents as evidence of architectural consistency. The model was trained on 17 trillion tokens across nine languages and supports a 32k context window.
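For readers who want to reproduce the flow in Python rather than at the shell, here is a minimal sketch under two loud assumptions: the Hugging Face repo id LiquidAI/LFM2-24B is our guess (not confirmed here), and a vLLM server is already running via `vllm serve` on its default OpenAI-compatible endpoint.

```python
# Minimal sketch of the download-and-query flow; not Mirza's exact commands.
# Assumptions: the repo id "LiquidAI/LFM2-24B" is hypothetical, and a vLLM
# server was started separately (e.g. `vllm serve <model>`), exposing the
# default OpenAI-compatible API at http://localhost:8000/v1.
from huggingface_hub import snapshot_download
from openai import OpenAI

MODEL_ID = "LiquidAI/LFM2-24B"  # hypothetical repo id

# Pull the ~47.7GB of weights into the local Hugging Face cache.
local_path = snapshot_download(repo_id=MODEL_ID)
print(f"Weights cached at {local_path}")

# Talk to the running vLLM server through its OpenAI-compatible API;
# the api_key is unused by a default local vLLM server.
client = OpenAI(base_url="http://localhost:8000/v1", api_key="EMPTY")
response = client.chat.completions.create(
    model=MODEL_ID,
    messages=[{"role": "user", "content": "Summarize the strengths of this model."}],
)
print(response.choices[0].message.content)
```

Open WebUI, which the video uses as the chat front end, can be pointed at the same OpenAI-compatible endpoint.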
Real-world testing reveals a clear split in capability. Language and reasoning tasks pass adequately, but coding performance is poor: a Three.js GTA-style game prompt produced garbled Chinese characters and hallucinated output, and even a simple fireworks animation returned gibberish. Mirza recommends the model for privacy-sensitive, non-coding deployments such as RAG pipelines, document summarization, and on-premise customer support, while steering viewers away from coding use cases, where models like Qwen and GLM currently dominate.
📺 Source: Fahd Mirza · Published February 26, 2026
🏷️ Format: Review
