Descriptions:
LFM2.5-8B-A1B is Liquid AI’s latest open-weight model: an 8.3-billion-parameter mixture-of-experts architecture that activates only 1.5 billion parameters per token. Trained on 38 trillion tokens (three times as many as its predecessor), the model features a 128K context window and a hybrid 24-layer design that combines short convolutional blocks with grouped query attention. Liquid AI also doubled the vocabulary size relative to the previous generation to strengthen multilingual support across Arabic, Chinese, Japanese, and Korean.
In this hands-on walkthrough, Fahd Mirza installs and serves the model locally on Ubuntu with an NVIDIA RTX 6000 (48GB VRAM) using vLLM, then connects it to the Hermes agentic framework. The setup process surfaces a real gotcha: vLLM must be launched with the LFM2 tool call parser (which relies on the tool-call-start and tool-call-end tokens), and skipping this step causes agentic tasks to fail silently. After the fix, the model is asked to scan a Python project and generate a structured technical report. It only partially completes the task: it reads the files but, instead of writing the report on its own, stops to ask for clarification.
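To see why a missing tool call parser fails silently rather than loudly, here is a minimal Python sketch of what such a parser does conceptually. It is not vLLM's implementation. The marker spellings `<|tool_call_start|>` and `<|tool_call_end|>`, the helper name `split_tool_calls`, and the sample payload are all assumptions made for illustration; check the model's tokenizer configuration for the real token strings.

```python
import re

# Assumed marker spellings (hypothetical); confirm against the tokenizer config.
TOOL_CALL_START = "<|tool_call_start|>"
TOOL_CALL_END = "<|tool_call_end|>"

# Non-greedy match of everything between one start marker and the next end marker.
_TOOL_CALL_RE = re.compile(
    re.escape(TOOL_CALL_START) + r"(.*?)" + re.escape(TOOL_CALL_END),
    re.DOTALL,
)


def split_tool_calls(raw_text):
    """Split raw model output into (visible_text, tool_call_payloads).

    Hypothetical helper: a serving layer with a matching parser would do
    something similar before handing structured tool calls to the agent.
    """
    payloads = [match.strip() for match in _TOOL_CALL_RE.findall(raw_text)]
    visible_text = _TOOL_CALL_RE.sub("", raw_text).strip()
    return visible_text, payloads


# Illustrative output only; the payload format is an assumption.
sample = (
    "Reading the project."
    '<|tool_call_start|>[read_file(path="main.py")]<|tool_call_end|>'
)
visible, calls = split_tool_calls(sample)
print(visible)
print(calls)
```

Without this extraction step, the marked-up span simply stays inside the assistant's text, so the agent framework sees an ordinary message with no tool calls to execute and nothing raises an error.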
The multilingual evaluation shows more promising results, with accurate translations across the target languages. For developers interested in running a local open-weight model for agentic workflows, this video delivers a complete vLLM setup walkthrough alongside an honest assessment: LFM2.5-8B-A1B is a step forward for Liquid AI, but it still trails models like Qwen and Gemma when it comes to reliable multi-step agentic execution.
📺 Source: Fahd Mirza · Published May 28, 2026
🏷️ Format: Hands On Build







