Description:
Fahd Mirza introduces and locally installs OmniLottie, described as the first AI model capable of generating complete Lottie animations directly from text prompts, images, or video input. Lottie is the JSON-based vector animation format used by apps like Duolingo and Airbnb. The core technical challenge OmniLottie addresses is that raw Lottie JSON is structurally verbose, so the team built a custom tokenizer that compresses it into compact command-parameter sequences a language model can learn from efficiently.
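The tokenizer's actual specification isn't shown in the video, but the compression idea can be illustrated with a toy example. The sketch below is hypothetical: it flattens one animated Lottie position property into a command-parameter sequence, with invented token names like `<pos>` and `<t:N>` standing in for whatever scheme OmniLottie actually uses.

```python
import json

# Toy Lottie-style fragment: one position property animated over two
# keyframes. Real Lottie JSON nests far more deeply (transforms, shapes,
# masks, expressions, ...), which is what makes it verbose.
lottie_snippet = json.loads("""
{
  "ks": {
    "p": {
      "a": 1,
      "k": [
        {"t": 0,  "s": [120, 300]},
        {"t": 30, "s": [480, 300]}
      ]
    }
  }
}
""")

def tokenize_position(layer: dict) -> list[str]:
    """Flatten an animated position property into command-parameter tokens.

    Hypothetical scheme: one <pos> command token, then alternating
    temporal tokens and integer-quantized coordinate tokens, instead of
    the verbose nested JSON above.
    """
    prop = layer["ks"]["p"]
    tokens = ["<pos>"]
    for kf in prop["k"]:
        tokens.append(f"<t:{kf['t']}>")
        x, y = kf["s"]
        tokens += [f"<x:{int(x)}>", f"<y:{int(y)}>"]
    tokens.append("<end>")
    return tokens

raw = json.dumps(lottie_snippet)
tokens = tokenize_position(lottie_snippet)
print(tokens)
# ['<pos>', '<t:0>', '<x:120>', '<y:300>', '<t:30>', '<x:480>', '<y:300>', '<end>']
print(f"{len(raw)} JSON characters vs {len(tokens)} tokens")
```

Even this toy case turns dozens of characters of nested JSON into eight tokens, and that kind of compression is what makes autoregressive generation of full animations tractable for a language model.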
The model extends Qwen2.5-VL (a pre-trained vision-language model) with a new Lottie-specific vocabulary of temporal, speed, and command tokens, and was trained on mmLottie-2M, a dataset of two million richly annotated Lottie animations sourced from LottieFiles and IconScout. Mirza runs the model locally via a Gradio interface on an NVIDIA RTX 6000, using approximately 8–10 GB of VRAM. He generates animations from a text description of a character and from a combined image-plus-text prompt, noting generation times of around 15 minutes for complex outputs.
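Extending a pretrained model's vocabulary is a standard Hugging Face pattern, so the mechanics can be sketched even though OmniLottie's real token inventory isn't shown. Everything Lottie-specific below is invented for illustration: the token strings, the frame-index range, and the assumed 7B base checkpoint.

```python
# Requires a recent transformers release that ships Qwen2.5-VL support.
from transformers import AutoTokenizer, Qwen2_5_VLForConditionalGeneration

# Assumed base checkpoint; the video does not state which Qwen2.5-VL
# size OmniLottie builds on.
BASE = "Qwen/Qwen2.5-VL-7B-Instruct"

tokenizer = AutoTokenizer.from_pretrained(BASE)
model = Qwen2_5_VLForConditionalGeneration.from_pretrained(BASE)

# Hypothetical Lottie vocabulary: temporal, speed, and command tokens.
lottie_tokens = (
    [f"<t:{i}>" for i in range(128)]                         # temporal tokens
    + [f"<speed:{s}>" for s in ("slow", "normal", "fast")]   # speed tokens
    + ["<pos>", "<scale>", "<rot>", "<path>", "<end>"]       # command tokens
)

added = tokenizer.add_tokens(lottie_tokens, special_tokens=True)
# Grow the embedding matrix so the new token ids get trainable rows.
model.resize_token_embeddings(len(tokenizer))
print(f"added {added} tokens; vocabulary is now {len(tokenizer)}")
```

After resizing, the new embedding rows start out randomly initialized and only become meaningful during fine-tuning, which is presumably where the mmLottie-2M training pass comes in.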
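The video demonstrates the model through a local Gradio UI but doesn't show its code. A minimal stand-in, with `generate_lottie` as a hypothetical placeholder for the real model call and detokenization step, could look like this:

```python
import json
import gradio as gr

def generate_lottie(prompt: str, image):
    # Placeholder: returns a static, empty Lottie document. The real app
    # would run the model on the prompt (and optional reference image)
    # and decode the generated token stream back into Lottie JSON.
    return json.dumps({"v": "5.7.4", "fr": 30, "ip": 0, "op": 60,
                       "w": 512, "h": 512, "layers": []}, indent=2)

demo = gr.Interface(
    fn=generate_lottie,
    inputs=[gr.Textbox(label="Animation description"),
            gr.Image(label="Optional reference image", type="pil")],
    outputs=gr.Code(label="Lottie JSON", language="json"),
    title="OmniLottie (local)",
)

if __name__ == "__main__":
    demo.launch()  # serves on http://127.0.0.1:7860 by default
```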
The video is a useful first look at AI-native vector animation generation, a niche but technically interesting capability that could matter for developers building design automation tools, motion graphics pipelines, or no-code animation workflows. The architecture walkthrough, covering the tokenizer design and training data pipeline, adds enough depth to make this more than a surface-level demo.
📺 Source: Fahd Mirza · Published March 15, 2026
🏷️ Format: Hands-On Build
