Description:
Web Dev Cody launches a new series documenting the construction of an AI-powered video generation SaaS from scratch using agentic coding workflows. The stated goal is fully automated educational video production — script, voice-over, and video background generated from a single user prompt with no manual input beyond that initial description.
The first episode tackles the hardest part of the pipeline up front: selecting and integrating a video generation model. Cody evaluates Fal AI’s offerings in real time, comparing WAN 2.6 at 15 cents per second (1080p) against WAN 2.2’s Fast variant at roughly 2 cents per video, and chooses the cheaper option for experimentation. The full pipeline chains three services: OpenAI for script generation, ElevenLabs for text-to-speech, and WAN 2.2 for image-to-video at a 9:16 aspect ratio, targeting YouTube Shorts and TikTok. He feeds API documentation directly into the AI agent’s planning context and iterates on a structured markdown plan that guides subsequent agentic sessions.
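To make the price gap concrete, here is a rough back-of-the-envelope sketch using only the two figures quoted above. The 5-second clip length is an assumption chosen for illustration (the summary does not state a clip duration), and the function names are hypothetical.

```python
# Prices as quoted in the episode summary, kept in whole cents to avoid float noise.
WAN_26_CENTS_PER_SECOND = 15      # WAN 2.6 at 1080p, billed per second
WAN_22_FAST_CENTS_PER_VIDEO = 2   # WAN 2.2 Fast, "roughly 2 cents per video"


def wan_26_cost_cents(duration_seconds: int) -> int:
    """Cost of one WAN 2.6 clip of the given length, in cents."""
    return duration_seconds * WAN_26_CENTS_PER_SECOND


def price_ratio(duration_seconds: int) -> float:
    """How many times more one WAN 2.6 clip costs than one WAN 2.2 Fast video."""
    return wan_26_cost_cents(duration_seconds) / WAN_22_FAST_CENTS_PER_VIDEO


# Assumed clip length for illustration only; not stated in the source.
ASSUMED_CLIP_SECONDS = 5
clip_cost = wan_26_cost_cents(ASSUMED_CLIP_SECONDS)
ratio = price_ratio(ASSUMED_CLIP_SECONDS)
print(f"WAN 2.6: {clip_cost} cents per clip, {ratio}x the WAN 2.2 Fast price")
```

Under that assumed length, a single WAN 2.6 clip costs 75 cents, which is why the cheaper model is the more comfortable choice while iterating.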
The series is aimed at developers interested in building AI media products and serves as a practical walkthrough of how to combine multiple third-party AI APIs — OpenAI, ElevenLabs, and Fal AI — into a coherent SaaS architecture. Cody’s candid cost analysis and model selection reasoning make this a useful reference for anyone scoping a similar video generation project.
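The three-stage chain described above could be wired together roughly as follows. This is a minimal sketch, not Cody's implementation: every name here (`ShortVideo`, `generate_short`, the three stage callables) is hypothetical, and the stages are passed in as plain functions so that no specific OpenAI, ElevenLabs, or Fal AI client call is assumed. How the input image for the image-to-video step is produced is not covered in the summary, so the render stage simply receives the script text.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class ShortVideo:
    """Outputs of one pipeline run (hypothetical container)."""
    script: str
    voice_over: bytes
    video_url: str
    aspect_ratio: str


def generate_short(
    prompt: str,
    write_script: Callable[[str], str],        # e.g. backed by an OpenAI model
    synthesize_voice: Callable[[str], bytes],  # e.g. backed by ElevenLabs TTS
    render_video: Callable[[str, str], str],   # e.g. backed by WAN 2.2 on Fal AI
    aspect_ratio: str = "9:16",                # vertical format for Shorts/TikTok
) -> ShortVideo:
    """Run script -> voice-over -> video from a single user prompt."""
    script = write_script(prompt)
    voice_over = synthesize_voice(script)
    video_url = render_video(script, aspect_ratio)
    return ShortVideo(script, voice_over, video_url, aspect_ratio)


# Usage with stand-in stages; real ones would wrap the vendor SDKs.
demo = generate_short(
    "Explain photosynthesis",
    write_script=lambda p: f"Script about: {p}",
    synthesize_voice=lambda s: s.encode("utf-8"),
    render_video=lambda s, ratio: f"https://example.invalid/video?ratio={ratio}",
)
print(demo.script, demo.aspect_ratio)
```

Injecting the stages as callables keeps each vendor swappable, which fits the kind of model comparison the episode walks through.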
📺 Source: Web Dev Cody · Published February 16, 2026
🏷️ Format: Hands On Build







