I Built an AI That Sells Cars From Photos

Descriptions:

Alpha Stack walks through the complete build of VinVision, a web application that takes a small set of car photos, plus an optional VIN, and automatically generates a narrated promotional video suitable for listings on platforms like Craigslist or dealership websites. The demo produces a finished video from four images of a vintage Jeep Cherokee Chief, with AI-generated narration, pacing, and simulated camera movements derived from the still frames.

The technical stack is deliberately minimal. A single Google Gemini API key (obtained from aistudio.google.com) covers three distinct tasks: VIN decoding via an external API call to identify make, model, and year; script generation with camera movement annotations; and image-to-video rendering at either a $1 or $3 quality tier. Netlify Functions serve as the serverless backend, taking on the LLM processing that the lightweight frontend can’t handle directly, while ffmpeg assembles the final video output. The project has only two package dependencies, the Google SDK and the Netlify client, which keeps it small and easy to follow.
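As a rough illustration of the VIN decoding step, the sketch below builds a lookup URL and reduces the response to make, model, and year. The summary does not say which external service VinVision calls; NHTSA's public vPIC DecodeVinValues endpoint is assumed here, and the helper names (normalizeVin, isPlausibleVin, vpicUrl, summarize) are hypothetical. The network call itself is left out so the helpers stay pure.

```typescript
// Sketch only. Assumption: the VIN lookup uses NHTSA's public vPIC
// "DecodeVinValues" endpoint. All function names here are hypothetical.

interface VehicleSummary {
  make: string;
  model: string;
  year: number | null;
}

// The subset of the vPIC flat-format response that this sketch reads.
interface VpicResponse {
  Results?: Array<{ Make?: string; Model?: string; ModelYear?: string }>;
}

function normalizeVin(vin: string): string {
  return vin.trim().toUpperCase();
}

// A modern VIN is 17 characters and never uses the letters I, O, or Q.
function isPlausibleVin(vin: string): boolean {
  return /^[A-HJ-NPR-Z0-9]{17}$/.test(normalizeVin(vin));
}

// URL to request; a serverless function would fetch this and parse the JSON.
function vpicUrl(vin: string): string {
  const clean = encodeURIComponent(normalizeVin(vin));
  return `https://vpic.nhtsa.dot.gov/api/vehicles/DecodeVinValues/${clean}?format=json`;
}

// Reduce the decoded record to the three fields the script prompt needs.
function summarize(body: VpicResponse): VehicleSummary | null {
  const row = body.Results?.[0];
  if (!row || !row.Make || !row.Model) return null;
  const year = parseInt(row.ModelYear ?? "", 10);
  return { make: row.Make, model: row.Model, year: isNaN(year) ? null : year };
}
```

Returning null when the lookup yields nothing fits the VIN being optional: the script step can fall back to what it infers from the photos.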

The walkthrough covers the full project structure: environment variable setup, the GitHub-to-Netlify deployment pipeline, and the separation of script generation and video rendering into distinct Netlify Function endpoints. A downloadable build plan is referenced in the video description. The project is demonstrated with two vehicles, a 2007 Saab (identified via VIN) and a vintage Jeep, showing that the pipeline produces contextually accurate narration even when the VIN is omitted.
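One way to picture the environment variable check and the split between the two endpoints is the sketch below. Everything named in it is an assumption for illustration: the variable name GEMINI_API_KEY, the function names generate-script and render-video, and the fields of a script segment are not taken from the VinVision code. The only convention relied on is Netlify's default routing of a function file to /.netlify/functions/<name>.

```typescript
// Illustrative only: GEMINI_API_KEY, the endpoint names, and the segment
// fields are assumptions, not taken from the VinVision source.

type Env = Record<string, string | undefined>;

// Output of the script step: one narration line per photo plus a camera note.
interface ScriptSegment {
  imageIndex: number;
  narration: string;
  camera: string; // e.g. "slow pan left"; the vocabulary is hypothetical
}

// With default Netlify routing, each function file gets its own path.
const ENDPOINTS = {
  script: "/.netlify/functions/generate-script",
  render: "/.netlify/functions/render-video",
};

// Fail fast inside a function if the key was never configured.
function requireApiKey(env: Env): string {
  const key = (env["GEMINI_API_KEY"] ?? "").trim();
  if (key === "") {
    throw new Error("GEMINI_API_KEY is missing from the environment");
  }
  return key;
}

// Check the script step's output before sending it to the render step.
function validateScript(segments: ScriptSegment[], imageCount: number): string[] {
  const problems: string[] = [];
  if (segments.length === 0) problems.push("script has no segments");
  segments.forEach((seg, i) => {
    const idx = seg.imageIndex;
    if (Math.floor(idx) !== idx || idx < 0 || idx >= imageCount) {
      problems.push(`segment ${i}: imageIndex ${idx} does not match an uploaded photo`);
    }
    if (seg.narration.trim() === "") {
      problems.push(`segment ${i}: narration is empty`);
    }
  });
  return problems;
}
```

Inside a deployed function the key would typically be read with requireApiKey(process.env). Keeping the script and render steps as separate endpoints means the cheaper text step can be validated, or rerun, before the paid rendering call is made.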


📺 Source: Alpha Stack · Published March 15, 2026
🏷️ Format: Hands On Build
