Prompt to Pipeline: Building with Google’s Gen Media Stack — Paige & Guillaume, Google DeepMind

Descriptions:

At the AI Engineer conference, Google DeepMind engineers Paige Bailey and Guillaume present a workshop on building with Google’s generative media stack, covering recent releases and pricing updates across its image, video, and audio model lines.

Key announcements include Veo 3.1 Light, a new video generation model priced at $0.05 per second. That is significantly cheaper than earlier Veo 3 pricing, and the model is positioned as a low-cost tier for prototyping before scaling up to higher-quality outputs. On the audio side, Lyria 3 is described as the first music generation model available via public API, capable of producing 30-second clips or full three-minute songs with lyrics on demand. Lyria Real Time, a lesser-known companion model, generates music as a continuous live stream that responds to prompts in real time, functioning like an AI DJ. Imagen (referred to internally as Nano Banana 2) has been updated with multi-aspect ratio output support and image-grounded search, allowing it to reference real-world web images during generation.
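To make the $0.05-per-second figure concrete, here is a minimal back-of-the-envelope sketch in Python. It assumes billing scales linearly with seconds of generated video, with no minimum charge or rounding, which the description does not confirm. The function name `estimate_cost`, the 8-second clip length, and the 25-clip batch are hypothetical values chosen for illustration only.

```python
from decimal import Decimal

# Per-second price quoted for Veo 3.1 Light in the description (USD).
VEO_31_LIGHT_USD_PER_SECOND = Decimal("0.05")


def estimate_cost(clip_seconds, num_clips=1,
                  usd_per_second=VEO_31_LIGHT_USD_PER_SECOND):
    """Estimate the USD cost of generating `num_clips` clips.

    Assumption: cost = seconds * clips * price, with no minimums,
    rounding, or volume discounts. Real billing may differ.
    """
    if clip_seconds < 0 or num_clips < 0:
        raise ValueError("clip_seconds and num_clips must be non-negative")
    # Convert via str so float inputs such as 7.5 stay exact in Decimal.
    return Decimal(str(clip_seconds)) * num_clips * usd_per_second


# Hypothetical prototyping run: 25 clips of 8 seconds each.
single_clip = estimate_cost(8)
batch = estimate_cost(8, num_clips=25)
print(f"one 8 s clip: ${single_clip:.2f}")
print(f"25 clips:     ${batch:.2f}")
```

Under these assumptions, a single 8-second clip comes to $0.40 and the 25-clip batch to $10.00, which gives a sense of why a tier at this price would suit iteration before moving to a higher-quality model.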

Beyond product updates, Paige offers a candid perspective on AI development cycles, arguing that most infrastructure sprints (vector databases, custom agent frameworks, MCP servers) become obsolete as base model capabilities expand. She observes that structured markdown ‘skills’ files are already displacing MCP server patterns in production workflows, and predicts that agent framework abstractions will similarly be absorbed into models over time.


📺 Source: AI Engineer · Published May 23, 2026
🏷️ Format: Keynote Launch
