Let’s go Bananas with GenMedia — Guillaume Vernade, Google DeepMind

Descriptions:

Guillaume Vernade, a developer advocate at Google DeepMind, presented a hands-on walkthrough of the GenMedia stack at the AI Engineer conference. He covered practical techniques for multi-character image generation and visual consistency using the Gemini API and Google’s Nano Banana models (the nickname for Gemini’s native image generation models, not the separate Imagen family).

The talk blends candid internal context with live coding demos built around illustrating scenes from “The Wind in the Willows.” That context includes how API fragmentation across DeepMind’s image models has frustrated developers, and the multi-year architectural push toward a single unified multimodal model. Vernade demonstrates two approaches to maintaining character consistency across multiple generated images. The first uses chat history, so the model retains prior character images as implicit context. The second refines this into a structured output approach that explicitly passes only the reference images for the characters appearing in each specific chapter, which improves contextual fidelity at scale.
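
The second approach can be sketched roughly as follows. This is a minimal illustration, not the code shown in the talk: the `Chapter` type and the `build_contents` and `illustrate` helpers are hypothetical names. It assumes `client` is a `genai.Client` from the google-genai Python SDK, and that each portrait is an image object the SDK accepts in `contents` (for example a PIL image). The model name is deliberately left to the caller.

```python
from dataclasses import dataclass
from typing import Any


@dataclass
class Chapter:
    title: str
    summary: str
    characters: list[str]  # names of the characters appearing in this chapter


def build_contents(chapter: Chapter, portraits: dict[str, Any]) -> list[Any]:
    """Build a request payload holding only the portraits this chapter needs.

    `portraits` maps a character name to a previously generated reference
    image; the values are passed through untouched.
    """
    contents: list[Any] = []
    for name in chapter.characters:
        if name not in portraits:
            continue  # no reference yet; the text prompt alone must describe them
        contents.append(f"Reference portrait of {name}:")
        contents.append(portraits[name])
    contents.append(
        f"Illustrate the chapter '{chapter.title}': {chapter.summary} "
        "Keep every character consistent with their reference portrait."
    )
    return contents


def illustrate(client: Any, model: str, chapter: Chapter, portraits: dict[str, Any]) -> Any:
    # Assumption: `client` is a google-genai `genai.Client`.
    return client.models.generate_content(
        model=model,
        contents=build_contents(chapter, portraits),
    )
```

Unlike a chat session, where every earlier image stays in the conversation, each request here carries only the references that are relevant to the scene being drawn.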

The session is practical and opinionated throughout. Vernade is frank about what works (generate_content with explicit image references), what degrades at scale (relying purely on conversational history for multi-character scenes), and where he would improve the pipeline further — generating multiple portrait angles per character to give the model stronger reference material. For developers working with Google’s media generation APIs, the talk offers a ground-level perspective from someone whose job is to make these APIs actually usable in production.
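
The multi-angle improvement could look something like the sketch below. It only prepares prompts and reference lists, and makes no API calls. The angle names, the prompt wording, and both helper functions are illustrative assumptions rather than anything taken from the talk.

```python
DEFAULT_ANGLES = ("front view", "three-quarter view", "side profile")


def portrait_prompts(name: str, description: str, angles=DEFAULT_ANGLES) -> dict[str, str]:
    """Return one text prompt per angle for a single character."""
    return {
        angle: (
            f"Reference portrait of {name}, {description}. "
            f"Show the character in {angle} on a plain background."
        )
        for angle in angles
    }


def flatten_references(characters, portraits_by_angle):
    """Collect every stored angle for the given characters, in order.

    `portraits_by_angle` maps character name -> {angle: image}.
    """
    refs = []
    for name in characters:
        for angle, image in portraits_by_angle.get(name, {}).items():
            refs.append(f"{name}, {angle}:")
            refs.append(image)
    return refs
```

The flattened list can then be placed ahead of the scene prompt in a request, so the model sees several views of each character instead of a single portrait.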


📺 Source: AI Engineer · Published May 18, 2026
🏷️ Format: Tutorial Demo
