Description:
Sam Witteveen walks through Google’s newly released Gemini Interactions API, a significant redesign of how developers interact with Gemini models, reflecting the shift from simple chat completions toward agent-native architectures. The video traces the evolution of LLM APIs from OpenAI’s original stateless completions endpoint through the chat completions format and OpenAI’s Responses API, before diving into what makes the Gemini Interactions API distinct.
Key features covered include optional server-side conversation history (eliminating the need to resend full context on every call), implicit token caching that activates above roughly 1,000 tokens to reduce costs on long conversations, and background execution support for long-running agent tasks. Witteveen also demonstrates how developers can now call Google’s Gemini Research Agent directly via the API — a capability previously restricted to consumer interfaces like the Gemini app — and explains the ‘thought signature’ mechanism that allows reasoning tokens to persist across multi-turn interactions without being exposed client-side.
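The cost benefit of implicit caching can be sketched with a toy billing model. This is not the real Gemini pricing logic: the ~1,000-token threshold comes from the video, while the discount rate and all function names here are invented for illustration.

```python
# Toy model of implicit token caching as described in the video: once the
# shared conversation prefix exceeds a ~1,000-token threshold, it may be
# cached server-side and billed at a reduced rate on subsequent turns.
# The threshold value is from the video; the discount is an assumption.

CACHE_THRESHOLD_TOKENS = 1000   # approximate activation point per the video
CACHED_TOKEN_DISCOUNT = 0.25    # assumed: cached tokens billed at 25% of full rate

def billed_tokens(prefix_tokens: int, new_tokens: int) -> float:
    """Effective billable tokens for one turn, assuming the shared
    prefix is cached once it exceeds the threshold."""
    if prefix_tokens >= CACHE_THRESHOLD_TOKENS:
        return prefix_tokens * CACHED_TOKEN_DISCOUNT + new_tokens
    return prefix_tokens + new_tokens

# Short conversation: prefix below threshold, everything at full rate.
print(billed_tokens(400, 100))    # 500

# Long conversation: the 8,000-token history is (hypothetically) cached,
# so only the new 100-token suffix is billed at the full rate.
print(billed_tokens(8000, 100))   # 2100.0
```

The point of the sketch is the shape of the curve: on long multi-turn conversations, per-turn cost stops growing linearly with the full history once caching kicks in.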
The tutorial includes working code samples showing multi-turn conversations with persisted memory, interaction ID chaining, and structured retrieval of prior interactions. Developers building production agents on Google’s infrastructure will find the video a practical introduction to the API’s stateful features and their implications for token efficiency and agent design patterns.
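The stateful pattern the tutorial demonstrates can be mocked locally to show the data flow. Everything below is a hypothetical stand-in, not the real Gemini Interactions API: the class, method names, and echo "model" are invented; only the pattern (send the new message plus the previous interaction's ID, let the server keep the history) comes from the source.

```python
# Mock of interaction-ID chaining with server-side conversation history.
# All names are invented for illustration; this is NOT the Gemini API.
import uuid

class MockInteractionsServer:
    def __init__(self):
        self._store = {}  # interaction_id -> full message history

    def create_interaction(self, message, previous_interaction_id=None):
        # Reconstruct context from the prior interaction instead of
        # requiring the client to resend the whole transcript.
        history = list(self._store.get(previous_interaction_id, []))
        history.append(("user", message))
        history.append(("model", f"echo: {message}"))  # stand-in for a model reply
        interaction_id = str(uuid.uuid4())
        self._store[interaction_id] = history
        return interaction_id, history[-1][1]

    def get_interaction(self, interaction_id):
        """Structured retrieval of a prior interaction's full history."""
        return self._store[interaction_id]

server = MockInteractionsServer()
turn1, _ = server.create_interaction("Hello")
turn2, _ = server.create_interaction("What did I just say?",
                                     previous_interaction_id=turn1)

# The second call carried only the new message plus turn1's ID, yet the
# retrieved history contains both user/model exchanges.
print(len(server.get_interaction(turn2)))  # 4
```

The design trade-off the video highlights falls out of the mock: the client stays thin (one message and one ID per call), while memory, retrieval, and token accounting move to the server.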
📺 Source: Sam Witteveen · Published December 16, 2025
🏷️ Format: Tutorial Demo
