Your Agent Is Wasting Tokens and You Don’t Know It – Erik Hanchett, AWS

Your Agent Is Wasting Tokens and You Don’t Know It – Erik Hanchett, AWS

More

Descriptions:

Erik Hanchett, senior developer advocate at AWS, delivers a focused lightning talk on five techniques for cutting token costs in production AI agent applications, with code examples drawn from AWS’s Strands Agents framework. The talk targets developers who are surprised by how quickly token bills accumulate once agents are in production — particularly due to patterns that silently repeat large payloads on every iteration of the agent loop.

The five strategies are: caching system prompts (and optionally tool prompts and prior messages) so subsequent calls avoid resending the full prompt; routing tasks by difficulty using cheaper models like Claude Haiku for simple requests and Claude Sonnet for more complex ones, with a low-cost model optionally making the routing decision itself; offloading large tool results to local or cloud storage with summarization rather than appending them to context on every loop iteration; capping tool loop iterations with a hard max to prevent runaway infinite loops from consuming unbounded tokens; and trimming conversation history using Strands’ sliding window conversation manager, which retains only the last N messages while allowing earlier history to be summarized and injected as a compact prefix.

Hanchett also recommends running observability tooling before deploying any agent — profiling tool call frequency, execution time, and loop counts — to catch inefficient tools before they hit production load. The techniques are framework-agnostic in principle, though code samples specifically target Strands Agents and AWS’s provider integrations.


📺 Source: AI Engineer · Published June 28, 2026
🏷️ Format: Tutorial Demo

1 Item

Channels

1 Item

Companies

AWS