Your Agent Is Wasting Tokens and You Don’t Know It – Erik Hanchett, AWS

Description:

Erik Hanchett, senior developer advocate at AWS, delivers a focused lightning talk on five techniques for cutting token costs in production AI agent applications, with code examples drawn from AWS’s Strands Agents framework. The talk targets developers who are surprised by how quickly token bills accumulate once agents are in production, particularly because of patterns that silently resend large payloads on every iteration of the agent loop.

The five strategies are:

1. Caching system prompts (and optionally tool prompts and prior messages) so subsequent calls avoid resending the full prompt.
2. Routing tasks by difficulty, using cheaper models like Claude Haiku for simple requests and Claude Sonnet for more complex ones, with a low-cost model optionally making the routing decision itself.
3. Offloading large tool results to local or cloud storage with summarization, rather than appending them to context on every loop iteration.
4. Capping tool loop iterations with a hard maximum to prevent runaway loops from consuming unbounded tokens.
5. Trimming conversation history using Strands’ sliding window conversation manager, which retains only the last N messages while allowing earlier history to be summarized and injected as a compact prefix.
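To make the routing, offloading, loop-capping, and history-trimming strategies above more concrete, here is a minimal, framework-agnostic Python sketch. It does not use the Strands Agents API and is not the code shown in the talk. Every name in it (`route_model`, `offload_tool_result`, `run_tool_loop`, `trim_history`, the model labels, and the thresholds) is a hypothetical stand-in chosen for illustration. The keyword heuristic stands in for a low-cost routing model, and the 200-character preview stands in for real summarization.

```python
import tempfile
from pathlib import Path
from typing import Callable, Optional

# Hypothetical labels; real model IDs depend on the provider in use.
CHEAP_MODEL = "small-model"
STRONG_MODEL = "large-model"


def route_model(prompt: str, max_simple_words: int = 40) -> str:
    """Crude difficulty heuristic (an assumption, not the talk's router):
    long prompts or prompts containing 'hard' keywords go to the strong model."""
    hard_markers = ("refactor", "architecture", "debug")
    text = prompt.lower()
    if len(text.split()) > max_simple_words or any(m in text for m in hard_markers):
        return STRONG_MODEL
    return CHEAP_MODEL


def offload_tool_result(result: str, directory: Path, max_chars: int = 500) -> str:
    """Write an oversized tool result to disk and return a short stub
    that can be appended to the context instead of the full payload."""
    if len(result) <= max_chars:
        return result
    directory.mkdir(parents=True, exist_ok=True)
    path = directory / f"tool_result_{len(list(directory.iterdir()))}.txt"
    path.write_text(result, encoding="utf-8")
    preview = result[:200]  # placeholder for a real summary
    return f"[stored {len(result)} chars at {path}] preview: {preview}"


def run_tool_loop(
    step: Callable[[int], Optional[str]], max_iterations: int = 5
) -> tuple[Optional[str], int]:
    """Call step(i) until it returns a final answer or the hard cap is hit.
    Returns (answer_or_None, iterations_used)."""
    for i in range(1, max_iterations + 1):
        answer = step(i)
        if answer is not None:
            return answer, i
    return None, max_iterations


def trim_history(messages: list[dict], window: int) -> list[dict]:
    """Keep only the last `window` messages (a plain sliding window)."""
    return messages[-window:] if window > 0 else []
```

In a real agent, `step` would wrap one model call plus any tool execution, and the stub returned by `offload_tool_result` would replace the raw tool output in the message list. Returning the iteration count from `run_tool_loop` makes it easy to log how often the cap is actually reached.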

Hanchett also recommends running observability tooling before deploying any agent. Profiling tool call frequency, execution time, and loop counts helps catch inefficient tools before they hit production load. The techniques are framework-agnostic in principle, though the code samples specifically target Strands Agents and AWS’s provider integrations.
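As a rough illustration of the kind of profiling described here, the following Python sketch counts calls and accumulates wall-clock time per tool using a decorator. It is an assumption about how such measurements could be collected by hand, not the observability tooling referenced in the talk. `profiled`, `TOOL_STATS`, and `lookup_weather` are hypothetical names.

```python
import time
from collections import defaultdict
from functools import wraps

# Maps tool name -> {"calls": int, "seconds": float}
TOOL_STATS = defaultdict(lambda: {"calls": 0, "seconds": 0.0})


def profiled(fn):
    """Record call count and cumulative execution time for a tool function."""
    @wraps(fn)
    def wrapper(*args, **kwargs):
        start = time.perf_counter()
        try:
            return fn(*args, **kwargs)
        finally:
            # Runs even if the tool raises, so failed calls are counted too.
            stats = TOOL_STATS[fn.__name__]
            stats["calls"] += 1
            stats["seconds"] += time.perf_counter() - start
    return wrapper


@profiled
def lookup_weather(city: str) -> str:
    """Hypothetical tool used only to demonstrate the decorator."""
    return f"Sunny in {city}"
```

After a test run of the agent, inspecting `TOOL_STATS` shows which tools are called most often and which dominate execution time, which is a reasonable first signal for tools worth optimizing or caching.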

📺 Source: AI Engineer · Published June 28, 2026
🏷️ Format: Tutorial Demo

