Descriptions:
Nupur Sharma, an engineer at Qodo with a background in DevSecOps, presents hard-won lessons at the AI Engineer conference from deploying agentic code review systems in production. Her central finding challenges a common assumption: expanding context windows does not reliably improve agent performance. In internal benchmarking of Qodo’s multi-agent review pipelines, her team observed a U-shaped attention pattern: LLMs consistently attend to the beginning and end of a context window but drop content in the middle, so intermediate inputs such as Jira tickets or MCP tool results are often silently ignored.
Sharma systematically covers the mitigation toolkit available to practitioners: context engines (ranked retrieval with indexing, which works well for up to hundreds of repositories but degrades at 600–700+ repos), hierarchical summarization to compress history without losing key decisions, and critic nodes, which are lightweight self-evaluation agents that compare the original goal against the produced output before passing results downstream. She also introduces a practical model-routing heuristic: send open-ended discovery, planning, and tool-selection tasks to high-reasoning frontier models (roughly 80% of the compute budget), and send deterministic summarization and formatting tasks to faster, cheaper models (the remaining 20%).
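The routing heuristic and the critic-node idea described above can be pictured with a minimal Python sketch. It is an illustration under stated assumptions, not Qodo's implementation: the task-kind labels, the tier names, the fallback for unknown task kinds, and the keyword-based `keyword_judge` (a stand-in for what would normally be an LLM call) are all hypothetical.

```python
from typing import Callable

# Task kinds named in the description; the string labels are hypothetical.
HIGH_REASONING = {"discovery", "planning", "tool_selection"}
DETERMINISTIC = {"summarization", "formatting"}

# Placeholder tier names, not real model identifiers.
FRONTIER_TIER = "frontier-reasoning"
FAST_TIER = "fast-cheap"


def route(task_kind: str) -> str:
    """Pick a model tier for a task kind."""
    if task_kind in DETERMINISTIC:
        return FAST_TIER
    # Open-ended kinds go to the frontier tier. Assumption: unknown kinds
    # also go there, trading cost for safety.
    return FRONTIER_TIER


def split_budget(total: float, frontier_share: float = 0.8) -> dict[str, float]:
    """Split a compute budget using the rough 80/20 ratio from the talk."""
    frontier = total * frontier_share
    return {FRONTIER_TIER: frontier, FAST_TIER: total - frontier}


def keyword_judge(goal_items: list[str], output: str) -> list[str]:
    """Stand-in judge: return the goal items the output never mentions."""
    lowered = output.lower()
    return [item for item in goal_items if item.lower() not in lowered]


def critic_node(
    goal_items: list[str],
    output: str,
    judge: Callable[[list[str], str], list[str]] = keyword_judge,
) -> dict:
    """Compare the original goal against the output before passing it on."""
    missing = judge(goal_items, output)
    return {"approved": not missing, "missing": missing}


if __name__ == "__main__":
    print(route("planning"), route("formatting"))
    print(split_budget(100.0))
    print(critic_node(["sql injection", "JIRA-123"],
                      "Found SQL injection risk in the query builder."))
```

In a real pipeline the `judge` argument would presumably wrap a call to a small model. The point of the sketch is only the shape: the critic sees the original goal alongside the output, so a dropped requirement (here the hypothetical ticket ID) is caught before results move downstream.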
The talk closes with a discussion of mixture-of-agents architectures as the structural answer to context overload: specialized sub-agents, each handling a narrow task domain, outperform a single large-context agent given multiple responsibilities, because the single agent tends to silently drop goals as its accumulated context grows.
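One way to picture the mixture-of-agents structure is the sketch below, in which each sub-agent receives only the slice of context it declares it needs. The `SubAgent` type, the context keys (`diff`, `jira_ticket`, `mcp_results`), and the two toy agents are hypothetical placeholders for illustration; the talk description does not specify how Qodo wires its agents together.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class SubAgent:
    name: str
    wants: set[str]                      # context keys this agent needs
    run: Callable[[dict[str, str]], str]  # would wrap a model call in practice


def run_review(context: dict[str, str], agents: list[SubAgent]) -> dict[str, str]:
    """Give each specialized agent a narrow context and collect the results."""
    results: dict[str, str] = {}
    for agent in agents:
        # Narrow the context so unrelated inputs cannot crowd out this
        # agent's single goal.
        narrow = {k: v for k, v in context.items() if k in agent.wants}
        results[agent.name] = agent.run(narrow)
    return results


# Toy agents that only report which inputs they were given.
def security_agent(ctx: dict[str, str]) -> str:
    return f"security saw: {sorted(ctx)}"


def ticket_agent(ctx: dict[str, str]) -> str:
    return f"ticket saw: {sorted(ctx)}"


if __name__ == "__main__":
    full_context = {
        "diff": "...",          # placeholder content
        "jira_ticket": "...",
        "mcp_results": "...",
    }
    agents = [
        SubAgent("security", {"diff"}, security_agent),
        SubAgent("ticket", {"diff", "jira_ticket"}, ticket_agent),
    ]
    for name, result in run_review(full_context, agents).items():
        print(name, "->", result)
```

The design choice illustrated here is that context is narrowed per agent rather than shared wholesale, so no single agent accumulates every input alongside several competing goals.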
📺 Source: AI Engineer · Published June 08, 2026
🏷️ Format: Deep Dive







