When All Context Matters: Extended Cache Augmented Generation – Luis Romero-Sevilla, Orbis

Descriptions:

Luis Romero Sevilla, VP of AI at Orbis, introduces Extended Cache Augmented Generation (XCAG), a retrieval architecture designed for a specific but common hard case: a large document collection where every document is relevant to the query and the collection is replaced frequently. Standard RAG fails because it cannot retrieve all documents without overwhelming the context; GraphRAG fails because recomputing the knowledge graph on every data refresh is prohibitively expensive.

XCAG starts from Cache Augmented Generation (CAG), which loads documents into a large-context model’s KV cache, and extends it by distributing the documents across multiple parallel context buckets rather than a single one. A supervisor model then interrogates each bucket, progressively building its understanding and issuing targeted follow-up questions to specific buckets when it finds relevant content. Because all caches load simultaneously, the architecture is substantially faster than GraphRAG while returning more accurate answers than single-pass RAG for globally relevant collections.
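The bucket-and-supervisor flow described above can be sketched roughly as follows. This is an illustrative toy, not the implementation from the talk: `Bucket`, `make_buckets`, and `supervise` are hypothetical names, the round-robin split is an assumed distribution policy, and `Bucket.ask` is a keyword-matching stub standing in for a model call against documents preloaded in a KV cache.

```python
from concurrent.futures import ThreadPoolExecutor
from dataclasses import dataclass


@dataclass
class Bucket:
    """One context bucket: stands in for a model with documents preloaded in its KV cache."""
    bucket_id: int
    documents: list[str]

    def ask(self, question: str) -> list[str]:
        # Hypothetical stub: a real bucket would run the cached model on the question.
        # Here we simply return documents that share at least one word with it.
        terms = set(question.lower().split())
        return [d for d in self.documents if terms & set(d.lower().split())]


def make_buckets(documents: list[str], n_buckets: int) -> list[Bucket]:
    # Assumed policy: round-robin, so every document lands in exactly one bucket.
    return [Bucket(i, documents[i::n_buckets]) for i in range(n_buckets)]


def supervise(buckets: list[Bucket], question: str, follow_up: str) -> dict[int, list[str]]:
    # Round 1: put the question to every bucket in parallel (requires at least one bucket).
    with ThreadPoolExecutor(max_workers=len(buckets)) as pool:
        first_pass = list(pool.map(lambda b: b.ask(question), buckets))

    findings: dict[int, list[str]] = {}
    for bucket, hits in zip(buckets, first_pass):
        if not hits:
            continue  # nothing relevant here, so no follow-up is issued
        # Round 2: targeted follow-up only to buckets that returned relevant content.
        extra = [d for d in bucket.ask(follow_up) if d not in hits]
        findings[bucket.bucket_id] = hits + extra
    return findings
```

For example, `supervise(make_buckets(docs, 2), "revenue", "margin")` queries both buckets once, then sends the follow-up only to buckets whose first answer was non-empty. In a real system the supervisor would be a model that writes its own follow-up questions from what it has learned so far; here the follow-up is passed in to keep the sketch deterministic.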

Romero Sevilla addresses cost concerns directly: KV cache is expensive, but cache lifetime optimization can reduce the bill, and the tradeoff is favorable compared to GraphRAG’s repeated LLM-driven graph construction. The talk positions XCAG within a broader landscape of retrieval strategies, each with its own compute, cost, and speed tradeoffs, and argues that no single approach fits all scenarios. For teams dealing with dense, rapidly updated document sets, XCAG offers a practical middle path between the extremes of full-graph construction and simple vector similarity search.


📺 Source: AI Engineer · Published June 28, 2026
🏷️ Format: Deep Dive

Channels

RAG