Descriptions:
Kuba Rogut, a deployed engineer at Turbopuffer, pushes back on the “RAG is dead” wave that swept AI social media in late 2025 and early 2026, presenting Google search volume data showing that interest in RAG hit a new inflection point midway through 2025. His core argument has two parts. First, the dismissal conflates simple vector search with retrieval-augmented generation as a whole, which properly also encompasses full-text search (BM25), regex, grep, and structured filters. Second, hybrid retrieval combining these methods is quietly becoming the default architecture for serious agentic applications.
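As a rough illustration of what "hybrid retrieval" can look like, the sketch below merges two ranked result lists with reciprocal rank fusion (RRF). RRF is a common fusion method chosen here for illustration; the description does not say which fusion method the talk or any product uses, and the file names, the `reciprocal_rank_fusion` helper, and the constant `k = 60` are assumptions.

```python
from collections import defaultdict


def reciprocal_rank_fusion(rankings: list[list[str]], k: int = 60) -> list[str]:
    """Merge several ranked lists of document IDs into one ranking.

    Each document scores 1 / (k + rank) per list it appears in (rank starts
    at 1); scores are summed across lists. k = 60 is a conventional default,
    not a value taken from the talk.
    """
    scores: dict[str, float] = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Highest fused score first; ties broken alphabetically for determinism.
    return sorted(scores, key=lambda doc_id: (-scores[doc_id], doc_id))


# Hypothetical results from a vector index and a BM25 full-text index.
vector_hits = ["auth.py", "session.py", "db.py"]
full_text_hits = ["session.py", "tokens.py", "auth.py"]

fused = reciprocal_rank_fusion([vector_hits, full_text_hits])
print(fused)  # ['session.py', 'auth.py', 'tokens.py', 'db.py']
```

Documents that rank well in both lists rise to the top, which is the practical appeal of combining semantic and keyword retrieval rather than picking one.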
The talk uses Cursor as its primary case study of production-grade retrieval. Cursor uses Merkle trees to calculate similarity between codebases opened by teammates, copying over existing embeddings and re-chunking only changed files rather than starting from scratch each session. Their internal benchmark shows semantic search drives a 12.5–13.5% improvement in answer accuracy across models, rising to nearly 24% for their Composer model specifically. An online A/B test found 2.6% better code retention in large codebases and a 2.2% reduction in dissatisfied user requests. Rogut notes that these gains are understated because semantic search is not triggered on every query. Turbopuffer powers Cursor’s semantic search layer.
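The idea of reusing embeddings and re-processing only changed files can be sketched with content hashes. This is a minimal illustration, not Cursor's implementation: the helper names are hypothetical, the tree is built over a flat path-to-hash map, and the changed-file step compares leaves directly instead of descending subtrees as a full Merkle tree would.

```python
import hashlib


def file_hash(content: bytes) -> str:
    """Leaf hash: SHA-256 of a file's bytes."""
    return hashlib.sha256(content).hexdigest()


def merkle_root(leaf_hashes: dict[str, str]) -> str:
    """Fold leaf hashes (sorted by path) pairwise into a single root hash."""
    level = [
        hashlib.sha256(f"{path}:{digest}".encode()).hexdigest()
        for path, digest in sorted(leaf_hashes.items())
    ]
    if not level:
        return hashlib.sha256(b"").hexdigest()
    while len(level) > 1:
        if len(level) % 2 == 1:
            level.append(level[-1])  # duplicate the last node on odd-sized levels
        level = [
            hashlib.sha256((level[i] + level[i + 1]).encode()).hexdigest()
            for i in range(0, len(level), 2)
        ]
    return level[0]


def files_to_reembed(cached: dict[str, str], current: dict[str, str]) -> list[str]:
    """Paths that are new or whose content hash differs from the cached index."""
    if merkle_root(cached) == merkle_root(current):
        return []  # identical roots: every cached embedding can be reused
    return sorted(path for path, digest in current.items() if cached.get(path) != digest)


# Hypothetical example: a teammate's indexed checkout versus a local one.
teammate = {"a.py": file_hash(b"print(1)"), "b.py": file_hash(b"print(2)")}
local = {
    "a.py": file_hash(b"print(1)"),
    "b.py": file_hash(b"print(3)"),
    "c.py": file_hash(b"x = 1"),
}
print(files_to_reembed(teammate, local))  # ['b.py', 'c.py']
```

Comparing roots first makes the common case (nothing changed) a single hash comparison; only when the roots differ does the sketch look for the specific files that need new chunks and embeddings.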
Rogut also addresses Claude Code’s well-documented decision to use grep instead of vector search, framing the two approaches as “cached compute” versus “per-session discovery”: embeddings pay off when the up-front indexing cost amortizes across many queries, and are harder to justify when it does not. The session closes by defining what agentic search means in practice: giving agents a set of retrieval tools (vector, full-text, and file system) to iteratively find and reason over context until a task is complete.
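The "cached compute" versus "per-session discovery" trade-off comes down to break-even arithmetic. The sketch below is an assumption-laden illustration: the function name, the cost units, and the example numbers are hypothetical and do not come from the talk.

```python
import math


def break_even_queries(
    index_cost: float,
    indexed_query_cost: float,
    discovery_query_cost: float,
) -> float:
    """Number of queries after which a prebuilt index becomes cheaper overall.

    Total cost with an index:    index_cost + n * indexed_query_cost
    Total cost without an index: n * discovery_query_cost
    The index wins once n exceeds index_cost / (discovery - indexed).
    """
    saving_per_query = discovery_query_cost - indexed_query_cost
    if saving_per_query <= 0:
        return math.inf  # the index never pays for itself
    return index_cost / saving_per_query


# Hypothetical costs in arbitrary units (for example, seconds of compute).
print(break_even_queries(index_cost=100.0, indexed_query_cost=1.0, discovery_query_cost=5.0))  # 25.0
```

Under these made-up numbers, indexing pays off after 25 queries; a codebase searched only a few times per session would sit below that line, which is where grep-style discovery can be the cheaper choice.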
📺 Source: AI Engineer · Published June 09, 2026
🏷️ Format: Deep Dive







