Turbocharge Your Agent’s Retrieval with TurboQuant – Shashi Jagtap, Superagentic AI

Descriptions:

Shashi Jagtap, founder of SuperAgentic AI, presents TurboQuant at the AI Engineer conference. TurboQuant is a vector embedding compression algorithm from Google Research, published at ICLR 2026, and Jagtap demonstrates how it can cut agent retrieval memory costs by a factor of five without measurable quality loss.

The core problem Jagtap addresses is KV cache bloat: as agent context grows, the KV cache can exceed the model’s own memory footprint, and on unified-memory devices like Apple Silicon, embeddings, vector indices, and model weights all compete for the same RAM pool. TurboQuant addresses this by storing embeddings at 3 to 4 bits per value instead of the default 32-bit float precision. It does so through two complementary techniques: PolarQuant, which compresses the vector via scalar quantization after shuffling, and QJL (Quantized Johnson-Lindenstrauss), a one-bit error correction that fixes the remaining approximation error. The paper’s key claim is that search quality is preserved because nearest-neighbor retrieval cares only about relative distances, not absolute values, and the talk explains this clearly for a software-engineering audience.
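To make the compression idea concrete, here is a minimal Python sketch of the first stage only. It is not the TurboQuant implementation: it assumes the "shuffling" step can be modeled as a random orthogonal rotation (random sign flips plus a Walsh-Hadamard transform), and it uses a plain uniform quantizer with a per-vector range. The QJL error-correction stage is left out. All function names are hypothetical and chosen for illustration.

```python
import math
import random


def hadamard_rotate(vec, signs):
    """Random sign flips followed by an orthonormal Walsh-Hadamard transform.

    The combined map is orthogonal, so norms and inner products are preserved.
    len(vec) must be a power of two. (Assumed stand-in for the shuffling step.)
    """
    out = [v * s for v, s in zip(vec, signs)]
    n = len(out)
    h = 1
    while h < n:
        for i in range(0, n, 2 * h):
            for j in range(i, i + h):
                a, b = out[j], out[j + h]
                out[j], out[j + h] = a + b, a - b
        h *= 2
    scale = 1.0 / math.sqrt(n)
    return [x * scale for x in out]


def quantize(vec, bits):
    """Uniform scalar quantization of each coordinate to `bits` bits."""
    levels = (1 << bits) - 1
    lo, hi = min(vec), max(vec)
    step = (hi - lo) / levels or 1.0  # avoid a zero step for constant vectors
    codes = [round((x - lo) / step) for x in vec]
    return codes, lo, step


def dequantize(codes, lo, step):
    """Map integer codes back to approximate float coordinates."""
    return [lo + c * step for c in codes]


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def compression_ratio(bits, baseline_bits=32):
    """Payload-only ratio; ignores the per-vector (lo, step) metadata."""
    return baseline_bits / bits


if __name__ == "__main__":
    rng = random.Random(0)
    dim, bits = 64, 4
    signs = [rng.choice((-1.0, 1.0)) for _ in range(dim)]
    docs = [[rng.gauss(0, 1) for _ in range(dim)] for _ in range(200)]
    query = [rng.gauss(0, 1) for _ in range(dim)]

    # Rotate the query once; documents are rotated, then stored as small codes.
    rotated_query = hadamard_rotate(query, signs)
    index = [quantize(hadamard_rotate(d, signs), bits) for d in docs]

    # Compare the best match under full precision and under quantization.
    exact = max(range(len(docs)), key=lambda i: dot(query, docs[i]))
    approx = max(range(len(docs)),
                 key=lambda i: dot(rotated_query, dequantize(*index[i])))
    print(f"float top-1: {exact} | {bits}-bit top-1: {approx}")
    print("payload compression:", compression_ratio(bits))
```

Because the rotation is orthogonal, inner products in the rotated space equal those in the original space, so the only ranking error comes from rounding each coordinate. On payload alone, 4-bit codes are 8 times smaller than 32-bit floats and 3-bit codes about 10.7 times smaller; the description does not say how the fivefold end-to-end figure is measured.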

Jagtap then introduces TurboAgent, an open-source library that SuperAgentic built on top of TurboQuant. It lets developers swap in compressed retrieval without changing their existing agent framework or vector database. A live demo shows a Pinecone-backed agent running first with a float32 baseline index, then with TurboAgent compression, producing identical answers at a fraction of the memory. He notes that the technique is being adopted across llama.cpp, MLX, Ollama, and LM Studio.


📺 Source: AI Engineer · Published June 28, 2026
🏷️ Format: Keynote Launch
