We Cut 94% of AI Coding Tokens With a Local Code Index – Rajkumar Sakthivel, Tesco

Descriptions:

Rajkumar Sakthivel of Tesco walks through how he and his colleague Foss discovered that 90% of their AI coding costs came not from model inference but from bloated input context — and built a local code index layer to fix it. Profiling a typical query on their own project revealed 45,000 tokens being sent when only 5,000 were actually relevant, a pattern common across Claude Code, Cursor, GitHub Copilot, and Codex.

Their solution is a five-step local pipeline that sits between a codebase and any AI coding tool. It chunks code by semantic unit (functions, classes, methods rather than arbitrary blocks), runs hybrid retrieval combining vector semantic search and keyword search simultaneously, compresses results to function signatures and docstrings, tracks call-graph dependencies to surface related code, and applies an adaptive score threshold to filter low-confidence results. The hybrid retrieval design is key: semantic search alone misses exact names; keyword search alone misses conceptually related code; together they cut miss rate from ~25% to ~10%.
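As a rough illustration of how steps like these might fit together, here is a minimal sketch in Python. It is not the speakers' implementation. It chunks code by semantic unit with the standard `ast` module, compresses each chunk to a signature and docstring, and blends a semantic score with an exact-name keyword score before applying a threshold relative to the best hit. Several things are assumptions made for this example: the bag-of-words cosine standing in for real vector embeddings, the `alpha` and `keep_ratio` parameters, the sample `SOURCE`, and all function names. Call-graph tracking is left out for brevity.

```python
import ast
import math
import re
from collections import Counter

# Hypothetical sample code to index; any Python source string would do.
SOURCE = '''
def load_user(user_id):
    """Fetch a user record from the database."""
    return db.get(user_id)

class TokenCache:
    """Cache of auth tokens keyed by user."""

    def refresh(self, user_id):
        """Renew the auth token for a user."""
        return issue_token(user_id)
'''

def chunk_by_unit(source):
    """One chunk per function, class, or method, compressed to signature + docstring."""
    chunks = []
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef)):
            args = ", ".join(a.arg for a in node.args.args)
            signature = f"def {node.name}({args})"
        elif isinstance(node, ast.ClassDef):
            signature = f"class {node.name}"
        else:
            continue
        chunks.append({"name": node.name, "signature": signature,
                       "doc": ast.get_docstring(node) or ""})
    return chunks

def tokens(text):
    """Lowercase alphanumeric tokens; underscores split identifiers apart."""
    return re.findall(r"[a-z0-9]+", text.lower())

def cosine(a, b):
    """Bag-of-words cosine similarity (a stand-in for embedding similarity)."""
    ca, cb = Counter(a), Counter(b)
    dot = sum(ca[t] * cb[t] for t in ca)
    norm = (math.sqrt(sum(v * v for v in ca.values()))
            * math.sqrt(sum(v * v for v in cb.values())))
    return dot / norm if norm else 0.0

def hybrid_search(query, chunks, alpha=0.5, keep_ratio=0.6):
    """Blend semantic and exact-name scores, then keep hits near the top score."""
    scored = []
    for chunk in chunks:
        semantic = cosine(tokens(query), tokens(chunk["signature"] + " " + chunk["doc"]))
        keyword = 1.0 if chunk["name"].lower() in query.lower() else 0.0
        scored.append((alpha * semantic + (1 - alpha) * keyword, chunk))
    scored.sort(key=lambda pair: pair[0], reverse=True)
    best = scored[0][0] if scored else 0.0
    # Threshold adapts to the best score instead of using a fixed cutoff (assumed design).
    return [c for score, c in scored if best > 0 and score >= keep_ratio * best]

chunks = chunk_by_unit(SOURCE)
for hit in hybrid_search("renew auth token", chunks):
    print(hit["signature"], "-", hit["doc"])
```

In this toy setup the query "renew auth token" reaches `refresh` through the semantic side even though the name never appears in the query, while a query such as "where is TokenCache defined" is carried mostly by the exact-name match. That mirrors the reasoning given for combining the two searches.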

Benchmarked on FastAPI's open-source codebase (53 files, 20 realistic developer questions), the tool cuts context from 83K tokens per query to 4.9K, a 94% reduction, while maintaining 90% recall. With additional output compression, the footprint drops to 523 tokens. The benchmark is public and runnable. Sakthivel is candid about limits: the 94% figure is measured against a worst-case baseline in which whole files are read, modern tools are already smarter than that baseline, and the system degrades significantly on large monolithic files with mixed responsibilities.
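The headline percentages follow directly from the token counts quoted in this description. The short check below is only a sketch: the helper name `reduction_pct` is made up for this example, and the inputs are the rounded figures given here (83K and 4.9K for the FastAPI benchmark, 45,000 and 5,000 for the profiled query), so the results are approximate.

```python
def reduction_pct(before, after):
    """Percentage of tokens removed when context shrinks from `before` to `after`."""
    return 100.0 * (before - after) / before

# FastAPI benchmark, using the rounded per-query figures quoted above.
print(f"{reduction_pct(83_000, 4_900):.1f}%")  # 94.1%

# Profiled query on the speakers' own project: 45,000 sent, 5,000 relevant.
print(f"{reduction_pct(45_000, 5_000):.1f}%")  # 88.9%
```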


📺 Source: AI Engineer · Published June 28, 2026
🏷️ Format: Hands On Build
