We Cut 94% of AI Coding Tokens With a Local Code Index – Rajkumar Sakthivel, Tesco


Rajkumar Sakthivel of Tesco walks through how he and his colleague Foss discovered that 90% of their AI coding costs came not from model inference but from bloated input context — and built a local code index layer to fix it. Profiling a typical query on their own project revealed 45,000 tokens being sent when only 5,000 were actually relevant, a pattern common across Claude Code, Cursor, GitHub Copilot, and Codex.

Their solution is a five-step local pipeline that sits between a codebase and any AI coding tool. It chunks code by semantic unit (functions, classes, and methods rather than arbitrary blocks), runs hybrid retrieval that combines vector-based semantic search with keyword search, compresses results to function signatures and docstrings, tracks call-graph dependencies to surface related code, and applies an adaptive score threshold to filter out low-confidence results. The hybrid retrieval design is key: semantic search alone misses exact names, and keyword search alone misses conceptually related code. Together they cut the miss rate from ~25% to ~10%.
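
To make the chunking and compression steps concrete, here is a minimal sketch, assuming a Python codebase and using only the standard-library `ast` module. It is not the tool described in the talk: the function name `chunk_and_compress`, the `SAMPLE` input, and the shape of the returned dictionaries are all hypothetical, and return annotations are left out of the signature for brevity.

```python
import ast

# Hypothetical sample input; any Python source string works.
SAMPLE = '''
def add(a, b):
    """Return the sum of a and b."""
    total = a + b
    return total


class Greeter:
    """Greets people."""

    def greet(self, name):
        """Return a greeting for name."""
        return "hello " + name
'''


def chunk_and_compress(source):
    """Emit one chunk per function, class, or method (a semantic unit),
    keeping only its signature and docstring instead of the full body."""
    tree = ast.parse(source)
    chunks = []

    def visit(node, prefix):
        for child in ast.iter_child_nodes(node):
            if isinstance(child, (ast.FunctionDef, ast.AsyncFunctionDef)):
                keyword = "async def" if isinstance(child, ast.AsyncFunctionDef) else "def"
                # ast.unparse (Python 3.9+) renders the argument list as source text.
                signature = f"{keyword} {child.name}({ast.unparse(child.args)})"
            elif isinstance(child, ast.ClassDef):
                signature = f"class {child.name}"
            else:
                continue  # skip statements that are not semantic units
            chunks.append({
                "name": prefix + child.name,
                "signature": signature,
                "doc": ast.get_docstring(child) or "",
                # Line span lets a caller fetch the full body only when needed.
                "lines": (child.lineno, child.end_lineno),
            })
            if isinstance(child, ast.ClassDef):
                # Recurse into classes so methods become their own chunks.
                visit(child, prefix + child.name + ".")

    visit(tree, "")
    return chunks


for chunk in chunk_and_compress(SAMPLE):
    print(chunk["name"], "|", chunk["signature"], "|", chunk["doc"])
```

The function bodies never reach the output, which is where the token savings in a scheme like this would come from. The stored line span is one way to let a tool pull the full implementation back in only for the chunks that retrieval actually selects.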

Benchmarked on FastAPI’s open-source codebase — 53 files, 20 realistic developer questions — the tool cuts context from 83K tokens per query to 4.9K, a 94% reduction, while maintaining 90% recall. With additional output compression, the footprint drops to 523 tokens. The benchmark is public and runnable. Sakthivel is candid about limits: the 94% figure represents a worst-case full-file-read baseline; modern tools are already smarter, and the system degrades significantly on large monolithic files with mixed responsibilities.
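
The headline percentages can be checked with a few lines of arithmetic. The sketch below uses only the rounded figures quoted in this summary, so the results are approximate, and the helper name `reduction_pct` is hypothetical rather than part of the published benchmark.

```python
def reduction_pct(before_tokens, after_tokens):
    """Percentage of input tokens removed relative to the baseline."""
    return (1 - after_tokens / before_tokens) * 100


# Figures as quoted in the summary (rounded, so results are approximate).
profiled = reduction_pct(45_000, 5_000)    # tokens sent vs. tokens actually relevant
indexed = reduction_pct(83_000, 4_900)     # FastAPI benchmark, full-file baseline
compressed = reduction_pct(83_000, 523)    # with additional output compression

print(f"profiled query: {profiled:.1f}% of sent tokens were not relevant")
print(f"index layer:    {indexed:.1f}% reduction")
print(f"compressed:     {compressed:.1f}% reduction")
```

The middle figure rounds to the quoted 94%, and the profiled query lands near the roughly 90% waste described earlier. Both depend on the baseline, which is why the full-file-read caveat matters when comparing against tools that already read selectively.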

📺 Source: AI Engineer · Published June 28, 2026
🏷️ Format: Hands On Build


