Description:
Aditi Gupta from Redis’s Applied AI team presented at AI Dev 26 in San Francisco on the architecture behind a production-grade SRE agent built using the Redis Context Engine, validated with design partners including top-five financial institutions running Redis at scale. The motivation came from enterprise customers managing 60-plus clusters across three to five regions who needed fast, reliable incident triage without depending on stale LLM training data or the noise of generic web search results.
The agent’s foundation is a curated knowledge base built from official Redis documentation spanning all deployment types — open-source, Redis Cloud, and Redis Enterprise. At runtime, a tiered model strategy keeps the system practical at scale: large models handle heavy reasoning and final synthesis, “mini” models run per-topic research in parallel, and “nano” models handle lightweight classification tasks like query routing. A semantic caching layer backed by Redis vector similarity allows repeated queries to return cached answers with source citations, bypassing LLM calls entirely — a strategy Gupta frames using the memorable formulation “call your mom, don’t call your LLM.”
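The semantic-cache path can be sketched in memory as follows. This sketch assumes the caller already has an embedding for each query. The names `SemanticCache`, `answer_query`, and the `call_llm` hook, along with the 0.9 similarity threshold, are hypothetical. In the described system, Redis vector similarity search would replace the linear scan shown here. This sketch does not use any Redis client API.

```python
import math
from dataclasses import dataclass, field
from typing import Callable, List, Optional, Tuple


def cosine_similarity(a: List[float], b: List[float]) -> float:
    """Cosine of the angle between two embedding vectors (0.0 if either is zero)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


@dataclass
class CacheEntry:
    embedding: List[float]
    answer: str
    citations: List[str]  # knowledge-base chunk ids backing the answer


@dataclass
class SemanticCache:
    threshold: float = 0.9  # assumed similarity cutoff; tune per workload
    entries: List[CacheEntry] = field(default_factory=list)

    def lookup(self, embedding: List[float]) -> Optional[CacheEntry]:
        # Linear scan stands in for a vector index nearest-neighbor query (k=1).
        best: Optional[CacheEntry] = None
        best_score = -1.0
        for entry in self.entries:
            score = cosine_similarity(embedding, entry.embedding)
            if score > best_score:
                best, best_score = entry, score
        if best is not None and best_score >= self.threshold:
            return best
        return None

    def store(self, embedding: List[float], answer: str, citations: List[str]) -> None:
        self.entries.append(CacheEntry(embedding, answer, citations))


# Hypothetical LLM hook: takes a model tier name, returns (answer, citations).
LLMCall = Callable[[str], Tuple[str, List[str]]]


def answer_query(
    cache: SemanticCache, embedding: List[float], call_llm: LLMCall
) -> Tuple[str, List[str], str]:
    """Return (answer, citations, source), where source is "cache" or "llm"."""
    hit = cache.lookup(embedding)
    if hit is not None:
        return hit.answer, hit.citations, "cache"  # cache hit: no LLM call at all
    answer, citations = call_llm("large")  # final synthesis goes to the large tier
    cache.store(embedding, answer, citations)
    return answer, citations, "llm"
```

A near-duplicate query embedding returns the stored answer and its citations without invoking `call_llm`. A dissimilar one falls through to the large-model tier. The threshold trades hit rate against the risk of serving a cached answer to a question that only looks similar. That risk matters given the trustworthiness requirement the talk emphasizes.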
A guardian model reviews every outbound recommendation before it reaches the user, cross-referencing the knowledge base to remove unsafe or fabricated commands. Trustworthiness was the team’s explicit north-star constraint: responses must be grounded in verified sources, evidence-driven from live operational data, and fully auditable, with citations traceable back to specific documentation chunks. Gupta also explains why neither raw LLM inference nor web search is acceptable for SRE workloads, where a wrong configuration recommendation can cause more damage than no recommendation at all.
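The talk describes the guardian as a model. The sketch below swaps in a simple rule-based check to illustrate the same gate: recommendations pass only if their command is documented in the knowledge base, is not on an unsafe list, and carries a citation. The command sets, the `Recommendation` type, `guardian_review`, and the citation ids are all hypothetical. `CLUSTER AUTOHEAL` in the test is a deliberately invented command that stands in for an LLM fabrication.

```python
from dataclasses import dataclass
from typing import List, Optional, Set, Tuple

# Hypothetical: command names the curated knowledge base documents.
DOCUMENTED_COMMANDS: Set[str] = {"INFO", "SLOWLOG", "MEMORY", "CONFIG", "CLIENT", "LATENCY"}
# Hypothetical policy: real Redis commands considered too risky to recommend.
UNSAFE_COMMANDS: Set[str] = {"FLUSHALL", "FLUSHDB", "SHUTDOWN", "DEBUG"}


@dataclass
class Recommendation:
    command: str             # e.g. "INFO memory"
    citation: Optional[str]  # knowledge-base chunk id backing the advice, if any


def command_name(command: str) -> str:
    """First token of the command line, upper-cased ("" for empty input)."""
    parts = command.split()
    return parts[0].upper() if parts else ""


def guardian_review(
    recs: List[Recommendation],
    documented: Set[str] = DOCUMENTED_COMMANDS,
    unsafe: Set[str] = UNSAFE_COMMANDS,
) -> Tuple[List[Recommendation], List[Tuple[Recommendation, str]]]:
    """Split recommendations into approved ones and (rejected, reason) pairs."""
    approved: List[Recommendation] = []
    rejected: List[Tuple[Recommendation, str]] = []
    for rec in recs:
        name = command_name(rec.command)
        if name in unsafe:
            rejected.append((rec, "unsafe"))
        elif name not in documented:
            rejected.append((rec, "not found in knowledge base"))
        elif not rec.citation:
            rejected.append((rec, "missing citation"))
        else:
            approved.append(rec)
    return approved, rejected
```

Checking only the first token is deliberately coarse. A model-based guardian, as described in the talk, could also judge arguments and context. Keeping the rejection reasons supports the auditability goal, because every dropped recommendation can be explained.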
📺 Source: DeepLearningAI · Published May 20, 2026
🏷️ Format: Workflow Case Study