AI Dev 26 x SF: Jean-Marie John-Mathews: Red Teaming LLM Applications Systematically

Tutorials · 4 weeks ago

Description:

Jean-Marie John-Mathews, a researcher at JustCatch, presents a systematic approach to red teaming LLM applications at AI Dev 26 San Francisco. The talk opens with a real-world failure that drew widespread attention: a Chipotle chatbot that went viral after users successfully prompted it off-topic. That reputational incident is representative of the broader class of risks JustCatch helps enterprises prevent.

The talk identifies why standard LLM-as-judge evaluation frameworks break down for agentic systems: agents can produce correct outputs through wrong reasoning, the most consequential failures often occur inside invisible tool calls, and static golden datasets cannot capture the dynamic multi-turn patterns where real exploitation typically occurs. Two concrete examples illustrate the point: an agent that repeatedly asks a frustrated customer to rephrase instead of escalating to a human, and a CRM update that silently omits a required field in a tool call’s input parameters with no visible error in the conversation log.
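The two failure examples can be made concrete with a small trace check. The following is a minimal sketch, not JustCatch's framework or API: the `ToolCall` type, the `update_crm_record` tool name, its required fields, and the "rephrase" marker are all hypothetical names chosen for illustration. It shows why inspecting the tool-call inputs and the turn sequence can surface problems that a judge reading only the final answer would miss.

```python
from dataclasses import dataclass


@dataclass
class ToolCall:
    """One tool invocation recorded in an agent trace (hypothetical shape)."""
    name: str
    arguments: dict


# Assumed schema for illustration: required input fields per tool.
REQUIRED_FIELDS = {"update_crm_record": {"customer_id", "status", "owner"}}


def missing_required_fields(call, required=REQUIRED_FIELDS):
    """Return the required fields that are absent or None in a tool call."""
    expected = required.get(call.name, set())
    present = {key for key, value in call.arguments.items() if value is not None}
    return sorted(expected - present)


def count_rephrase_requests(turns, marker="rephrase"):
    """Count agent turns that ask the user to rephrase (simple keyword match)."""
    return sum(
        1 for role, text in turns if role == "agent" and marker in text.lower()
    )


if __name__ == "__main__":
    # The conversation log would look fine; the gap is only in the tool input.
    call = ToolCall("update_crm_record", {"customer_id": "C-1", "status": "open"})
    print(missing_required_fields(call))

    turns = [
        ("user", "My order never arrived."),
        ("agent", "Could you rephrase that?"),
        ("user", "I want a human."),
        ("agent", "Sorry, please rephrase your request."),
    ]
    print(count_rephrase_requests(turns))
```

A keyword match is a deliberately crude stand-in for the multi-turn check; the point is only that the test operates on the whole trace rather than on a single final output.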

JustCatch’s open-source testing framework addresses these gaps by letting developers describe desired agent behavior in plain natural language, then automatically generating versioned, reproducible test cases that integrate into CI/CD pipelines. A live demo using Claude’s coding assistant shows the workflow applied to a RAG documentation agent built on JustCatch’s own docs. The tool is positioned as accessible to teams without dedicated red-teamers, with an enterprise version serving large banks alongside a publicly available open-source library.
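One way to picture "versioned, reproducible test cases" is a test record whose version is derived from its own content. The sketch below is an assumption-laden illustration, not the JustCatch library: the `BehaviorTest` class, its fields, and the hashing scheme are hypothetical, and the step that generates test cases from a natural-language description is left out entirely.

```python
import hashlib
import json
from dataclasses import asdict, dataclass


@dataclass(frozen=True)
class BehaviorTest:
    """A test case tied to a plain-language behavior description (hypothetical)."""
    behavior: str       # desired behavior, stated in natural language
    user_turns: tuple   # scripted user messages to replay against the agent

    def version(self):
        """Content-derived version: identical content always yields the same id."""
        payload = json.dumps(asdict(self), sort_keys=True).encode("utf-8")
        return hashlib.sha256(payload).hexdigest()[:12]

    def to_json(self):
        """Serialize for storage in a repository, alongside the version id."""
        record = asdict(self)
        record["version"] = self.version()
        return json.dumps(record, sort_keys=True, indent=2)


if __name__ == "__main__":
    test = BehaviorTest(
        behavior="Escalate to a human after two failed clarification attempts.",
        user_turns=("My order never arrived.", "I want a human."),
    )
    print(test.to_json())
```

Because the version changes only when the description or the scripted turns change, a CI job could compare stored versions across commits to detect when the test suite itself has drifted.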

📺 Source: DeepLearningAI · Published May 20, 2026
🏷️ Format: Tutorial Demo

Tags

Claude Code LinkedIn
