Descriptions:
Most of us reach for a frontier model by default and pay for it on every call, in latency, in energy, in cash, and in everything that leaves our stack. For most of those calls, a small local model would do the job.
RL Nabors, former Meta/React core team member and AWS alum, covers the vocabulary you need to reason about model performance (capability evals, golden datasets, LLM-as-judge) and walks through real cases: a local agentic harness replacing a frontier call, an in-browser moderation classifier defended with production-trace evals, and a generative summarization feature where the rubric turns out to be harder than the model. You’ll leave with a framework for deciding when to choose a large, off-prem model and when a small, local one will do, and for measuring your way to the answer instead of guessing.
You will learn:
– The vocabulary to reason about model performance (capability evals, golden datasets, LLM-as-judge).
– A framework for deciding when a small or local model can replace a frontier one and when it can’t.
– A repeatable process for building capability evals from your own production traces, not someone else’s benchmark.
– Working examples of using eval results to iterate on prompts and ship with confidence instead of vibes.
Speakers:
– RL Nabors (Arize): RL Nabors builds developer tools and the communities that make them stick. Previously React and MDN, currently developer experience at Arize, perpetually building Mima.
X/Twitter: https://x.com/rachelnabors
LinkedIn: https://linkedin.com/in/nearestnabors
GitHub: https://linkedin.com/in/nearestnabors







