Notion’s Sarah Sachs & Simon Last on Custom Agents, Evals, and the Future of Work

Interviews · 4 weeks ago

Description:

Sarah Sachs and Simon Last of Notion join us for a deep dive into how Notion built Custom Agents, why it took years and multiple rebuilds to get right, and what it means to turn a productivity tool into an agent-native system of record for enterprise work. We go inside the product, engineering, evals, pricing, and org design decisions behind one of the most ambitious AI product efforts in software today—from early failed tool-calling experiments in 2022 to agent harnesses, progressive tool disclosure, meeting notes as data capture, and the long-term vision for software factories and agentic work.

We discuss:
• Why early agent attempts failed: no tool-calling standard, short context windows, unreliable models, and too much complexity exposed to the model
• The “Agent Lab” thesis for application companies: not just wrapping a model, but understanding how people collaborate and building the right product system around frontier capabilities
• How Notion thinks about roadmap timing: not swimming upstream against model limitations, but also building early enough that the product is ready when the models are
• Why coding agents feel like the kernel of AGI, and how Notion is thinking about “software factories” made up of agents that spec, code, test, debug, review, and maintain codebases together
• How Sarah runs AI engineering at Notion: objective-setting over idea ownership, low-ego teams comfortable deleting their own work, and a culture designed to swarm around fast-changing opportunities
• How Notion organizes AI: core AI capabilities and infrastructure, product packaging teams, and a broader company mandate that every product surface must increasingly work for both humans and agents
• Notion’s eval philosophy: regression tests, launch-quality evals, and “frontier/headroom” evals that intentionally only pass ~30% of the time so the company can see where model capabilities are going
• What a “Model Behavior Engineer” is, and why Notion treats eval writing, failure analysis, and model understanding as a distinct function rather than just software engineering
• How agents compose inside Notion: shared databases as primitives, agents invoking other agents, “manager agents” supervising dozens of specialized agents, and memory implemented simply as pages and databases
• Notion’s take on MCP vs CLI: why Simon is bullish on CLI’s self-debugging nature, where MCP still makes sense, and how Sarah thinks about capability, determinism, permissioning, and pricing alignment
• The evolution of Notion’s internal agent harness: from early JavaScript coding agents, to custom XML, to Markdown and SQL-like abstractions, to tool definitions, progressive disclosure, and a much shorter system prompt
• How Notion prices Custom Agents: credits as an abstraction over tokens, model type, serving tier, web search, and future sandbox costs; why usage-based pricing was necessary; and how “auto” tries to match the right model to the right task
• Why Notion is not eager to train a foundation model, where they do fine-tune and optimize today, and why retrieval/ranking is one of the most important investment areas as more searches come from agents rather than humans

—

Sarah Sachs
LinkedIn: https://www.linkedin.com/in/sarahmsachs
X: https://x.com/sarahmsachs

Simon Last
LinkedIn: https://www.linkedin.com/in/simon-last-41404140
X: https://x.com/simonlast

Timestamps
00:00:00 Introduction and launching Notion Custom Agents
00:01:17 Why Notion rebuilt agents four or five times
00:03:35 Building for where models are going, not just where they are
00:05:32 The Agent Lab thesis, wrappers, and product intuition
00:08:07 User journeys, leadership, and low-ego AI teams
00:13:16 The Simon Vortex, hackathons, and bringing security in early
00:16:39 Team structure, demos over memos, and building for agents
00:20:25 Evals, Notion’s Last Exam, and the Model Behavior Engineer role
00:27:37 Evals as an agent harness and the changing role of software engineers
00:30:42 The software factory: specs, verification, and agent workflows
00:32:18 Live demo: a custom agent for coworking space applications
00:35:08 Composing agents, manager agents, and memory as pages
00:38:15 Notion Mail, Gmail, native integrations, and tools
00:39:43 MCP vs CLI and the cost of capability
00:44:13 When Notion uses MCP vs building its own integrations
00:47:43 The history of Notion’s agent harness rebuilds
00:55:35 Power users, public tools, and the setup agent
00:58:01 Self-fixing agents, permissions, and “flippy”
01:01:13 Pricing, credits, and choosing the right model automatically
01:09:01 Why Notion isn’t training its own frontier model
01:14:07 Retrieval, ranking, and search built for agents
01:17:27 Meeting Notes as data capture and workflow automation
01:21:18 Wearables, hardware, and Notion as the system of record
01:23:45 Outro
