The thinking lever


Description:

Anthropic product manager Matt Bleifer delivers a detailed technical explanation of how Claude uses test-time compute — also called inference-time compute — to tackle complex problems, and which levers developers can pull to control this behavior. The talk establishes that scaling compute at inference time follows patterns similar to training-time scaling: more time and tokens spent on a problem consistently yield better results across agentic coding, computer use, and PhD-level reasoning tasks.

A traffic simulation demo concretely illustrates the stakes. Running Opus 4.7 on low effort produces a functional but basic result in roughly 50 seconds using around 4,600 output tokens. Cranking to high effort doubles both time and tokens and produces a meaningfully better simulation with an intelligent driver model. Maxing out effort consumes 10x the tokens and time of the low setting, yielding the best graphics, physics, and traffic behavior. Bleifer breaks down three distinct token types — thinking tokens (chain-of-thought scratch pad), tool call tokens, and response text — and traces the evolution from single-block pre-response thinking, to interleaved thinking between tool calls, to the current adaptive thinking paradigm.
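The three token types map onto distinct content-block types in a Messages API response. A minimal sketch of tallying them — the sample response content below is illustrative, not real model output:

```python
# Tally the three token-bearing block types described in the talk:
# thinking (chain-of-thought scratchpad), tool calls, and response text.
from collections import Counter

def count_block_types(content_blocks):
    """Count response content blocks by their "type" field."""
    return Counter(block["type"] for block in content_blocks)

# Hypothetical response content, shaped like the Messages API's
# message.content list when extended thinking and tools are enabled.
sample_content = [
    {"type": "thinking", "thinking": "Plan: inspect the file first..."},
    {"type": "tool_use", "name": "read_file", "input": {"path": "sim.py"}},
    {"type": "thinking", "thinking": "The driver model needs a gap check..."},
    {"type": "text", "text": "I updated the simulation's driver model."},
]

print(count_block_types(sample_content))
# Counter({'thinking': 2, 'tool_use': 1, 'text': 1})
```

Under interleaved or adaptive thinking, thinking blocks can appear anywhere in this list, not just at the start.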

Adaptive thinking, the default benchmark setting since Opus 4.6, gives Claude the freedom to think at any point during a response — before or after tool calls, in the middle of text generation — without any fixed constraints on timing or volume. The presentation covers the effort API parameter, budget tokens for explicit thinking limits, and practical guidance on matching effort levels to task complexity.
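The two levers can be sketched as Messages API request payloads. The `thinking.budget_tokens` shape matches Anthropic's documented extended-thinking parameter; the top-level `effort` field and the model id are assumptions based on the talk — check the current API reference for exact names:

```python
# Two levers for controlling Claude's test-time compute, expressed as
# request payload dicts (no network call is made here).

def effort_request(effort: str) -> dict:
    """Coarse lever: pick an effort level and let the model decide how
    much to think. The top-level "effort" field is an assumed shape."""
    return {
        "model": "claude-opus-4-7",  # hypothetical id, from the talk
        "max_tokens": 16000,
        "effort": effort,  # e.g. "low" | "medium" | "high"
        "messages": [{"role": "user", "content": "Build a traffic simulation."}],
    }

def budget_request(budget_tokens: int) -> dict:
    """Fine lever: an explicit ceiling on thinking tokens, via the
    documented extended-thinking parameter."""
    return {
        "model": "claude-opus-4-7",
        "max_tokens": 16000,
        "thinking": {"type": "enabled", "budget_tokens": budget_tokens},
        "messages": [{"role": "user", "content": "Build a traffic simulation."}],
    }

print(budget_request(8000)["thinking"])
# {'type': 'enabled', 'budget_tokens': 8000}
```

Per the talk's guidance, the effort lever suits most cases (match it to task complexity), while an explicit token budget is useful when cost or latency must be bounded precisely.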


📺 Source: Claude · Published May 08, 2026
🏷️ Format: Deep Dive
