Description:
Rhythm Garg and Linden Li, co-founders of Applied Compute and former OpenAI researchers, present a technical deep dive into building a fast, cost-predictable reinforcement learning stack for enterprise deployment. Unlike lab-scale RL runs that span weeks, Applied Compute targets training jobs that complete in days with low variance in delivery time, a business-critical requirement when working with enterprise customers on contracted timelines.
The core problem they diagnose is GPU idle time in synchronous RL: every sample in a training batch must complete before the next training step begins, so the slowest sample dictates step time. Their measurement is concrete: on a batch of 40 arithmetic problems with 32 samples each, using Qwen-30B, 99% of samples finished in roughly 40 seconds, but the final 1% took another 80 seconds, a long tail that leaves GPUs idle and wastes substantial compute. Their solution is asynchronous RL, which decouples sampling from training and allows the GPU budget to be split configurably between the two phases.
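
To make the tail cost concrete, here is a minimal Python sketch of the straggler effect. It is not Applied Compute's code; the latency distribution is an assumption calibrated to the figures quoted in the talk (99% of samples in ~40s, the last 1% taking up to ~120s total):

```python
# Minimal sketch of the synchronous-RL straggler problem. The latency
# distribution below is an illustrative assumption matching the talk's
# quoted figures, not measured data.
import random

def sample_latency() -> float:
    """Draw a per-sample generation latency with a heavy tail."""
    if random.random() < 0.99:
        return random.uniform(20.0, 40.0)   # typical samples finish by ~40s
    return random.uniform(40.0, 120.0)      # the 1% tail runs up to ~120s

def synchronous_step(batch_size: int = 40 * 32) -> tuple[float, float]:
    """One synchronous step: everyone waits for the slowest sample."""
    latencies = [sample_latency() for _ in range(batch_size)]
    step_time = max(latencies)
    # Fraction of sampler GPU-time spent idle, waiting on stragglers.
    idle_frac = 1.0 - sum(latencies) / (step_time * batch_size)
    return step_time, idle_frac

random.seed(0)
step_time, idle_frac = synchronous_step()
print(f"sync step time: {step_time:.0f}s, sampler idle fraction: {idle_frac:.0%}")
```

Under these assumptions the step is gated by the ~120-second straggler while the typical sample finishes in about 30 seconds, so the samplers sit idle for most of the step. Asynchronous RL recovers that time by letting each worker pull a new prompt the moment it finishes.
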
The talk then walks through the systems modeling required to optimize this allocation: modeling sampling throughput as a function of KV cache batch size, fitting latency curves as a function of inference batch size, and reasoning about how staleness tolerance (training on slightly out-of-date samples) trades off against throughput. Applied Compute’s approach is positioned as enabling enterprises to train use-case-specific models that improve over time via a data flywheel, delivering the kind of specialized reasoning capabilities previously available only to organizations with large-scale lab infrastructure.
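
As a hedged illustration of that modeling step, the sketch below fits an affine decode-latency curve, latency(b) ≈ a + c·b, to measured (batch size, latency) points and then picks the largest inference batch size whose latency stays under a staleness budget. The data points, the linear model, and the max_step_latency knob are illustrative assumptions, not numbers from the talk:

```python
# Hedged sketch: fit decode latency vs. inference batch size, then choose
# the largest batch whose per-step latency fits a staleness budget.
import numpy as np

# Hypothetical measured (batch_size, seconds_per_decode_step) pairs.
batch_sizes = np.array([1, 8, 32, 64, 128, 256])
latencies = np.array([0.020, 0.022, 0.028, 0.036, 0.055, 0.095])

# Decode latency is roughly affine in batch size once KV-cache reads
# dominate: latency(b) ~= a + c * b. Fit a and c by least squares.
c, a = np.polyfit(batch_sizes, latencies, deg=1)

def latency(b):
    return a + c * b

def throughput(b):
    """Tokens generated per second across the whole batch."""
    return b / latency(b)

# Staleness knob (assumed): cap per-step latency so generated samples are
# at most max_step_latency seconds behind the current trainer weights.
max_step_latency = 0.06
candidates = np.arange(1, 512)
feasible = candidates[latency(candidates) <= max_step_latency]
best = feasible[np.argmax(throughput(feasible))]
print(f"fit: latency(b) = {a:.4f} + {c:.6f} * b")
print(f"best batch under staleness cap: {best}, throughput ~ {throughput(best):.0f} tok/s")
```

Because throughput b / latency(b) rises monotonically toward 1/c while latency grows without bound, the staleness cap is what pins down the operating point; loosening it buys throughput at the cost of training on staler samples.
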
📺 Source: AI Engineer · Published December 09, 2025
🏷️ Format: Deep Dive
