Descriptions:
Olive Song, a senior researcher specializing in reinforcement learning and model evaluation at Chinese AI company MiniMax, gives an unusually candid look at how the team builds and trains its frontier open-weight models in this crossover episode from Nathan Labenz’s Cognitive Revolution podcast. MiniMax’s latest model, M2.5, currently tops the OpenRouter usage leaderboard. The episode combines Olive’s presentation at the AI Engineer conference in New York with an extended interview from Ksenia’s Turing Post podcast, Inference.
Technically, the episode covers several specific advances. MiniMax’s interleaved thinking technique allows a model to take an action, receive feedback from its environment, and pause to reason before proceeding, which significantly improves performance on long-horizon agentic tasks compared with standard chain-of-thought. Their perturbation pipeline systematically varies the training environment to force robust generalization rather than pattern memorization. One particularly concrete finding: running reinforcement learning at full FP32 precision, rather than reduced precision, produces measurably better results by keeping training behavior closer to the theoretical algorithmic ideal. Olive frames this detail as closing the gap between implementation and theory.
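The precision point can be illustrated with a small numerical sketch. This is not MiniMax's code: the logits are synthetic, the helper `log_softmax` is hypothetical, and NumPy's `float16` stands in for whatever reduced-precision format a training stack might use. It only shows that log-probabilities computed in reduced precision drift further from a high-precision reference than FP32 ones do, which is the kind of gap between implementation and theory the episode describes.

```python
import numpy as np

def log_softmax(logits, dtype):
    """Hypothetical helper: log-softmax computed entirely in the given dtype."""
    x = logits.astype(dtype)
    shifted = x - x.max()  # subtract the max for numerical stability
    return shifted - np.log(np.exp(shifted).sum(dtype=dtype))

# Synthetic logits (an assumption for illustration, not real model outputs).
rng = np.random.default_rng(0)
logits = rng.normal(scale=4.0, size=1000)

# float64 serves as the "theoretical" reference here.
reference = log_softmax(logits, np.float64)

# Largest absolute deviation of each precision from the reference.
err_fp32 = np.abs(log_softmax(logits, np.float32) - reference).max()
err_fp16 = np.abs(log_softmax(logits, np.float16) - reference).max()

print(f"max log-prob error, FP32: {err_fp32:.2e}")
print(f"max log-prob error, FP16: {err_fp16:.2e}")
```

Policy-gradient methods typically depend on log-probabilities like these, so an error at this stage plausibly carries into the training signal; how much that matters in practice depends on the setup and is not something this sketch measures.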
The episode also explores how MiniMax’s unusual structure, in which both foundation models and consumer-facing applications are developed in-house, creates tight feedback loops between researchers and developers, enabling faster identification and correction of model weaknesses. Olive discusses the ongoing battle against reward hacking, the tedious debugging process when training runs produce unexpected behavior, and how the team uses internal AI agents to manage the daily flood of research publications. She acknowledges that MiniMax’s models do not yet match those of the top American labs but argues the RL techniques and organizational approach are worth studying regardless.
📺 Source: Cognitive Revolution “How AI Changes Everything” · Published February 22, 2026
🏷️ Format: Interview







