How METR measures Long Tasks and Experienced Open Source Dev Productivity – Joel Becker, METR
Joel Becker from METR (Model Evaluation and Threat Research) presents the organization's framework for measuring AI agent task horizo...








