AI Kernel Generation: What’s working, what’s not, what’s next – Natalie Serrino, Gimlet Labs


Description:

Natalie Serrino, co-founder of Gimlet Labs, presents one of the most technically specific talks at the AI Engineer conference: using AI agents to automatically generate and optimize GPU kernels for machine learning workloads across heterogeneous hardware platforms.

Gimlet Labs builds an agentic inference cloud that orchestrates AI workloads across hardware from different vendors and chip sizes. The core problem: most ML kernels are heavily optimized for specific architectures, and the proliferation of frameworks (CUDA, Triton, Metal, and vendor-specific DSLs), combined with a severe shortage of expert kernel engineers, creates a bottleneck the company aims to close with AI. Their system takes a PyTorch workload and a target hardware specification, then runs an autonomous loop of compile, execute, validate, and profile, directly mirroring the human expert workflow. In a live demo, the agent targets an H100 and finds an optimization that achieves a 22% throughput improvement over the torch.compile baseline.
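The compile/execute/validate/profile loop described above can be sketched in miniature. This is a hypothetical illustration, not Gimlet's actual system: the candidate "kernels" here are plain Python functions, and names like `candidate_kernels` and `reference` are invented for the example. The structure (run each candidate, reject anything that fails numerical validation, profile the survivors, keep the fastest) mirrors the loop the talk describes.

```python
import time

def reference(xs):
    # Baseline implementation the candidates must match numerically.
    return sum(x * x for x in xs)

# Hypothetical pool of candidate implementations to evaluate.
candidate_kernels = {
    "genexpr": lambda xs: sum(x * x for x in xs),
    "map":     lambda xs: sum(map(lambda x: x * x, xs)),
}

def optimize(xs, trials=50):
    expected = reference(xs)
    best_name, best_time = None, float("inf")
    for name, kernel in candidate_kernels.items():
        # Execute and validate: discard candidates that produce wrong results.
        if kernel(xs) != expected:
            continue
        # Profile: time repeated executions of the validated candidate.
        start = time.perf_counter()
        for _ in range(trials):
            kernel(xs)
        elapsed = time.perf_counter() - start
        if elapsed < best_time:
            best_name, best_time = name, elapsed
    return best_name

print(optimize(list(range(1000))))
```

A real agent would regenerate and recompile kernel source between iterations using profiler feedback; this sketch only captures the validate-then-profile selection step.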

The talk covers concrete results across hardware targets and problem complexity levels. A kernel fusion technique, combining convolution, softmax, bias scaling, and sigmoid into a single fused operation, achieved a 40% speedup on an Apple M4. A separate optimization achieved an 80% improvement by recognizing that average pooling could be re-expressed as a convolution, exploiting Metal's faster convolution path. Across moderate-complexity problems, the system averages roughly a 25% speedup. Serrino is candid about the limitations: performance degrades significantly on high-complexity problems, and the talk closes with an honest discussion of where current agents succeed and where the research frontier lies.
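The pooling-to-convolution rewrite rests on a simple identity: average pooling with window size k is a convolution whose kernel is constant with weight 1/k. A minimal pure-Python sketch of that equivalence in 1D (illustrative only; the talk's optimization applies the same idea to GPU kernels on Metal):

```python
def avg_pool1d(xs, k):
    # Non-overlapping k-wide average pooling (stride = k).
    return [sum(xs[i:i + k]) / k for i in range(0, len(xs) - k + 1, k)]

def conv1d(xs, weights, stride):
    # Plain strided 1D convolution (cross-correlation form, no padding).
    k = len(weights)
    return [sum(w * x for w, x in zip(weights, xs[i:i + k]))
            for i in range(0, len(xs) - k + 1, stride)]

xs = [1.0, 3.0, 5.0, 7.0, 2.0, 4.0]
k = 2
uniform = [1.0 / k] * k  # constant kernel of weight 1/k

# Average pooling equals convolution with the uniform kernel.
assert avg_pool1d(xs, k) == conv1d(xs, uniform, stride=k)
print(avg_pool1d(xs, k))  # [2.0, 6.0, 3.0]
```

The payoff on hardware comes from the substitution itself: if the platform's convolution path is faster than its pooling path, the mathematically identical convolution wins.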


📺 Source: AI Engineer · Published December 17, 2025
🏷️ Format: Hands On Build
