Descriptions:
Audrey Hsu, developer advocate at RunPod, demonstrates the company’s new IDE-integrated GPU deployment tooling at AI Engineer, showing how developers can run GPU-accelerated inference directly from a local development environment without building Docker images, pushing to a container registry, or manually provisioning cloud servers. RunPod, which recently crossed $120 million in annual recurring revenue and operates across 30-plus data centers in 10 countries, built the tooling to collapse the slow iteration cycle that defines early-stage model development.
The live demo centers on a Python function that generates images with Stable Diffusion XL Turbo. Adding a RunPod endpoint decorator that specifies a GPU family (Ada 80 Pro, an H100 variant), a maximum worker count, and a timeout is enough to route the GPU work to the cloud while the rest of the application runs locally. Hot module reload repackages and pushes changes instantly, so developers can test and iterate without rebuilding infrastructure between attempts. The session includes a crowd-sourced prompt test (“cats flying on a cloudy day in London”) that puts the end-to-end latency on display.
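The decorator pattern described above can be sketched in plain Python. This is only an illustration of the idea, not RunPod's actual API: the `gpu_endpoint` decorator, the `EndpointConfig` class, the `fake_remote_dispatch` stand-in, and the worker count and timeout values are all hypothetical, and the function body returns a placeholder string instead of running Stable Diffusion XL Turbo.

```python
import functools
from dataclasses import dataclass


@dataclass(frozen=True)
class EndpointConfig:
    """Hypothetical container for the three settings named in the talk."""
    gpu: str          # GPU family label, e.g. "ADA_80_PRO" (illustrative string)
    max_workers: int  # upper bound on concurrent workers
    timeout_s: int    # per-request timeout in seconds


def gpu_endpoint(gpu, max_workers=1, timeout_s=300, dispatch=None):
    """Hypothetical decorator: attach a config and route calls through `dispatch`.

    If no dispatcher is given, the function simply runs locally.
    """
    config = EndpointConfig(gpu=gpu, max_workers=max_workers, timeout_s=timeout_s)

    def decorate(fn):
        @functools.wraps(fn)
        def wrapper(*args, **kwargs):
            if dispatch is None:
                return fn(*args, **kwargs)  # local fallback
            return dispatch(config, fn, args, kwargs)

        wrapper.endpoint_config = config  # expose the config for inspection
        return wrapper

    return decorate


def fake_remote_dispatch(config, fn, args, kwargs):
    """Stand-in for remote execution: runs in-process and tags the result.

    Real tooling would package the function and send it to a GPU worker.
    """
    return {"ran_on": config.gpu, "result": fn(*args, **kwargs)}


@gpu_endpoint(gpu="ADA_80_PRO", max_workers=3, timeout_s=120,
              dispatch=fake_remote_dispatch)
def generate_image(prompt):
    # Placeholder body: a real version would load the model and return image data.
    return f"<image for: {prompt}>"


if __name__ == "__main__":
    print(generate_image("cats flying on a cloudy day in London"))
```

The point of the sketch is the separation of concerns: the decorated function holds only model logic, while the decorator arguments carry the infrastructure choices, so the calling code in the local application does not change when the execution target does.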
Hsu also outlines the broader RunPod platform: on-demand pods billed by the second, reserved GPU pods, autoscaling serverless workers that scale to zero during idle periods, multi-node training clusters, and a Hub of pre-vetted open-source model repos including ComfyUI, Stable Diffusion, and vLLM. The talk is aimed at developers who need flexible, reliable GPU access and want to minimize time spent on infrastructure configuration relative to model and application work.
📺 Source: AI Engineer · Published June 09, 2026
🏷️ Format: Hands On Build







