Running Local AI on AMD

Descriptions:

Sam Witteveen takes a hands-on look at running local AI on an AMD workstation equipped with a Ryzen Threadripper 9980X processor and a Radeon AI PRO R9700 GPU with 32GB of VRAM. The video addresses a question gaining urgency in 2026: as frontier model costs climb, especially for agentic and reasoning workloads that burn far more tokens than chat ever did, can prosumer AMD hardware serve as a credible alternative to cloud APIs for serious AI work?

The walkthrough covers the local stack from the ground up. LM Studio and Ollama both now ship with ROCm runtime support and run out of the box on AMD cards, and 32GB of VRAM lets Witteveen load the recommended 4-bit or 8-bit quantizations of Qwen 3, Gemma, DeepSeek, and similar open-weight models without significant compromise. He then moves into the developer layer: PyTorch offers official ROCm wheels installable with a single pip command, the Hugging Face Transformers library runs without code changes, and Unsloth now publishes its own guide for fine-tuning LLMs on AMD GPUs, so fine-tuning, not just inference, is supported.
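
As a rough illustration of why 32GB matters for 4-bit and 8-bit quantizations, the sketch below estimates the memory taken by model weights alone. It is not from the video: the model sizes are hypothetical examples, the function name is made up for illustration, and the estimate ignores the KV cache, activations, and the per-block overhead that real quantization formats add, so actual usage will be higher.

```python
def weight_memory_gb(params_billion: float, bits_per_weight: float) -> float:
    """Approximate size of the weights alone, in GB (1 GB = 1e9 bytes).

    One billion parameters at 8 bits per weight is about 1 GB, so the
    estimate is simply params * bits / 8.
    """
    return params_billion * bits_per_weight / 8


VRAM_GB = 32  # card capacity quoted in the description

# Hypothetical model sizes in billions of parameters, for illustration only.
for params in (8, 14, 32, 70):
    for bits in (4, 8):
        need = weight_memory_gb(params, bits)
        # Strictly less than: weights that exactly fill VRAM leave no
        # room for the KV cache or activations.
        verdict = "fits" if need < VRAM_GB else "does not fit"
        print(f"{params}B @ {bits}-bit: ~{need:.1f} GB weights, {verdict}")
```

By this estimate a 32B-parameter model at 4-bit needs about 16 GB for weights, leaving headroom for context, while the same model at 8-bit would consume the entire card before any cache is allocated.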

The core argument is that ROCm—long the weak link in AMD’s AI story—has matured to the point where standard PyTorch workflows mostly just work. For developers weighing privacy, token costs, and the demands of long-running coding agents like Open Claude or Hermes, this video offers a practical, reproducible assessment of what AMD’s current hardware stack can realistically deliver heading into the second half of 2026.
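
One way to check the "mostly just work" claim on a given machine is a short backend probe, sketched below under the assumption that a ROCm build of PyTorch is installed (the exact pip command is not given in the description and is not reproduced here). ROCm builds expose the GPU through the usual torch.cuda API and set torch.version.hip, which is why existing CUDA-style code tends to run unchanged. The function name describe_backend is illustrative, not something shown in the video.

```python
def describe_backend() -> dict:
    """Report which PyTorch backend, if any, is visible to this process."""
    try:
        import torch
    except ImportError:
        # PyTorch is not installed; report that rather than failing.
        return {"torch_installed": False, "backend": None,
                "gpu_available": False, "device": None}

    # torch.version.hip is a version string on ROCm builds, None otherwise.
    if getattr(torch.version, "hip", None):
        backend = "rocm"
    elif torch.version.cuda:
        backend = "cuda"
    else:
        backend = "cpu"

    # On ROCm builds the AMD GPU is reached through the torch.cuda namespace.
    available = torch.cuda.is_available()
    return {
        "torch_installed": True,
        "backend": backend,
        "gpu_available": available,
        "device": torch.cuda.get_device_name(0) if available else None,
    }


print(describe_backend())
```

If the probe reports the rocm backend with a GPU available, code that moves tensors to the "cuda" device should target the AMD card without modification.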


📺 Source: Sam Witteveen · Published May 26, 2026
🏷️ Format: Hands On Build
