Local AI FAQ 2.0

Description:

Digital Spaceport’s Local AI FAQ 2.0 is an extended technical Q&A session working through community questions about building and optimizing local AI hardware setups, with a focus on CPU selection, memory bandwidth, multi-GPU configuration, and software stack behavior.

On the CPU side, the host explains how frequency-optimized AMD EPYC processors like the 7F52 (3.5 GHz base, 3.9 GHz turbo) deliver strong single-thread inference performance, and breaks down how memory bandwidth scales with DIMM slot population. A fully populated second-generation (Rome) EPYC system, with all eight DDR4-3200 channels active, can reach around 204.8 GB/s of theoretical bandwidth, while partial configurations such as 128 GB across four slots land closer to 102–105 GB/s. The host draws on comparisons with his own 7702P build to contextualize the tradeoffs.
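
As a back-of-the-envelope check on those figures, here is a minimal sketch (not from the video) that computes theoretical DDR bandwidth from channel count and transfer rate; the function name is illustrative, and it assumes four populated slots map to four active memory channels.

```python
def theoretical_bandwidth_gbs(channels: int, mt_per_s: int) -> float:
    """Peak DRAM bandwidth in GB/s: channels * MT/s * 8 bytes per 64-bit channel."""
    return channels * mt_per_s * 8 / 1000  # MB/s -> GB/s

# Second-gen (Rome) EPYC supports eight channels of DDR4-3200.
print(theoretical_bandwidth_gbs(8, 3200))  # 204.8 GB/s, fully populated
print(theoretical_bandwidth_gbs(4, 3200))  # 102.4 GB/s, four channels active
```

The four-channel result of 102.4 GB/s lines up with the 102–105 GB/s range quoted for the 128 GB, four-slot configuration.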

For GPU setups, the video covers why vLLM tensor-parallel sharding works cleanly only at power-of-two GPU counts (1, 2, 4, or 8), since attention heads must divide evenly across the shards, and why irregular counts like six GPUs produce warnings or outright failures. The host recommends pairing a high-VRAM lead GPU such as an RTX 4090 with three secondary cards rather than four, keeping the total at a shardable four GPUs, and cautions against pursuing trillion-parameter local models without serious cost-benefit analysis. Additional topics include Proxmox 9 GPU passthrough, cooler compatibility across EPYC and Threadripper socket generations (SP3 vs SP6 tension adjustment), and how to ask effective technical support questions when troubleshooting local AI stack issues.
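
The power-of-two behavior follows from vLLM splitting attention heads evenly across tensor-parallel ranks, so the head count must be divisible by the GPU count. The snippet below is a hypothetical illustration of that divisibility rule, not vLLM's actual validation code; check_tp_size and the num_heads value of 64 are assumed for the example.

```python
def check_tp_size(num_attention_heads: int, tensor_parallel_size: int) -> bool:
    """Heads are sharded evenly, so the head count must divide by the GPU count."""
    return num_attention_heads % tensor_parallel_size == 0

num_heads = 64  # typical of a large Llama-class model (assumed value)
for gpus in (1, 2, 4, 6, 8):
    verdict = "OK" if check_tp_size(num_heads, gpus) else "heads not divisible -> error"
    print(f"{gpus} GPUs: {verdict}")
```

In practice the shard count is set with vLLM's --tensor-parallel-size flag (or the tensor_parallel_size argument when constructing an LLM object), and a six-GPU setting fails this kind of divisibility check.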


📺 Source: Digital Spaceport · Published December 16, 2025
🏷️ Format: Troubleshooting