Ethernet is DEAD?? Mac Studio is 100x FASTER!!

Descriptions:

NetworkChuck benchmarks a four-node Apple Mac Studio cluster built on the M3 Ultra. Each machine is configured with 512GB of unified memory, 80 GPU cores, and 8TB of storage, for a total of 2TB of GPU-accessible memory and 320 GPU cores at approximately $50,000. The build is a direct follow-up to an earlier five-node M2 Max cluster test that ran 91% slower than a single machine due to networking bottlenecks. The critical new variable is Apple’s macOS Tahoe 26.2 beta, which introduces RDMA (Remote Direct Memory Access) over Thunderbolt 5, paired with a beta ExoLabs build that supports tensor parallelism.

Benchmark results on Llama 3.3 70B FP16 show the progression clearly: pipeline parallelism (old method) at approximately 5 tokens/sec, tensor parallelism without RDMA at 3 tokens/sec, and tensor parallelism with RDMA enabled at 16 tokens/sec with 66ms per token latency. The video also tests DeepSeek and Kimi K2 to stress the cluster further. Setup required enabling RDMA in macOS recovery mode and connecting the four nodes in a Thunderbolt 5 mesh topology.
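
The throughput figures above can be put side by side with a few lines of arithmetic. This is a minimal sketch that only re-derives per-token latency and relative speedup from the reported tokens/sec values; the helper names `ms_per_token` and `speedup` are illustrative and not taken from the video or from ExoLabs.

```python
# Reported Llama 3.3 70B FP16 throughput in tokens/sec (from the description above).
throughput = {
    "pipeline parallelism": 5.0,
    "tensor parallelism, no RDMA": 3.0,
    "tensor parallelism, RDMA": 16.0,
}


def ms_per_token(tokens_per_sec: float) -> float:
    """Average time per generated token, in milliseconds."""
    return 1000.0 / tokens_per_sec


def speedup(new_tps: float, old_tps: float) -> float:
    """How many times faster new_tps is than old_tps."""
    return new_tps / old_tps


baseline = throughput["pipeline parallelism"]
for name, tps in throughput.items():
    print(
        f"{name}: {tps:g} tok/s, "
        f"{ms_per_token(tps):.1f} ms/token, "
        f"{speedup(tps, baseline):.1f}x vs pipeline"
    )
```

An exact 16 tokens/sec works out to 62.5ms per token, so the reported 66ms suggests the measured rate was slightly below 16 and was rounded up. By the same arithmetic, RDMA-enabled tensor parallelism is roughly 3.2x faster than pipeline parallelism and roughly 5.3x faster than tensor parallelism without RDMA.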

The cost comparison makes the case for local clustering: matching the cluster’s memory with Nvidia H100 VRAM would take 26 cards at 80GB each and cost more than $780,000. ExoLabs also shipped a native Mac app in this beta release, replacing the previous CLI-only interface. The practical conclusion is that Apple’s RDMA update changes the answer to whether Mac Studio clustering makes sense for running large open-weight models locally.
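
The cost comparison can be reproduced from the numbers in the description. The sketch below is a rough check, not a pricing model: the per-card H100 price is an assumption back-derived from $780,000 / 26 cards, and all constant names are illustrative.

```python
import math

# Cluster figures from the description.
NODES = 4
MEMORY_PER_NODE_GB = 512
GPU_CORES_PER_NODE = 80
CLUSTER_COST_USD = 50_000  # approximate

# H100 figures. The price is an ASSUMPTION implied by $780,000 / 26 cards.
H100_MEMORY_GB = 80
H100_PRICE_USD = 30_000

total_memory_gb = NODES * MEMORY_PER_NODE_GB   # 2048 GB, i.e. 2TB
total_gpu_cores = NODES * GPU_CORES_PER_NODE   # 320 cores

# Round up: a fractional card cannot be bought.
h100_cards = math.ceil(total_memory_gb / H100_MEMORY_GB)
h100_cost_usd = h100_cards * H100_PRICE_USD
cost_ratio = h100_cost_usd / CLUSTER_COST_USD

print(f"Cluster: {total_memory_gb} GB, {total_gpu_cores} GPU cores, ~${CLUSTER_COST_USD:,}")
print(f"H100 equivalent: {h100_cards} cards, ~${h100_cost_usd:,} ({cost_ratio:.1f}x the cluster cost)")
```

Under these assumptions the H100 route costs about 15.6 times as much for the same memory capacity. The comparison covers memory capacity only and says nothing about relative compute throughput.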


📺 Source: NetworkChuck · Published December 20, 2025
🏷️ Format: Benchmark Test
