My M5 Max, Gemma 4, MLX LOCAL Stack. (This KILLS MODEL PROVIDERS)


Description:

IndyDevDan runs a structured head-to-head benchmark between a fully specced Apple M5 Max MacBook Pro and its M4 Max predecessor, testing local AI inference across five model variants: Qwen 3.5 (35B) in GGUF and MLX formats, Qwen 3.5 quantized in NVIDIA's NVFP4 format, and Google's Gemma 4 in standard and MLX formats. All inference runs on Apple silicon using the MLX machine learning framework, with no cloud API dependency.
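For context on what "local inference via MLX, no cloud API" looks like in practice, here is a minimal sketch using the mlx-lm Python package. The checkpoint name is a placeholder, not the exact model used in the video.

```python
# Minimal local-inference sketch with mlx-lm (assumes `pip install mlx-lm`).
# The repo id below is a placeholder; substitute an MLX-format checkpoint
# (e.g. a 4-bit quantized build from the mlx-community hub).
from mlx_lm import load, generate

model, tokenizer = load("mlx-community/your-model-4bit")  # placeholder repo id
text = generate(
    model,
    tokenizer,
    prompt="Explain breadth-first search in two sentences.",
    max_tokens=128,
)
print(text)
```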

The benchmark methodology tracks four metrics: prefill speed (prompt ingestion), decode tokens per second, total wall-clock time, and peak RAM usage. Tests run five prompts of increasing complexity, culminating in breadth-first graph traversal tasks at 4K and 32K context windows. Key findings: MLX-optimized model variants consistently outperform their GGUF counterparts on Apple hardware; the M5 Max delivers faster prefill and decode speeds than the M4 Max across all tested models; and 32K context is a practical ceiling for sub-35B-parameter models before accuracy degrades noticeably.
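The sketch below shows one way those four metrics could be collected with mlx-lm. It approximates the methodology described in the video rather than reproducing its exact harness: the single-token run as a prefill proxy, the process-RSS figure standing in for peak RAM, and the placeholder repo id are all assumptions.

```python
# Rough benchmark-harness sketch, not the video's actual code.
import resource
import time

from mlx_lm import load, generate


def run_benchmark(repo_id: str, prompt: str, max_tokens: int = 512) -> dict:
    model, tokenizer = load(repo_id)

    # Prefill proxy: time to emit a single token, dominated by prompt ingestion.
    t0 = time.perf_counter()
    generate(model, tokenizer, prompt=prompt, max_tokens=1)
    prefill_s = time.perf_counter() - t0

    # Full run: total wall-clock time and decode tokens per second.
    t0 = time.perf_counter()
    text = generate(model, tokenizer, prompt=prompt, max_tokens=max_tokens)
    wall_s = time.perf_counter() - t0
    decode_tps = len(tokenizer.encode(text)) / max(wall_s - prefill_s, 1e-6)

    # Peak RSS of this process (bytes on macOS, kilobytes on Linux):
    # a coarse stand-in for the peak RAM reading in the video.
    peak_rss = resource.getrusage(resource.RUSAGE_SELF).ru_maxrss

    return {
        "prefill_s": prefill_s,
        "wall_clock_s": wall_s,
        "decode_tok_per_s": decode_tps,
        "peak_rss": peak_rss,
    }


if __name__ == "__main__":
    # Placeholder repo id; swap in the MLX-community checkpoint you want to test.
    print(run_benchmark(
        "mlx-community/your-model-4bit",
        "Traverse this graph breadth-first and list the visit order: ...",
    ))
```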

The broader argument is quantitative: local inference on Apple silicon has matured to the point where cloud providers like Anthropic and OpenAI are no longer necessary for many workloads. GPU utilization approaching 100% and RAM usage around 55GB on the M5 Max during heavy reasoning tasks illustrate both the capability ceiling and the impressive progress in consumer-grade local AI.


📺 Source: IndyDevDan · Published April 20, 2026
🏷️ Format: Benchmark Test
