Description:
Prince Canuma, a core contributor to Apple’s MLX framework and engineer at Arcee, delivers a conference demo showing how to deploy and manage AI agents — including voice and vision agents — entirely on Apple Silicon devices without any cloud dependency. The talk draws on his personal motivation: building accessible technology for his father, who lost his sight in 2020 and lives in a region with unreliable internet access.
Canuma walks through the MLX ecosystem, which now counts over 1.5 million downloads and more than 4,000 ported models. He demonstrates MLX VLM (the vision-language model runtime that also powers LM Studio) running Google’s Gemma 4 26B locally on a MacBook with 96 GB of unified memory, real-time object detection using a Roboflow model via the new MLX Swift bindings, and live background segmentation, all shown running fully offline. He notes that even M1 MacBooks can run very large models by leveraging device storage.
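For readers who want to try the same kind of on-device pipeline, here is a minimal sketch of loading and prompting a vision-language model with the mlx-vlm Python package. The checkpoint name, image path, and prompt are illustrative assumptions rather than the exact setup from the demo, and the calls follow mlx-vlm's documented load/generate workflow, not the talk's own code.

```python
# Minimal sketch of local VLM inference with the mlx-vlm Python package.
# The checkpoint below is an assumed small public example; swap in a larger
# model such as the Gemma release shown in the talk if memory allows.
from mlx_vlm import load, generate
from mlx_vlm.prompt_utils import apply_chat_template
from mlx_vlm.utils import load_config

model_path = "mlx-community/Qwen2-VL-2B-Instruct-4bit"  # assumed example checkpoint
model, processor = load(model_path)   # weights are cached locally on first run
config = load_config(model_path)

images = ["desk_photo.jpg"]           # a local image file, so inference stays offline
prompt = apply_chat_template(
    processor, config, "Describe what is on the desk.", num_images=len(images)
)

# Generation runs entirely in the Mac's unified memory via MLX.
output = generate(model, processor, prompt, images, max_tokens=256, verbose=False)
print(output)
```

Once the weights are cached, a script like this runs with networking disabled, which is the fully-offline property the demo emphasizes.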
The session is aimed at developers who want to reduce cloud subscription costs and build privacy-preserving, low-latency AI applications. Key takeaways include day-zero MLX support for frontier open-source releases like Gemma 4, the viability of omnimodal (vision + audio) pipelines on iPhone and iPad, and a growing library of community projects that extend MLX into voice agents, accessibility tools, and local coding assistants.
📺 Source: AI Engineer · Published May 11, 2026
🏷️ Format: Tutorial Demo
