Run Google’s newest 12B AI on a phone? Yes, it’s possible!

Coding & Dev Tools2 months ago

Run Google’s newest 12B AI on a phone? Yes, it’s possible!

Descriptions:

The Alphastack channel walks through a custom cross-platform app that runs Google’s Gemma 4 12B multimodal model entirely on-device — no cloud connection required — on both Windows and Android. The architecture splits into three layers: a single HTML file for the UI (rendered via Edge WebView on Windows and Android System WebView on Android), a Python Flask server that manages the inference engine as a subprocess, and llama.cpp build B9512 as the inference backend. The model runs in GGUF format using Unslaught’s dynamic quantization, bringing the full 12B parameter model down to roughly 7GB.

The video covers the complete build pipeline, including how PyInstaller bundles Python, Flask, and the full llama.cpp CUDA build into a self-contained Windows EXE, and how the Android APK compiles llama.cpp directly via the Android NDK as a native library running in a foreground service. A smaller E2B model ships with the app by default; the full 12B can be downloaded through an in-app settings menu. Multimodal vision support works by loading an additional “mm projector” file alongside the base model, converting images into tokens the model can process.

Live inference demos show the app streaming chain-of-thought reasoning separately from the final answer, collapsing the thinking section once the response is complete. The creator notes real-world memory pressure — the 12B model consumed nearly all available GPU VRAM during recording alongside other processes — but emphasizes that on a dedicated machine performance should be significantly better. Both the Windows EXE and Android APK are available for download via links in the video description.

📺 Source: Alphastack · Published June 05, 2026
🏷️ Format: Hands On Build

Tags

Alpha Stack Claude Code Gemma 4 Gemma 4 12B Google llama.cpp Unsloth

Prev

Fed’s Daly Says Forward Guidance Could Be Misleading

Next

⚡️Making DeepSeek v4 outperform Opus 4.7 with Taste — @AhmadAwais , CommandCode.ai

18 Related Posts

Related Posts

14:58

Coding & Dev Tools

The Ultimate Knowledge Base: Bring YouTube Into Your AI Second Brain

1 hour ago

12:23

Coding & Dev Tools

Microsoft Fara1.5 27B: Local Install + Real Browser Automation Demo

1 day ago

23:27

Coding & Dev Tools

I Built a $10,000 Website for $13 (Claude + Higgsfield)

1 day ago

25:27

Coding & Dev Tools

Full Tutorial: From Idea to App with Claude Design and Claude Code in 25 Minutes

1 day ago

09:07

Coding & Dev Tools

Your AI Agent Is Burning Money (Fix It)

1 day ago

09:16

Coding & Dev Tools

DeepSeek V4 Flash Fully Local — 32 tok/s on a Single Chip

3 days ago