Descriptions:
Fahd Mirza demonstrates PokeClaw (Pocket Claw), an open-source Android application that uses Google’s Gemma 4 E2B model to autonomously control a smartphone entirely on-device, with no internet connection, API keys, or external services required at run time. Built by a solo developer over two nights, the app downloads the Gemma 4 E2B model (2.6 GB) directly to the device and then uses Android’s accessibility service permissions to read the screen and simulate taps and text input.
The architecture is built on a set of generic tools: tap, swipe, type, open app, send message, screenshot, and read screen. The model receives a text representation of the current screen state, selects an action, executes it, and observes the result in a closed loop. To compensate for the reasoning limitations of a model with 2.3 billion active parameters, the developer introduced a concept called “skills”: predefined multi-step workflows that chain tools together, so that for complex tasks such as reading and replying to messages the agent follows a reliable recipe rather than improvising from scratch.
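The loop and the “skills” idea described above can be illustrated with a minimal Python sketch. This is not PokeClaw’s code: `FakeDevice`, `run_agent`, `skill_policy`, and `REPLY_SKILL` are hypothetical names, the device is simulated in memory, and a fixed recipe stands in where the real app would ask the on-device model to pick the next action.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Tuple

Action = Tuple[str, Dict[str, str]]  # (tool name, keyword arguments)


@dataclass
class FakeDevice:
    """Hypothetical stand-in for the phone: a text view of the screen plus an action log."""
    screen: str = "home"
    log: List[str] = field(default_factory=list)

    def read_screen(self) -> str:
        return self.screen

    def open_app(self, name: str) -> None:
        self.screen = f"{name}: inbox, 1 unread from Alice"
        self.log.append(f"open_app({name})")

    def tap(self, target: str) -> None:
        self.screen = f"chat with {target}: 'Are you free at 5?'"
        self.log.append(f"tap({target})")

    def type_text(self, text: str) -> None:
        self.screen += f" | draft: {text}"
        self.log.append(f"type({text})")


def run_agent(device: FakeDevice,
              choose_action: Callable[[str], Action],
              max_steps: int = 10) -> List[str]:
    """Closed loop: read the screen, choose an action, execute it, observe again."""
    tools = {"open_app": device.open_app, "tap": device.tap, "type": device.type_text}
    for _ in range(max_steps):
        state = device.read_screen()        # observe
        name, args = choose_action(state)   # decide (a model call in a real agent)
        if name == "done":
            break
        tools[name](**args)                 # act
    return device.log


# A "skill": a predefined recipe that chains generic tools in a fixed order.
REPLY_SKILL: List[Action] = [
    ("open_app", {"name": "Messages"}),
    ("tap", {"target": "Alice"}),
    ("type", {"text": "Yes, see you at 5"}),
]


def skill_policy(steps: List[Action]) -> Callable[[str], Action]:
    """Return a chooser that replays the recipe and ignores the screen state."""
    remaining = iter(steps)

    def choose(state: str) -> Action:
        return next(remaining, ("done", {}))

    return choose


if __name__ == "__main__":
    phone = FakeDevice()
    for entry in run_agent(phone, skill_policy(REPLY_SKILL)):
        print(entry)
```

In this sketch the skill removes the decision step entirely. A real implementation would more plausibly let the model choose which skill to run and fill in its arguments, with the recipe fixing only the order of the tool calls.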
Mirza gives an honest assessment of the security implications: accessibility service permissions grant the app full read access to everything visible on screen, including banking applications and personal messages. He recommends running the tool only on dedicated or trusted devices and notes that the project is still in early development: it is at version 2, is prone to crashes, and lists 12 default tools that are not yet fully inspectable. Despite the caveats, he frames PokeClaw as a meaningful proof of concept for what fully local, on-device AI agents on Android can look like as edge models continue to improve.
📺 Source: Fahd Mirza · Published April 06, 2026
🏷️ Format: Showcase