Accelerating AI on Edge — Chintan Parikh and Weiyi Wang, Google DeepMind

Description:

Chintan Parikh, product manager for LiteRT at Google AI Edge, and Weiyi Wang from Google DeepMind present at AI Engineer on deploying Gemma 4 models directly on device. The session introduces two new edge-optimized variants: Gemma 4 E2B, requiring roughly 1–2 GB of RAM post-quantization and suited for voice interfaces and low-latency local summarization, and Gemma 4 E4B, targeting laptops and IoT devices that can handle a higher RAM footprint. Both models ship with capabilities previously absent from edge deployments: native function calling for local API interactions, structured JSON output built into the model architecture (rather than achieved via prompt engineering), and a chain-of-thought thinking mode for more complex reasoning tasks.
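Concretely, the structured-output capability means an on-device call can ask for JSON directly instead of coaxing it out with prompt tricks. Below is a minimal sketch assuming the MediaPipe LLM Inference API, a common Android entry point for Google AI Edge models; the model file path and the prompt are illustrative, not taken from the talk.

```kotlin
import android.content.Context
import com.google.mediapipe.tasks.genai.llminference.LlmInference

// Minimal sketch: load a quantized Gemma 4 E2B model from local storage and
// request structured JSON output. The file path and prompt are hypothetical.
fun summarizeAsJson(context: Context, note: String): String {
    val options = LlmInference.LlmInferenceOptions.builder()
        .setModelPath("/data/local/tmp/gemma-4-e2b.task") // assumed model file
        .setMaxTokens(256)
        .build()
    val llm = LlmInference.createFromOptions(context, options)
    // With JSON output built into the model, a plain instruction should
    // come back as parseable JSON without elaborate prompt engineering.
    return llm.generateResponse(
        "Summarize the following note as JSON with keys \"title\" and " +
            "\"summary\":\n$note"
    )
}
```

The entire round trip stays on the phone, which is what the 1–2 GB E2B footprint is sized for.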

Parikh outlines the core case for edge deployment across four axes: latency (real-time camera and AR use cases where cloud round-trips are prohibitive), privacy (sensitive document processing that should never leave the device), offline reliability, and cost reduction relative to cloud API token consumption. A demo of the Gallery app shows multi-skill orchestration, prompt-to-audio generation, and dynamic accelerator switching between CPU, GPU, and (upcoming) NPU — all running locally.
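The accelerator switching shown in the Gallery demo corresponds, in recent MediaPipe releases, to a preferred-backend option set at configuration time. A hedged sketch follows; NPU is omitted because the talk describes it as upcoming, and the model path is again illustrative.

```kotlin
import com.google.mediapipe.tasks.genai.llminference.LlmInference

// Sketch of choosing the accelerator when building inference options.
// Only CPU and GPU are shown; NPU support is described as upcoming.
fun buildOptions(useGpu: Boolean): LlmInference.LlmInferenceOptions =
    LlmInference.LlmInferenceOptions.builder()
        .setModelPath("/data/local/tmp/gemma-4-e2b.task") // assumed model file
        .setPreferredBackend(
            if (useGpu) LlmInference.Backend.GPU else LlmInference.Backend.CPU
        )
        .build()
```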

The underlying runtime is LiteRT, Google’s on-device inference framework and the successor to TensorFlow Lite, which the team reports is deployed across more than 100,000 apps with billions of active users. The model format is cross-platform, supporting Android, iOS, macOS, Linux, Windows, and web from a single model file, with the sample app open-sourced on GitHub for developers to fork and extend.
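For developers starting from the open-sourced sample, the on-device stack is typically pulled in as a single Gradle dependency; the artifact coordinate below is the real MediaPipe GenAI package, but the version number is illustrative.

```kotlin
// Module-level build.gradle.kts; pin whichever version the sample app uses.
dependencies {
    implementation("com.google.mediapipe:tasks-genai:0.10.24")
}
```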


📺 Source: AI Engineer · Published May 05, 2026
🏷️ Format: Keynote Launch
