Description:
Fahd Mirza covers Prism ML's Ternary Bonsai, the latest release from the team behind the one-bit Bonsai model. It pushes ultra-low-precision language modeling one step further by adding a third weight state — zero, or "silent" — alongside the binary minus-one and plus-one. With three possible states, each weight requires only about 1.58 bits (log₂ 3 ≈ 1.585) rather than a full 16, enabling dramatically smaller, faster models without proportional quality loss.
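The video doesn't detail Prism ML's exact quantization recipe, but the general idea of mapping full-precision weights onto the three states {-1, 0, +1} can be sketched with an absmean-style scheme (as popularized by BitNet b1.58). The function name and scaling choice below are assumptions for illustration, not Prism ML's implementation:

```python
import numpy as np

def ternary_quantize(w, eps=1e-8):
    """Round a weight tensor to the ternary states {-1, 0, +1}
    with a single per-tensor scale (absmean-style sketch)."""
    scale = np.mean(np.abs(w)) + eps           # per-tensor scale factor
    q = np.clip(np.round(w / scale), -1, 1)    # snap to -1, 0, or +1
    return q, scale

w = np.array([0.9, -0.05, 0.4, -1.2])
q, s = ternary_quantize(w)
# q contains only -1, 0, +1; the original tensor is approximated by q * s
```

Small weights collapse to the "silent" zero state, which is what distinguishes the ternary scheme from the earlier one-bit (binary) Bonsai. Storing one of three states costs log₂ 3 ≈ 1.585 bits per weight, the 1.58-bit figure quoted in the video.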
The benchmarks are striking: on an M4 Pro, the model runs at 83 tokens per second — over five times faster than a full-precision baseline — and hits 27 tokens per second on an iPhone 17 Pro Max. Energy efficiency is three to four times better than a standard 16-bit model. The 8-billion-parameter version fits in just 1.75 GB and scores 75.5 on average across six benchmarks covering knowledge, reasoning, math, coding, instruction following, and tool use. The only model that outscores it in the comparison is Qwen3 8B, which requires more than 16 GB.
Prism ML introduces an “intelligent density” metric — capability per GB of memory — on which Ternary Bonsai sits in a different tier from all conventional full-precision models tested. The main caveat at time of publication is that local execution is Apple Silicon only. Mirza links to the Hugging Face card and GitHub repo for those who want to run it, and notes a web GPU demo is also available.
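The video doesn't spell out the exact formula behind "intelligent density," but taking it at face value as benchmark score per GB of memory (an assumption on my part), the numbers quoted above make the gap concrete:

```python
def intelligent_density(avg_score: float, memory_gb: float) -> float:
    # Assumed definition: capability (benchmark average) per GB of memory.
    return avg_score / memory_gb

# Ternary Bonsai 8B: 75.5 average across six benchmarks in 1.75 GB
bonsai = intelligent_density(75.5, 1.75)   # ≈ 43.1 points per GB

# A full-precision 8B model needing 16+ GB would need an average
# score near 690 to match that density — far beyond any benchmark scale.
```

Even granting Qwen3 8B its higher raw score, its 16+ GB footprint puts it roughly an order of magnitude lower on this density axis, which is why Ternary Bonsai lands in a different tier in the comparison.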
📺 Source: Fahd Mirza · Published April 17, 2026
🏷️ Format: Deep Dive