Description:
Fahd Mirza covers Prism ML's Ternary Bonsai, the latest release from the team behind the one-bit Bonsai model. It pushes ultra-low-precision language modeling one step further by adding a third weight state — zero, or "silent" — alongside the binary minus-one and plus-one. With three possible states, each weight requires only about 1.58 bits (log₂ 3 ≈ 1.585) rather than a full 16, enabling dramatically smaller, faster models without proportional quality loss.
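The video doesn't detail Prism ML's exact quantization recipe, but the general idea of mapping full-precision weights onto the three states {-1, 0, +1} can be sketched with an absmean-style scheme (as popularized by BitNet b1.58). The function name and scaling choice below are assumptions for illustration, not Prism ML's implementation:

```python
import numpy as np

def ternary_quantize(w, eps=1e-8):
    """Round a weight tensor to the ternary states {-1, 0, +1}
    with a single per-tensor scale (absmean-style sketch)."""
    scale = np.mean(np.abs(w)) + eps           # per-tensor scale factor
    q = np.clip(np.round(w / scale), -1, 1)    # snap to -1, 0, or +1
    return q, scale

w = np.array([0.9, -0.05, 0.4, -1.2])
q, s = ternary_quantize(w)
# q contains only -1, 0, +1; the original tensor is approximated by q * s
```

Small weights collapse to the "silent" zero state, which is what distinguishes the ternary scheme from the earlier one-bit (binary) Bonsai. Storing one of three states costs log₂ 3 ≈ 1.585 bits per weight, the 1.58-bit figure quoted in the video.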
The benchmarks are striking: on an M4 Pro, the model runs at 83 tokens per second — over five times faster than a full-precision baseline — and hits 27 tokens per second on an iPhone 17 Pro Max. Energy efficiency is three to four times better than a standard 16-bit model. The 8-billion-parameter version fits in just 1.75 GB and scores 75.5 on average across six benchmarks covering knowledge, reasoning, math, coding, instruction following, and tool use. The only model that outscores it in the comparison is Qwen3 8B, which requires more than 16 GB.
Prism ML introduces an “intelligent density” metric — capability per GB of memory — on which Ternary Bonsai sits in a different tier from all conventional full-precision models tested. The main caveat at time of publication is that local execution is Apple Silicon only. Mirza links to the Hugging Face card and GitHub repo for those who want to run it, and notes a web GPU demo is also available.
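The video doesn't spell out the exact formula behind "intelligent density," but taking it at face value as benchmark score per GB of memory (an assumption on my part), the numbers quoted above make the gap concrete:

```python
def intelligent_density(avg_score: float, memory_gb: float) -> float:
    # Assumed definition: capability (benchmark average) per GB of memory.
    return avg_score / memory_gb

# Ternary Bonsai 8B: 75.5 average across six benchmarks in 1.75 GB
bonsai = intelligent_density(75.5, 1.75)   # ≈ 43.1 points per GB

# A full-precision 8B model needing 16+ GB would need an average
# score near 690 to match that density — far beyond any benchmark scale.
```

Even granting Qwen3 8B its higher raw score, its 16+ GB footprint puts it roughly an order of magnitude lower on this density axis, which is why Ternary Bonsai lands in a different tier in the comparison.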
📺 Source: Fahd Mirza · Published April 17, 2026
🏷️ Format: Deep Dive