Taalas Is Running AI at 17,000 Tokens Per Second — What’s the Catch?

Description:

Fahd Mirza examines Taalas, a chip startup claiming to run AI inference at approximately 17,000 tokens per second — roughly ten times faster than current GPU clusters, at one-twentieth the cost and one-tenth the power draw. The core technology etches a model's weights directly onto custom silicon rather than loading them into memory at runtime, collapsing the memory-compute bottleneck that dominates latency in conventional GPU-based deployments. Mirza demonstrates the system live, recording approximately 15,780 tokens per second on Llama 3.1 8B.
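
A back-of-the-envelope roofline estimate shows why conventional GPU decoding is memory-bound and why hardwiring weights next to compute changes the picture. The sketch below is illustrative only: the 8B parameter count comes from the video, but the 16-bit weight format and the ~3.35 TB/s bandwidth figure (roughly an H100's HBM3) are assumptions, not numbers Mirza cites.

```python
# Rough roofline estimate: to decode one token, every weight must be
# streamed from memory through the compute units, so per-stream
# throughput is capped at memory_bandwidth / model_size_in_bytes.

PARAMS = 8e9              # Llama 3.1 8B parameter count (from the video)
BYTES_PER_WEIGHT = 2      # fp16/bf16 weights (assumed)
HBM_BANDWIDTH = 3.35e12   # ~3.35 TB/s, roughly an H100's HBM3 (assumed)

model_bytes = PARAMS * BYTES_PER_WEIGHT            # ~16 GB of weights
max_tokens_per_sec = HBM_BANDWIDTH / model_bytes
print(f"Bandwidth-bound ceiling: ~{max_tokens_per_sec:,.0f} tokens/s per stream")
# -> ~209 tokens/s: each decoded token re-reads all 16 GB of weights.
# Etching weights onto the die removes that round trip, which is the
# bottleneck Taalas claims to collapse.
```

Under these assumptions a single GPU decode stream tops out around a couple hundred tokens per second, which is why a claimed 17,000 tokens per second requires eliminating the weight-streaming step rather than merely adding compute.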

Rather than treating the demo as a verdict, the video works through four specific concerns. First, Taalas is currently running Llama 3.1 8B — a small, older open-source model — not any frontier system, so the speed numbers do not represent state-of-the-art intelligence running fast. Second, the model uses aggressive custom 3-bit and 6-bit quantization, which Taalas itself acknowledges introduces quality degradation relative to standard GPU benchmarks (the sketch after this paragraph illustrates why low bit widths cost accuracy). Third, the company's claim of a two-month turnaround for new models is a forward-looking target, not a proven track record — its first product took two and a half years to build. Fourth, the AI model landscape moves fast enough that today's capable model can be irrelevant within weeks, raising genuine questions about whether hardwired silicon can keep pace.
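
For intuition on the quantization concern, the sketch below applies plain symmetric uniform quantization to a toy weight tensor and measures reconstruction error at different bit widths. This is a generic illustration, not Taalas's actual custom scheme, which the video does not detail.

```python
import numpy as np

def quantize_uniform(w: np.ndarray, bits: int) -> np.ndarray:
    """Symmetric uniform quantization onto 2**bits - 1 integer levels."""
    half = 2 ** (bits - 1) - 1              # e.g. 3 bits -> levels -3..3
    scale = np.abs(w).max() / half          # step size from the weight range
    q = np.clip(np.round(w / scale), -half, half)
    return q * scale                        # dequantize back to floats

rng = np.random.default_rng(0)
w = rng.normal(0.0, 0.02, size=100_000)     # toy Gaussian "weight" tensor

for bits in (3, 6, 8):
    mse = np.mean((w - quantize_uniform(w, bits)) ** 2)
    print(f"{bits}-bit uniform quantization MSE: {mse:.2e}")
# The 3-bit grid has only 7 representable values, so its rounding error is
# orders of magnitude larger than at 6 or 8 bits. That gap is the quality
# trade-off the video flags against standard GPU benchmarks.
```

Real deployments use more sophisticated per-channel or learned schemes that narrow this gap, but the basic trade-off the example shows, fewer bits means coarser grids and more error, is exactly why Mirza treats the benchmark comparisons with caution.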

Mirza concludes that the underlying architectural insight — unified memory and compute on a single chip — is technically sound and potentially significant, but the current implementation is a first-generation proof of concept whose real-world viability at the frontier remains undemonstrated.


📺 Source: Fahd Mirza · Published February 23, 2026
🏷️ Format: Review
