EXPOSED: The Dirty Little Secret of AI (On a 1979 PDP-11)


Description:

Dave Plummer of Dave’s Garage trains a real transformer neural network on a genuine 1979 PDP-11 minicomputer — not in Python or PyTorch, but in raw PDP-11 assembly language. The project, called Attention 11 and written by Damian Buret, implements a single-layer, single-head transformer whose only task is learning to reverse an eight-digit sequence. The deliberately minimal scope makes every component of transformer learning visible: forward pass, softmax probability distribution, loss measurement, backpropagation, and weight updates, all running on a 6 MHz single-core CPU with 64K of RAM.
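To make those components concrete, here is a minimal sketch in C (not the video's PDP-11 assembly, and not code from the Attention 11 source) of the math behind one output step: softmax turns raw logits into a probability distribution, cross-entropy measures the error against the target digit, and the softmax-plus-cross-entropy pairing yields the simple logit gradient used for the weight update. All names and values are illustrative.

```c
/* Sketch of one training step's core math: softmax, cross-entropy
 * loss, and the logit gradient. Values are made up for illustration. */
#include <math.h>
#include <stdio.h>

#define VOCAB 8  /* eight possible digits, matching the toy reversal task */

/* Softmax: convert raw logits into a probability distribution. */
static void softmax(const double *logits, double *probs, int n) {
    double max = logits[0], sum = 0.0;
    for (int i = 1; i < n; i++)          /* subtract the max for stability */
        if (logits[i] > max) max = logits[i];
    for (int i = 0; i < n; i++) {
        probs[i] = exp(logits[i] - max);
        sum += probs[i];
    }
    for (int i = 0; i < n; i++) probs[i] /= sum;
}

int main(void) {
    double logits[VOCAB] = {0.1, 0.4, -0.2, 1.0, 0.0, 0.3, -0.5, 0.2};
    double probs[VOCAB], grad[VOCAB];
    int target = 3;  /* the digit the reversed sequence should emit next */

    softmax(logits, probs, VOCAB);
    double loss = -log(probs[target]);   /* cross-entropy loss */

    /* For softmax + cross-entropy, dLoss/dLogit_i = p_i - 1[i == target],
     * which is the error signal backpropagation pushes into the weights. */
    for (int i = 0; i < VOCAB; i++)
        grad[i] = probs[i] - (i == target ? 1.0 : 0.0);

    printf("loss = %.4f, grad[target] = %.4f\n", loss, grad[target]);
    return 0;
}
```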

The video doubles as one of the clearest mechanism-level explanations of how transformers actually work. Dave walks through why even a toy reversal task forces the network to discover a structural positional routing rule rather than memorize patterns — exactly the problem self-attention was designed to solve. He uses the classic “bank” disambiguation example to show how attention dynamically weights context tokens, and illustrates learning-rate trade-offs with a dog-training analogy.
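The dynamic weighting of context tokens is the heart of that mechanism. Below is a hedged sketch of single-head scaled dot-product attention in C; the dimensions, query, keys, and values are invented for illustration and are not taken from Attention 11. A query compares itself against every key, softmax turns the comparison scores into weights, and the output is the weight-blended value: tokens whose keys match the query dominate, which is how surrounding context can disambiguate a word like “bank.”

```c
/* Sketch of single-head scaled dot-product attention for one query.
 * All dimensions and values are illustrative. */
#include <math.h>
#include <stdio.h>

#define SEQ 4   /* context length */
#define DIM 2   /* key/query dimension */

int main(void) {
    double q[DIM]      = {1.0, 0.0};                 /* the query token */
    double k[SEQ][DIM] = {{1.0, 0.0}, {0.0, 1.0},
                          {0.7, 0.7}, {-1.0, 0.0}};  /* context keys */
    double v[SEQ]      = {10.0, 20.0, 30.0, 40.0};   /* scalar values */
    double scores[SEQ], out = 0.0, max, sum = 0.0;

    /* Scaled dot-product scores: (q . k_i) / sqrt(DIM). */
    for (int i = 0; i < SEQ; i++) {
        scores[i] = 0.0;
        for (int d = 0; d < DIM; d++) scores[i] += q[d] * k[i][d];
        scores[i] /= sqrt((double)DIM);
    }

    /* Softmax over the scores gives the attention weights. */
    max = scores[0];
    for (int i = 1; i < SEQ; i++) if (scores[i] > max) max = scores[i];
    for (int i = 0; i < SEQ; i++) { scores[i] = exp(scores[i] - max); sum += scores[i]; }
    for (int i = 0; i < SEQ; i++) scores[i] /= sum;

    /* Output: values blended by how strongly each key matched the query. */
    for (int i = 0; i < SEQ; i++) out += scores[i] * v[i];
    printf("attention output = %.3f\n", out);
    return 0;
}
```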

The core argument is that modern AI is not magic — it is the same basic loop of guess, measure error, nudge weights, and repeat, now running at industrial scale. Stripping that loop down to 1970s hardware makes the machinery legible in a way that conference slides and slick animations rarely achieve.
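That loop fits in a few lines once everything else is stripped away. The C sketch below, again illustrative rather than anything from the video's code, fits a single weight so that w*x matches y by repeating guess, measure error, nudge. The three learning rates show the trade-off behind the dog-training analogy: too small and training crawls, moderate and it converges, too large and the weight overshoots and diverges.

```c
/* The guess / measure-error / nudge-weights loop on a one-weight toy:
 * gradient descent fitting w so that w * x matches y (true w is 3).
 * Learning rates are chosen to show crawl, convergence, and divergence. */
#include <stdio.h>

int main(void) {
    double x = 2.0, y = 6.0;
    double lr[] = {0.01, 0.25, 0.6};
    for (int r = 0; r < 3; r++) {
        double w = 0.0;
        for (int step = 0; step < 20; step++) {
            double pred = w * x;      /* guess          */
            double err  = pred - y;   /* measure error  */
            w -= lr[r] * err * x;     /* nudge weight   */
        }
        printf("lr = %.2f -> w = %.4f\n", lr[r], w);
    }
    return 0;
}
```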


📺 Source: Dave’s Garage · Published April 12, 2026
🏷️ Format: Tutorial Demo
