lecun1989-repro

Jupyter Notebook ★ 765 updated 2y ago

Reproducing the Yann LeCun 1989 paper "Backpropagation Applied to Handwritten Zip Code Recognition", to my knowledge the earliest real-world application of a neural net trained with backpropagation.

What This Repository Does

This project recreates a landmark machine learning experiment from 1989. Back then, Yann LeCun and his team used a neural network to automatically recognize handwritten zip codes on envelopes, one of the first real-world uses of backpropagation to train a neural network. The code here rebuilds that experiment as faithfully as the paper allows and checks how closely the original results can be reproduced on modern hardware.

How It Works

The project has two main steps. First, it takes MNIST (a standard dataset of handwritten digits in use today), keeps only a subset of the examples, and downsamples the images, so that both the image size and the number of examples match the original 1989 dataset. Then it trains a neural network using the architecture described in the 1989 paper, including quirky details such as units that don't share parameters the way modern convolutional layers do, and a specific pattern of connections between layers.
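The first step above can be sketched in plain Python. This is a minimal illustration, not the repository's preprocessing script: the function names `resize_bilinear` and `subsample_and_shrink`, the seed, and the choice of bilinear interpolation are all assumptions made for the example. The 16x16 default matches the image size the 1989 paper is generally described as using, but treat it as a parameter to check against the paper.

```python
import random

def resize_bilinear(img, out_h, out_w):
    """Resize a 2D list of floats with bilinear interpolation.

    Hypothetical helper: the interpolation mode is an assumption.
    """
    in_h, in_w = len(img), len(img[0])
    out = []
    for i in range(out_h):
        # Map the center of output row i back into input coordinates.
        y = (i + 0.5) * in_h / out_h - 0.5
        y = min(max(y, 0.0), in_h - 1.0)
        y0 = int(y)
        y1 = min(y0 + 1, in_h - 1)
        wy = y - y0
        row = []
        for j in range(out_w):
            x = (j + 0.5) * in_w / out_w - 0.5
            x = min(max(x, 0.0), in_w - 1.0)
            x0 = int(x)
            x1 = min(x0 + 1, in_w - 1)
            wx = x - x0
            # Blend the four neighboring input pixels.
            top = img[y0][x0] * (1 - wx) + img[y0][x1] * wx
            bottom = img[y1][x0] * (1 - wx) + img[y1][x1] * wx
            row.append(top * (1 - wy) + bottom * wy)
        out.append(row)
    return out

def subsample_and_shrink(images, labels, n_keep, size=16, seed=1337):
    """Pick n_keep random examples and shrink each image to size x size."""
    rng = random.Random(seed)
    idx = rng.sample(range(len(images)), n_keep)
    small = [resize_bilinear(images[i], size, size) for i in idx]
    return small, [labels[i] for i in idx]
```

With real data, `images` would be the 28x28 MNIST digits and `n_keep` the number of examples in the corresponding split of the 1989 dataset; fixing the seed keeps the chosen subset the same from run to run.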

The original training took three days on a 1989-era workstation. On a modern MacBook Air, the same training completes in about 90 seconds, roughly 3,000 times faster. The results land close to the originals without matching them exactly: the 1989 paper achieved 0.14% error on training data and 5% on test data, while this reproduction gets 0.62% and 4.09% respectively, so training error is somewhat higher and test error somewhat lower. The gap likely comes from not having the exact original dataset.
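The speedup and the error gaps quoted above are easy to check by hand. The short script below only redoes that arithmetic from the numbers in the text; the variable names are made up for the example.

```python
SECONDS_PER_DAY = 24 * 60 * 60

original_seconds = 3 * SECONDS_PER_DAY  # "three days" on the 1989 workstation
modern_seconds = 90                     # "about 90 seconds" on a MacBook Air

speedup = original_seconds / modern_seconds
print(f"speedup: about {speedup:.0f}x")  # 259200 / 90 = 2880

# Error rates quoted in the text, in percent: (train, test)
paper = (0.14, 5.0)
repro = (0.62, 4.09)
for split, p, r in zip(("train", "test"), paper, repro):
    print(f"{split}: paper {p}%  repro {r}%  difference {r - p:+.2f} points")
```

The exact ratio is 2,880, which the text rounds to 3,000. Both inputs are themselves approximate, so the figure is best read as an order-of-magnitude estimate.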

Who Would Use This and Why

This repo is mainly useful for machine learning researchers and students who want to understand the history of neural networks and see how the field has evolved. Someone curious about how the technology worked decades ago, or who wants to learn by studying actual code from a famous research paper, would find this valuable. It's also a nice way to benchmark how much faster computers have become — running the exact same algorithm on modern hardware makes that speedup very concrete.

The project documents all the guesswork involved too, since the original 1989 paper didn't specify every detail (like the exact learning rate or connectivity patterns). This transparency is educational in itself: it shows how researchers fill in gaps when reproducing old work.
