Lost in Latent Space

The eight pages that turned learning into a problem

In 1984, a Harvard computer scientist named Leslie Valiant published a paper titled “A Theory of the Learnable” in the Communications of the ACM. The paper was eight pages. Its bibliography fit on half a column. Almost nobody, at the time, thought it would matter.

What it did was small and enormous. It defined, for the first time, what it should mean for a machine to “learn” — not in some intuitive, vibes-based sense that engineers had been gesturing at since Rosenblatt’s perceptron in the late 1950s, but in the way a complexity theorist defines a problem as being in P or NP. Valiant proposed: a class of concepts is learnable if there exists an algorithm that, given a number of random examples polynomial in the desired accuracy and confidence, drawn from any fixed but unknown distribution, returns with high probability a hypothesis that is approximately correct on future examples from the same distribution.

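In the notation that later became standard (Valiant's original statement folds accuracy and confidence together into a single parameter; the ε and δ below are the textbook restatement, not his exact symbols), the requirement reads:

\[
\Pr_{S \sim D^m}\!\big[\,\mathrm{err}_D(A(S)) \le \varepsilon\,\big] \;\ge\; 1 - \delta,
\qquad m = \mathrm{poly}(1/\varepsilon,\, 1/\delta),
\]

where S is the random sample, A(S) is the hypothesis the algorithm returns, and err_D is that hypothesis's error on fresh examples from D. "Approximately correct" is the ε; "probably" is the δ.
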
The acronym was PAC. Probably approximately correct.

For decades, learning had been a craft. People built perceptrons, nearest-neighbor classifiers, decision trees. They demonstrated that the systems worked on this dataset or that one. There was no shared language for what it meant for the underlying class of patterns to be learnable in principle, no notion of a sample-complexity lower bound, no separation between hard and easy problems in the way complexity theory had drawn that line for computation a decade earlier.

Valiant’s framework imported all of it. The reaction at first was muted. The paper was elegant and the field moved cautiously. But within a few years, two things happened. First, Vapnik and Chervonenkis’s older work on uniform convergence — sitting in Russian-language journals from the early 1970s — was repackaged in PAC’s vocabulary, producing the central theorem of statistical learning theory: finite VC dimension implies PAC learnability. Second, theorists started proving hardness results: Pitt and Valiant showed that learning 3-term DNF formulas with a hypothesis of the same form is NP-hard, and Kearns and Valiant showed that richer classes, like polynomial-size Boolean formulas, are not PAC learnable in polynomial time under standard cryptographic assumptions. There were now problems in learning, and they had complexity.

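The quantitative heart of that first development is worth seeing once. In its standard realizable-case form (the later textbook statement, via Blumer, Ehrenfeucht, Haussler, and Warmuth, rather than the 1971 original), a class of VC dimension d is PAC learnable from

\[
m(\varepsilon, \delta) \;=\; O\!\left( \frac{d \log(1/\varepsilon) + \log(1/\delta)}{\varepsilon} \right)
\]

examples, and Ω(d/ε) examples are necessary. The sample complexity is finite exactly when d is, with no assumption at all on the distribution.
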
Today, PAC’s framing feels almost invisible — built into the floor of how anyone serious thinks about learning. Sample complexity, generalization gaps, the realizability assumption, the agnostic relaxation, distribution-free vs. distribution-specific bounds: all of it lives downstream of those eight pages.

The paper itself is still readable. Valiant’s prose is plain. He hedges where modern researchers would assert; he asks open questions a contemporary author would answer with a citation. What he could not have seen, in 1984, was the deep-learning era to come, in which his framework would be stretched, broken, and partially rebuilt in ways nobody anticipated. But the move he made — taking the question seriously enough to formalize it, even imperfectly — is what every theoretical paper in this lineage owes him.

What’s next

The original paper is freely available: Valiant, “A Theory of the Learnable,” Communications of the ACM 27(11), 1984. For a modern textbook treatment, Shalev-Shwartz and Ben-David, Understanding Machine Learning: From Theory to Algorithms (2014), gives a clean PAC-first development. For historical color, the Machine Learning journal’s 30-year retrospective (2015) collects perspectives from people who were there.