“Oceania had always been at war with Eastasia”
– George Orwell, “1984”
For nearly 70 years, perhaps the most fundamental debate in artificial intelligence has been whether AI systems should be built on symbol manipulation — a set of processes common in logic, mathematics and computer science that treat thinking as if it were a kind of algebra — or on allegedly more brain-like systems called “neural networks.”
A third possibility, which I personally have spent much of my career arguing for, aims for middle ground: “hybrid models” that would try to combine the best of both worlds, by integrating the data-driven learning of neural networks with the powerful abstraction capacities of symbol manipulation.
In a recent essay in Noema, Yann LeCun, the chief AI scientist at Meta and one of three “godfathers of deep learning” to have recently won the Turing Award, and Jacob Browning, a “resident philosopher” in LeCun’s lab, wade into this controversy — with a clear yet flawed essay that seemingly offers new alternatives. On careful inspection, though, it is neither new nor compelling.
At the start of the essay, they seem to reject hybrid models, which are generally defined as systems that incorporate both the deep learning of neural networks and symbol manipulation. But by the end — in a departure from what LeCun has said on the subject in the past — they seem to acknowledge in so many words that hybrid systems exist, that they are important, that they are a possible way forward and that we knew this all along. This seeming contradiction, core to the essay, goes unremarked.
About the only sense I can make of this apparent contradiction is that LeCun and Browning somehow believe that a model isn’t hybrid if it learns to manipulate symbols. But the question of learning is a developmental one (how does the system arise?), whereas the question of how a system operates once it has developed (e.g. does it use one mechanism or two?) is a computational one: Any system that leverages both symbols and neural networks is by any reasonable standard a hybrid. (Maybe what they really mean to say is that AI is likely to be a learned hybrid, rather than an innate hybrid. But a learned hybrid is still a hybrid.)
I would argue that either symbol manipulation itself is directly innate, or something else — something we haven’t discovered yet — is innate, and that something else indirectly enables the acquisition of symbol manipulation. All of our efforts should be focused on discovering that possibly indirect basis. The sooner we can figure out what basis allows a system to get to the point where it can learn symbolic abstractions, the sooner we can build systems that properly leverage all the world’s knowledge, hence the closer we might get to AI that is safe, trustworthy and interpretable. (We might also gain insight into human minds, by examining the proof of concept that any such AI would be.)
We can’t really ponder LeCun and Browning’s essay at all, though, without first understanding the peculiar way in which it fits into the intellectual history of debates over AI.
Early AI pioneers like Marvin Minsky and John McCarthy assumed that symbol manipulation was the only reasonable way forward, while neural network pioneer Frank Rosenblatt argued that AI might instead be better built on a structure in which neuron-like “nodes” add up and process numeric inputs, such that statistics could do the heavy lifting.
It’s been known pretty much since the beginning that these two possibilities aren’t mutually exclusive. A “neural network” in the sense used by AI engineers is not literally a network of biological neurons. Rather, it is a simplified digital model that captures some of the flavor (but little of the complexity) of an actual biological brain.
In principle, these abstractions can be wired up in many different ways, some of which might directly implement logic and symbol manipulation. (One of the earliest papers in the field, “A Logical Calculus of the Ideas Immanent in Nervous Activity,” written by Warren S. McCulloch and Walter Pitts in 1943, explicitly recognizes this possibility.)
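To make that possibility concrete, here is a minimal sketch in Python of threshold units wired to compute logical functions. It is written in the spirit of the 1943 paper rather than as a reproduction of its formalism: the function names, weights and thresholds are all assumptions chosen for illustration.

```python
from itertools import product


def threshold_unit(weights, threshold):
    """Build a neuron-like unit that fires (returns 1) when the
    weighted sum of its binary inputs reaches the threshold."""
    def fire(*inputs):
        total = sum(w * x for w, x in zip(weights, inputs))
        return 1 if total >= threshold else 0
    return fire


# Hypothetical weights and thresholds, picked by hand for illustration.
AND = threshold_unit([1, 1], 2)   # fires only if both inputs are on
OR = threshold_unit([1, 1], 1)    # fires if at least one input is on
NOT = threshold_unit([-1], 0)     # fires only if its input is off


def xor(a, b):
    # XOR is not computable by one such unit, but a small network is enough:
    # (a OR b) AND NOT (a AND b)
    return AND(OR(a, b), NOT(AND(a, b)))


truth_table = {(a, b): xor(a, b) for a, b in product([0, 1], repeat=2)}
for inputs, output in truth_table.items():
    print(inputs, "->", output)
```

Nothing here is learned; the point is only that units that "add up and process numeric inputs" and units that implement logic are not different kinds of thing.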
Others, like Frank Rosenblatt in the 1950s and David Rumelhart and Jay McClelland in the 1980s, presented neural networks as an alternative to symbol manipulation; Geoffrey Hinton, too, has generally argued for this position.
The unacknowledged history here is that, back in the early 2010s, LeCun and his fellow deep-learning pioneers Hinton and Yoshua Bengio, with whom he shared the Turing Award, were so enthusiastic about these neural networks with multiple layers, which had just then finally become practical, that they hoped they might banish symbol manipulation entirely. By 2015, with deep learning still in its carefree, enthusiastic days, LeCun, Bengio and Hinton wrote a manifesto on deep learning in Nature. The article ended with an attack on symbols, arguing that “new paradigms [were] needed to replace rule-based manipulation of symbolic expressions by operations on large vectors.”
In fact, Hinton was so confident that symbols were a dead end that he gave a talk at Stanford that same year, called “Aetherial Symbols,” likening symbols to one of the biggest blunders in scientific history. (Similar arguments had been made in the 1980s as well, by two of his former collaborators, Rumelhart and McClelland, who argued in a famous 1986 book that symbols are not “of the essence of human computation,” sparking the great “past tense debate” of the 1980s and 1990s.)
When I wrote a 2018 essay defending some ongoing role for symbol manipulation, LeCun scorned my entire defense of hybrid AI, dismissing it on Twitter as “mostly wrong.” Around the same time, Hinton likened focusing on symbols to wasting time on gasoline engines when electric engines were obviously the best way forward. Even as recently as November 2020, Hinton told Technology Review, “Deep learning is going to be able to do everything.”
So when LeCun and Browning write, now, without irony, that “everyone working in DL agrees that symbolic manipulation is a necessary feature for creating human-like AI,” they are walking back decades of history. As Stanford AI Professor Christopher Manning put it, “I sense some evolution in @ylecun’s position. … Was that really true a decade ago, or is it even true now?!?”
In the context of what actually transpired throughout the 2010s, and after decades in which many in the machine learning community asserted (without real argument) that “symbols aren’t biologically plausible,” the fact that LeCun is even considering a hypothesis that embraces symbol manipulation, learned or otherwise, represents a monumental concession, if not a complete about-face. The real news here is the walk-back.
Because here’s the thing: on LeCun and Browning’s new view, symbol manipulation is actually vital — exactly as the late Jerry Fodor argued in 1988, and as Steven Pinker and I have been arguing all along.
Historians of artificial intelligence should in fact see the Noema essay as a major turning point, in which one of the three pioneers of deep learning first directly acknowledges the inevitability of hybrid AI. Significantly, two other well-known deep learning leaders also signaled support for hybrids earlier this year. Andrew Ng signaled support for such systems in March. Sepp Hochreiter — co-creator of LSTMs, one of the leading DL architectures for learning sequences — did the same, writing “The most promising approach to a broad AI is a neuro-symbolic AI … a bilateral AI that combines methods from symbolic and sub-symbolic AI” in April. As this was going to press I discovered that Jürgen Schmidhuber’s AI company NNAISENSE revolves around a rich mix of symbols and deep learning. Even Bengio (who explicitly denied the need for symbol manipulation in a December 2019 debate with me) has been busy in recent years trying to get Deep Learning to do “System 2” cognition — a project that looks suspiciously like trying to implement the kinds of reasoning and abstraction that made many of us over the decades desire symbols in the first place.
The rest of LeCun and Browning’s essay can be roughly divided into three parts: mischaracterizations of my position (there are a remarkable number of them); an effort to narrow the scope of what might be counted as hybrid models; and an argument for why symbol manipulation might be learned rather than innate.
Some sample mischaracterizations: LeCun and Browning say, “For Marcus, if you don’t have symbolic manipulation at the start, you’ll never have it,” when I in fact explicitly acknowledged in my 2001 book “The Algebraic Mind” that we didn’t know for sure whether symbol manipulation was innate. They say that I expect that deep learning “is incapable of further progress,” when my actual view is not that there will be no more progress of any sort on any problem whatsoever, but rather that deep learning on its own is the wrong tool for certain jobs: compositionality, reasoning and so forth.
Similarly, they say that “[Marcus] broadly assumes symbolic reasoning is all-or-nothing — since DALL-E doesn’t have symbols and logical rules underlying its operations, it isn’t actually reasoning with symbols,” when I again never said any such thing. DALL-E doesn’t reason with symbols, but that doesn’t mean that any system that incorporates symbolic reasoning has to be all-or-nothing; at least as far back as the 1970s’ expert system MYCIN, there have been purely symbolic systems that do all kinds of quantitative reasoning.
Aside from tendentiously presuming that a model is not a hybrid if it has symbols but those symbols are learned, they also try to equate hybrid models with “models [that contain] a non-differentiable symbolic manipulator,” when symbols in themselves do not inherently preclude some sort of role for differentiation. And they suggest I equate hybrid models with “simply combining the two: inserting a hard-coded symbolic manipulation module on top of a pattern-completion DL module,” when, in fact, everyone actually working in neurosymbolic AI realizes that the job is not that simple.
Rather, as we all realize, the whole game is to discover the right way of building hybrids. People have considered many different ways of combining symbols and neural networks, focusing on techniques such as extracting symbolic rules from neural networks, translating symbolic rules directly into neural networks, constructing intermediate systems that might allow for the transfer of information between neural networks and symbolic systems, and restructuring neural networks themselves. Lots of avenues are being explored.
Finally, we come to the key question: could symbol manipulation be learned rather than built in from the start?
The straightforward answer: of course it could. To my knowledge, nobody has ever denied that symbol manipulation might be learnable. In 2001, in section 6.1 of “The Algebraic Mind,” I considered it, and while I suggested it was unlikely, I hardly said it was impossible. Instead, I concluded rather mildly that, “These experiments [and theoretical considerations reviewed here] surely do not guarantee that the capacities of symbol manipulation are innate, but they are consistent with such a view, and they do pose a challenge for any theory of learning that depends on a great deal of experience.”
I had two main arguments.
The first was a “learnability” argument: throughout the book, I showed that certain kinds of systems — basically 3-layer forerunners to today’s more deeply layered systems — failed to acquire various aspects of symbol manipulation, and therefore there was no guarantee that any system regardless of its constitution would ever be able to learn symbol manipulation. As I put it then:
Something has to be innate. Although “nature” is sometimes crudely pitted against “nurture,” the two are not in genuine conflict. Nature provides a set of mechanisms that allow us to interact with the environment, a set of tools for extracting knowledge from the world, and a set of tools for exploiting that knowledge. Without some innately given learning device, there could be no learning at all.
Leaning on a favorite quotation from the developmental psychologist Elizabeth Spelke, I argued that a system that had some built-in starting point (e.g., objects, sets, places and the apparatus of symbol manipulation) would be more able to efficiently and effectively learn about the world than a purely blank slate. Indeed, LeCun’s own most famous work — on convolutional neural networks — is an example of precisely this: an innate constraint on how a neural network learns, leading to a strong gain in efficiency. Symbol manipulation, well integrated, might lead to even greater gains.
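The sense in which convolution is a built-in constraint can be sketched in a few lines of Python. This is a toy, one-dimensional illustration under assumed values (the kernel and signals are hypothetical), not a description of any particular network: one small set of weights is reused at every position, so a pattern is detected wherever it appears, and far fewer parameters need to be learned than if every input-output connection had its own weight.

```python
def conv1d(signal, kernel):
    """Slide one shared kernel across the signal (valid positions only).
    As in most deep learning libraries, the kernel is not flipped."""
    k = len(kernel)
    return [
        sum(kernel[j] * signal[i + j] for j in range(k))
        for i in range(len(signal) - k + 1)
    ]


kernel = [1, -1]                     # hypothetical two-weight edge detector
signal = [0, 0, 5, 5, 0, 0, 0]       # a "bump" starting at position 2
shifted = [0, 0, 0, 5, 5, 0, 0]      # the same bump, moved one step right

out = conv1d(signal, kernel)
out_shifted = conv1d(shifted, kernel)

# Shifting the input shifts the output by the same amount: the response
# to the bump does not have to be relearned at each position.
print(out)
print(out_shifted)

# Parameter counts: shared kernel vs. one weight per input-output pair.
shared_params = len(kernel)
dense_params = len(signal) * len(out)
print(shared_params, "shared weights vs.", dense_params, "unshared weights")
```

The weight sharing is fixed before any training happens, which is what makes it an innate constraint in the sense used here.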
The second argument was that human infants show some evidence of symbol manipulation. In a set of often-cited rule-learning experiments conducted in my lab, infants generalized abstract patterns beyond the specific examples on which they had been trained. Subsequent work on human infants’ capacity for implicit logical reasoning only strengthens that case. The book also pointed to animal studies showing, for example, that bees can generalize the solar azimuth function to lighting conditions they had never seen.
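What it means to generalize an abstract pattern beyond the training examples can be sketched as follows. This is not a model of the infant experiments; the pattern (first item repeated in third position) and the syllables are assumptions made up for illustration. The contrast is between a rule stated over variables, which applies to items never seen before, and a learner that only stores its training examples.

```python
def matches_aba(sequence):
    """Abstract rule over variables: the first and third items are the
    same, and the middle item is different. No specific syllable is named."""
    first, middle, last = sequence
    return first == last and first != middle


# Hypothetical training items that follow the pattern.
training = [("ga", "ti", "ga"), ("li", "na", "li")]
seen = set(training)


def memorized(sequence):
    """A learner that only recognizes exactly what it was trained on."""
    return sequence in seen


novel_consistent = ("wo", "fe", "wo")     # new syllables, same pattern
novel_inconsistent = ("wo", "fe", "fe")   # new syllables, different pattern

for item in (novel_consistent, novel_inconsistent):
    print(item, "rule:", matches_aba(item), "memory:", memorized(item))
```

The rule separates the two novel sequences; the memorizer rejects both, because neither appeared in training.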
Unfortunately, LeCun and Browning ducked both of these arguments, not touching on either, at all. Weirdly, they instead equated learning symbols with things acquired in later life such as “maps, iconic depictions, rituals and even social roles from the combination of an increasingly long adolescence for learning and the need for more precise, specialized skills, like tool-building and fire maintenance,” apparently unaware of the considerations from infants, toddlers and nonhuman animals that C. Randy Gallistel and others, myself included, have raised, drawing on multiple literatures from cognitive science.
In the end, it’s puzzling why LeCun and Browning bother to argue against the innateness of symbol manipulation at all. They don’t give a strong in-principle argument against innateness, and never give any principled reason for thinking that symbol manipulation in particular is learned. Strikingly, LeCun’s latest manifesto actually embraces some specific innate wiring, suggesting at least some degree of tolerance for innateness in some places, viz., an “Intrinsic Cost module” that is “hard-wired (immutable, nontrainable) and computes … the instantaneous ‘discomfort’ of the agent.” His architecture overall also includes six modules, most of which are tunable, but all of which are built in.
Why include all that much innateness, and then draw the line precisely at symbol manipulation? If a baby ibex can clamber down the side of a mountain shortly after birth, why shouldn’t a fresh-grown neural network be able to incorporate a little symbol manipulation out of the box? LeCun and Browning never really say.
Meanwhile, LeCun and Browning give no specifics as to how particular, well-known problems in language understanding and reasoning might be solved, absent innate machinery for symbol manipulation.
All that they offered instead was a weak induction: since deep learning has overcome problems 1 through N, we should feel confident that it can overcome N+1:
People should be skeptical that DL is at the limit; given the constant, incremental improvement on tasks seen just recently in DALL-E 2, Gato, and PaLM, it seems wise not to mistake hurdles for walls. The inevitable failure of DL has been predicted before, but it didn’t pay to bet against it.
Optimism has its place, but the trouble with this style of argument is twofold. First, inductive arguments on past history are notoriously weak. Start-up valuations during the tech boom of the last several years went up and up, until they didn’t anymore (and appear now to be crashing). As they say in every investing prospectus, “past performance is no guarantee of future results.”
Second, there is also a strong specific reason to think that deep learning in principle faces certain specific challenges, primarily around compositionality, systematicity and language understanding. All revolve around generalization and “distribution shift” (as systems transfer from training to novel situations) and everyone in the field now recognizes that distribution shift is the Achilles’ heel of current neural networks. This was the central argument of “The Algebraic Mind,” with respect to some precursors to today’s deep learning systems; these problems were first emphasized by Fodor and Pylyshyn and by Pinker and Prince in a pair of famous articles in 1988. I reemphasized them in 2012 when deep learning came onto the scene:
Realistically, deep learning is only part of the larger challenge of building intelligent machines. Such techniques lack ways of representing causal relationships (such as between diseases and their symptoms), and are likely to face challenges in acquiring abstract ideas like “sibling” or “identical to.” They have no obvious ways of performing logical inferences, and they are also still a long way from integrating abstract knowledge…
Of course, deep learning has made progress, but on those foundational questions, not so much; on natural language, compositionality and reasoning, which differ from the kinds of pattern recognition on which deep learning excels, these systems remain massively unreliable, exactly as you would expect from systems that rely on statistical correlations, rather than an algebra of abstraction. Minerva, the latest, greatest AI system as of this writing, with billions of “tokens” in its training, still struggles with multiplying 4-digit numbers. (Its scoring of 50% on a challenging high school math exam was trumpeted as major progress, but still hardly constitutes a system that has mastered reasoning and abstraction.) The issue is not simply that deep learning has problems; it is that deep learning has consistent problems.
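For contrast, here is what an “algebra of abstraction” looks like for multiplication: a minimal Python sketch of the grade-school algorithm, operating on strings of digits. The function name and the restriction to non-negative decimal strings are assumptions for illustration. The rules are stated over digit positions rather than over particular numbers, so the same few lines handle 4-digit inputs, 40-digit inputs or anything else, with no additional examples required.

```python
def multiply_digits(x, y):
    """Grade-school long multiplication on non-negative decimal strings."""
    a = [int(d) for d in reversed(x)]   # least significant digit first
    b = [int(d) for d in reversed(y)]
    result = [0] * (len(a) + len(b))    # enough room for every digit
    for i, da in enumerate(a):
        carry = 0
        for j, db in enumerate(b):
            total = result[i + j] + da * db + carry
            result[i + j] = total % 10  # keep one digit in this column
            carry = total // 10         # pass the rest to the next column
        result[i + len(b)] += carry
    digits = "".join(str(d) for d in reversed(result)).lstrip("0")
    return digits or "0"


print(multiply_digits("1234", "5678"))
print(multiply_digits("98765432109876543210", "12345678901234567890"))
```

Whether a learning system can arrive at something equivalent on its own is exactly the open question; the sketch only shows what reliable generalization across input lengths would amount to.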
In my view, the case for the possible innateness of symbol manipulation remains much the same as it ever was:
- Current systems, 20 years after “The Algebraic Mind,” still fail to reliably extract symbolic operations (e.g. multiplication), even in the face of immense data sets and training.
- The example of human infants and toddlers suggests the ability to generalize complex aspects of natural language and reasoning (putatively symbolic) prior to formal education.
- A little built-in symbolism can go a long way toward making learning more efficient; LeCun’s own success with convolution (a built-in constraint on how neural networks are wired) makes this case nicely. AlphaFold 2’s power, which derives in part from carefully constructed, innate representations for molecular biology, is another. A brand-new paper from DeepMind showing some progress on physical reasoning in a system that builds in some innate knowledge about objects is a third.
Nothing LeCun and Browning had to say changes any of this.
Taking a step back, the world might be roughly divided into three bins:
- Systems (such as virtually all known programming languages) with the apparatus of symbol manipulation fully installed at the factory.
- Systems with an innate learning apparatus that lacks symbol manipulation but is powerful enough to acquire it, given the right data and training environment.
- Systems that are unable to acquire the full machinery of symbol manipulation even when adequate training might be available.
As an important new paper from DeepMind on “Neural Networks and the Chomsky Hierarchy” emphasizes, how a system generalizes is in large part governed by the architectural choices that are built into its design. Current deep learning systems appear (with some caveats discussed in the new paper) to be in category three: no symbol-manipulating machinery at the outset, and no (reliable) symbol-manipulating machinery acquired along the way. When LeCun and Browning acknowledge that scaling — adding more layers and/or more data — is not enough, they seem to agree with my own recent arguments against scaling. All three of us acknowledge the need for new ideas.
We might disagree about what those ideas are. Then again, at the macro level, LeCun’s recent manifesto is in many ways remarkably close to my own manifesto from 2020: we both emphasize the importance of common sense, of reasoning and of having richer world models than are currently possible. We both think that symbol manipulation plays an important (though perhaps different) role. Neither of us thinks that the currently popular technique of reinforcement learning suffices on its own, and neither thinks that pure scaling suffices either.
Our strongest difference seems to be in the amount of innate structure that we think will be required and in how much importance we assign to leveraging existing knowledge. I would like to leverage as much existing knowledge as possible, whereas he would prefer that his systems reinvent as much as possible from scratch. But whatever new ideas are added in will, by definition, have to be part of the innate (built into the software) foundation for acquiring symbol manipulation that current systems lack.
In the 2010s, symbol manipulation was a dirty word among deep learning proponents; in the 2020s, understanding where it comes from should be our top priority. With even the most ardent partisans of neural nets now recognizing the importance of symbol manipulation for achieving AI, we can finally focus on the real issues at hand, which are precisely the ones the neurosymbolic community has always been focused on: how can you get data-driven learning and abstract, symbolic representations to work together in harmony in a single, more powerful intelligence? It is wonderful that LeCun has at last committed himself to working toward that goal.