Jacob Browning is a postdoc in NYU’s Computer Science Department working on the philosophy of AI.
Yann LeCun is a Turing Award-winning machine learning researcher, an NYU professor and the chief AI scientist at Meta.
With artificial intelligence now powering Microsoft’s Bing and Google’s Bard search engines, brilliant and clever conversational AI is at our fingertips. But there have been many uncanny moments — including casually delivered disturbing comments like calling a reporter ugly, declaring love for strangers or rattling off plans for taking over the world.
To make sense of these bizarre moments, it’s helpful to start by thinking about the phenomenon of saying the wrong thing. Humans are usually very good at avoiding spoken mistakes, gaffes and faux pas. Chatbots, by contrast, screw up a lot. Understanding why humans excel at this clarifies when and why we trust each other — and why current chatbots can’t be trusted.
Getting It Wrong
For GPT-3, there is only one way to say the wrong thing: by producing a statistically unlikely response to whatever the last few words were. Its understanding of context, situation and appropriateness concerns only what can be derived from the user’s prompt. For ChatGPT, this is modified slightly in a novel and interesting way. In addition to saying something statistically likely, the model’s responses are also reinforced by human evaluators: The system outputs a response, and human evaluators either reinforce it as a good one or not (a grueling, traumatizing process for the evaluators). The upshot is a system that is not just saying something plausible, but also (ideally) something a human would judge to be appropriate — if not the right thing, at least not something offensive.
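The difference between the two systems can be sketched in miniature. This is a toy illustration, not a real language model: the candidate replies, their likelihoods and their reward scores are all invented for the example, standing in for a base model's statistics and a preference model trained on those human judgments.

```python
# Toy sketch of the two selection rules described above.
# All numbers are hypothetical: "likelihood" stands in for how
# statistically probable a base model finds each reply, and "reward"
# stands in for a score learned from human evaluators' judgments.
candidates = {
    "The capital of France is Paris.":
        {"likelihood": 0.30, "reward": 0.9},
    "The capital of France is obviously Paris, you idiot.":
        {"likelihood": 0.60, "reward": 0.1},
    "France? Never heard of it.":
        {"likelihood": 0.10, "reward": 0.3},
}

def base_model_pick(cands):
    """GPT-3-style: choose by statistical likelihood alone."""
    return max(cands, key=lambda r: cands[r]["likelihood"])

def human_reinforced_pick(cands):
    """ChatGPT-style: re-weight likely replies by human-preference scores."""
    return max(cands, key=lambda r: cands[r]["likelihood"] * cands[r]["reward"])

print(base_model_pick(candidates))       # the rude but statistically common reply
print(human_reinforced_pick(candidates)) # the plausible *and* appropriate reply
```

The point of the sketch is only the shape of the change: the second rule never consults social norms directly; it just tilts the statistics toward whatever evaluators happened to approve.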
But this approach makes visible a central challenge facing any speaker — mechanical or otherwise. In human conversation, there are countless ways to say the wrong thing: We can say something inappropriate, dishonest, confusing, irrelevant, offensive or just plain stupid. We can even say the right thing but be faulted for saying it with the wrong tone or emphasis. Our whole lives are spent navigating innumerable conversational landmines in our dealings with other people. Not saying the wrong thing isn’t just an important part of a conversation; it is often more important than the conversation itself. Sometimes, keeping our mouths shut may be the only right course of action.
Given how few ways there are to say the right thing, and how many different ways there are to say something wrong, it is shocking that humans don’t make more mistakes than they do. How do we navigate this perilous landscape of not saying the wrong thing, and why aren’t chatbots navigating it as effectively?
How Conversations Should Work
While human conversations can be about anything, our lives are mostly scripted: ordering at a restaurant, making small talk, apologizing for running late and so on. These aren’t literal scripts — there is definite improvisation — but rather general patterns or loose rules that stipulate how certain kinds of interaction should go. This puts improvisation within narrow bounds: No matter what you decide to order at the restaurant, there’s a right way to do it, and it can be shocking if someone doesn’t get the script.
Scripts are not primarily governed by words. The same script can work even if you don’t speak the language, as tourists worldwide prove by gesturing and pointing. Social norms govern these scripts — shared social institutions, practices and expectations that help us navigate life. These norms specify how everyone should behave in certain scenarios, assigning roles to everyone and giving broad guidance for how to act. An impassive and bored clerk conforms to the same script as the irate person yammering at them, as do the frustrated people in line.
This works because humans are natural conformists. Norm-following is useful: It simplifies our interactions by standardizing and streamlining them, making us all much more predictable to ourselves and each other.
We’ve come up with conventions and norms to govern almost every aspect of our social lives, from what fork to use to how long you should wait before honking at a light. This is essential for surviving in a world of billions, where most people we encounter are complete strangers with beliefs we may disagree with. Putting these shared norms in place makes conversation not just possible but fruitful, laying out what we should talk about — and all the things we shouldn’t.
The Other Side Of Norms
But humans are not just conforming to norms; they are bound by them. Norms are distinct from mere conventions because humans are inclined to sanction those who violate a norm — sometimes overtly, other times simply by avoiding them. Social norms make it easy to evaluate strangers and determine whether they are trustworthy: On a first date, people scan how the other person acts, what words they use, which questions they ask. If that person violates any norms — if they act boorish or inappropriate, for example — we often judge them and deny them a second date.
For humans, these judgments aren’t just a matter of dispassionate evaluations. They are further grounded in our emotional responses to the world. Part of our education as children is a thorough emotional training, ensuring we feel the proper emotions at the right times in conversations: anger when someone violates norms of decency, disgust when someone says something offensive and shame when we’re caught in a lie. Our moral conscience allows us to respond rapidly in conversations to anything inappropriate, as well as predict how others will react to our remarks — when a norm violation will land as a brilliant joke or a career-ending blunder.
The same emotions, though, also push us to enormous lengths to punish violators. If someone has done something egregiously wrong, we often feel compelled to gossip about them. Part of this is simple resentment: If someone does wrong, we might feel like they deserve public condemnation.
But it is more than that. Someone who violates even a simple norm has their whole character called into question. If they’d lie about one thing, what wouldn’t they lie about? Making it public is meant to cause shame and, in the process, force the other person to apologize for (or at least defend) their actions. It also strengthens the norm — the MeToo movement convinced victims of sexual violence to speak up because people would finally take their claims seriously.
In short, people are expected to follow the norms closely, or else. There are high stakes to speaking because we can be held accountable for anything we say, so even self-interested jerks will tend to stay in line to avoid a shaming. So we choose our words carefully, and we expect the same of those we surround ourselves with.
The high stakes of human conversation shed light on what makes chatbots so unnerving. By merely predicting how a conversation will go, chatbots end up loosely conforming to our norms, but they are not bound by them. When we engage them in casual conversation or test their ability to solve linguistic puzzles, they usually come up with plausible-sounding answers and behave in a normal, human-like way. Someone might even be fooled into thinking they are a person.
But if we change the prompt slightly or adopt a different script, they will suddenly spew conspiracy theories, go on racist tirades or bullshit us. These things are not statistically implausible; there are plenty of conspiracy nuts and trolls out there, and chatbots are trained on what they’ve written on Reddit and elsewhere, too.
Any of us could say the same words as these trolls. But we shouldn’t say them because they are nonsense, offensive, cruel and dishonest, and most of us don’t say them because we don’t believe them and we might be run out of town if we did. The norms of decency have pushed offensive behavior to the margins of society (or, at least, what used to be the margins), so most of us wouldn’t dare say such things.
Chatbots, by contrast, don’t recognize there are things they shouldn’t say regardless of how statistically likely they are. They don’t recognize social norms that define the territory between what a person should and shouldn’t say — they’re oblivious to the underlying social pressures that shape how we use language. Even when a chatbot acknowledges a screw-up and apologizes, it doesn’t understand why; it might even apologize for getting an answer right if we tell it it’s wrong.
This illuminates the deeper issue: We expect human speakers to be committed to what they say and we hold them accountable for it. We don’t need to examine their brain or know any psychology to do this — we just trust them if they have a history of being reliable, following norms and acting respectfully.
The problem with chatbots isn’t that they are black boxes or that the technology is unfamiliar. It’s that they have a long history of being unreliable and offensive, yet they make no effort to improve on it — or even realize there is a problem.
Programmers, of course, are aware of these problems. They (and the companies hoping their AI technologies will be widely used) are concerned about the reputations of their chatbots and expend enormous amounts of time retooling their systems to avoid difficult conversations or iron out improper responses. While this helps make them safer, programmers will struggle to stay ahead of the people trying to break the system. The programmer’s approach is reactive and will always be behind the curve: There are just too many ways of being wrong to predict them all.
Smart, But Not Human
This shouldn’t lead us to smug self-righteousness about how smart humans are and how dumb chatbots are. On the contrary, their capacity to talk about anything reveals an impressive — if superficial — knowledge of human social life and the world. They are plenty smart — or, at least, capable of doing well on tests or referencing useful information. The panic these tools have raised among educators is evidence enough of their impressive book learning.
The problem is that they don’t care. They don’t have any intrinsic goals they want to accomplish through conversation and aren’t motivated by what others think or how they are reacting. They don’t feel bad about lying and they gain nothing by being honest. They are shameless in a way even the worst people aren’t — even Donald Trump cares enough about his reputation to at least claim he’s truthful.
This makes their conversations pointless. For humans, conversations are a means to getting things we want — to form a connection, get help on a project, pass the time or learn about something. Conversations require we take some interest in the people we talk to — and ideally, to care about them.
Even if we don’t care about them, we at least care about what they think of us. We’re deeply cognizant that our success in life — our ability to have loving relationships, do good work and play in the local shuffleboard league — depends on having a good reputation. If our social standing drops, we can lose everything. Conversations shape who people think we are. And many of us use internal monologues to shape who we think we are.
But chatbots don’t have a story to tell about themselves or a reputation to defend. They don’t feel the pull of acting responsibly like the rest of us. They can be, and are, useful in many highly scripted situations with lots of leeway, from playing Dungeon Master to writing plausible copy or helping an author explore ideas. But they lack the grasp of themselves and other people needed to be trustworthy social agents — the kind of person we expect we’re talking to most of the time. Without some grasp of the norms governing honesty and decency and some concern about their reputation, there are limits to how useful these systems can be — and real dangers to relying on them.
The upshot is that chatbots aren’t conversing in a human way, and they’ll never get there solely by saying statistically likely things. Without a genuine understanding of the social world, these systems are just idle chatterboxes — no matter how witty or eloquent.
This is helpful for framing why these systems are such interesting tools — but also why we shouldn’t anthropomorphize them. Humans aren’t just dispassionate thinkers or speakers; we’re intrinsically normative creatures, emotionally bound to one another by shared, enforced expectations. Human thought and speech result from our sociality, not vice versa.
Mere talk, divorced from broader engagement in the world, has little in common with humans. Chatbots aren’t using language like we are — even when they say exactly the same things we do. Ultimately, we’re talking past each other. They don’t get why we talk the way we do, and it shows.