Human-Compatible AI

“Putting values” in machines is risky business.

Eduardo Morciano for Noema Magazine

Nathan Gardels is the editor-in-chief of Noema Magazine.

The point of Noema is to move the needle of frontier thinking by exposing ideas that provoke response and debate on the core issues of the day, not least artificial intelligence.

One such response that is worthy of relaying to Noema readers comes from the famed computer scientist Stuart Russell, one of the godfathers of AI development who heads the Center for Human-Compatible AI at UC Berkeley.

He felt my recent essay on “The Babelian Tower Of AI Alignment” presented an oversimplified, straw-man account of what “alignment” really means to those actually working on it.

I wrote:

As generative AI models become ever more powerful on their way to surpassing human intelligence, there has been much discussion about how they must align with human values so they end up serving our species instead of becoming our new masters. But what are those values?

The problem is that there is no universal agreement on one conception of the good life, nor the values and rights that flow from that incommensurate diversity, which suits all times, all places and all peoples. From the ancient Tower of Babel to the latest large language models, human nature stubbornly resists the rationalization of the many into the one.

Russell’s reply: “No one I know of working on alignment thinks there’s a single universal set of human ‘values’ (in the sense you describe) that we will put into the machine.” In fact, as he sees it, trying to impart “values” and “ethics” to AI would be a dangerous folly.

To explain what he means, Russell pointed me to a key passage in his 2019 book, “Human Compatible: Artificial Intelligence and the Problem of Control,” that directly addressed this issue:

The first and most common misunderstanding is that I am proposing to install in machines a single, idealized value system of my own design that guides the machine’s behavior. ‘Whose values are you going to put in?’ ‘Who gets to decide what the values are?’ Or even [as one critic put it]: ‘What gives Western, well-off, white male cisgender scientists such as Russell the right to determine how the machine encodes and develops human values?’ 

I think this confusion comes partly from an unfortunate conflict between the commonsense meaning of value and the more technical sense in which it is used in economics, AI, and operations research. In ordinary usage, values are what one uses to help resolve moral dilemmas; as a technical term, on the other hand, value is roughly synonymous with utility, which measures the degree of desirability of anything from pizza to paradise.

The meaning I want is the technical one: I just want to make sure the machines give me the right pizza and don’t accidentally destroy the human race. … To avoid this confusion, it is best to talk about human preferences rather than human values, since the former term seems to steer clear of judgmental preconceptions about morality. 

‘Putting in values’ is, of course, exactly the mistake I am saying we should avoid, because getting the values (or preferences) exactly right is so difficult and getting them wrong is potentially catastrophic.

I am proposing instead that machines learn to predict better, for each person, which life that person would prefer, all the while being aware that the predictions are highly uncertain and incomplete. In principle, the machine can learn billions of different predictive preference models, one for each of the billions of people on Earth. This is really not too much to ask for the AI systems of the future, given that present-day Facebook systems are already maintaining more than two billion individual profiles. 

A related misunderstanding is that the goal is to equip machines with ‘ethics’ or ‘moral values’ that will enable them to resolve moral dilemmas. … The whole point of moral dilemmas, however, is that they are dilemmas: there are good arguments on both sides. The survival of the human race is not a moral dilemma. Machines could solve most moral dilemmas the wrong way (whatever that is) and still have no catastrophic impact on humanity.
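Russell’s proposal — that machines learn a separate, explicitly uncertain preference model for each person — can be illustrated with a toy sketch. Everything below (the names, the two-option setup, the Beta-distribution bookkeeping) is an invented illustration of the general idea, not code from any actual alignment system:

```python
class PreferenceModel:
    """Tracks one person's preference between two options as a Beta
    distribution, so the machine knows how uncertain it still is."""

    def __init__(self):
        self.a = 1  # pseudo-count of observed choices of option A
        self.b = 1  # pseudo-count of observed choices of option B

    def observe(self, chose_a: bool):
        if chose_a:
            self.a += 1
        else:
            self.b += 1

    def prob_prefers_a(self) -> float:
        return self.a / (self.a + self.b)

    def is_uncertain(self, margin: float = 0.2) -> bool:
        # An estimate near 0.5 means the machine should defer and ask
        # rather than act on a guess about this person's preferences.
        return abs(self.prob_prefers_a() - 0.5) < margin


# One model per person -- billions in principle, two in this sketch.
models = {"alice": PreferenceModel(), "bob": PreferenceModel()}
for _ in range(8):
    models["alice"].observe(chose_a=True)  # Alice repeatedly picks A
models["bob"].observe(chose_a=True)        # Bob has barely been observed

print(models["alice"].is_uncertain())  # False: enough evidence to act
print(models["bob"].is_uncertain())    # True: better to defer and ask
```

The point of the sketch is the last two lines: the machine acts only where it has accumulated evidence about a particular person, and remains aware that its predictions elsewhere are highly uncertain and incomplete.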

The Plurality Of Utility

The last line of the Noema essay read: “Where all this ironically leaves us is that aligning AI with ‘universal values’ must, above all, mean the recognition of particularity — plural belief systems, contesting worldviews and incommensurate cultural sensibilities that reflect the diverse disposition of human nature.”

To which Russell replied:

I don’t disagree with this, but it’s far from impossible to handle. For example, if you consider a simple utilitarian AI that maximizes the sum of utilities for the whole of humanity, it necessarily implements exactly the plurality of values you describe.

If it takes an action that affects only Chinese people, its choice will reflect their (supposed) communitarian values, while if it takes an action that affects only Americans, its choice will reflect their (supposed) individualism. If it needs to make a choice that affects American AND Chinese people, who have (supposedly) conflicting preferences, there will need to be tradeoffs — whether the choice is made by an AI system, the UN, the Berggruen Institute, Microsoft, or God.
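The simple utilitarian aggregator Russell describes can be made concrete with a toy example. The people, actions and utility numbers here are entirely invented; the sketch only shows the mechanism he names — choose the action that maximizes the sum of utilities over whoever is affected:

```python
# Invented utility of each candidate action for each person.
utilities = {
    "chen":  {"communal_plan": 2.0, "individual_plan": 0.5},
    "wang":  {"communal_plan": 1.5, "individual_plan": 0.5},
    "alice": {"communal_plan": 0.5, "individual_plan": 2.0},
}

def best_action(affected, actions):
    """Pick the action with the highest total utility for those affected."""
    return max(actions, key=lambda a: sum(utilities[p][a] for p in affected))

actions = ["communal_plan", "individual_plan"]
print(best_action(["chen", "wang"], actions))           # communal_plan
print(best_action(["alice"], actions))                  # individual_plan
print(best_action(["chen", "wang", "alice"], actions))  # communal_plan
```

When only one group is affected, the choice simply reflects that group’s own preferences; when the groups conflict, the summation forces a tradeoff — in this toy case the communal plan wins (total 4.0 versus 3.0) — which is exactly the plurality-plus-tradeoff structure Russell describes, whoever the decision-maker is.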

“By and large,” Russell reflects, “if someone has a deeply held religious belief that requires killing you (for no other good reason), we’d say that your right to life trumps their deeply held religious belief every time. So that puts an upper bound on the weight we might accord to deeply held religious belief, communitarianism, individualism.”

Because AI is a foundational technology that will affect all realms of life, its potential and risks are increasingly on everyone’s mind. For that reason, it is critical to get the story right, both by airing the anxieties of the lay public and by inviting the fertile minds behind intelligent machines to dispel or confirm them.