Steven Weber is a professor at the school of information and the department of political science at UC Berkeley, and a 2019-20 Berggruen Institute fellow.
is the executive director of the UC Berkeley Center for Long-Term Cybersecurity.
Sekhar Sarukkai is an entrepreneur and a lecturer at the school of information at UC Berkeley. He was recently a fellow at McAfee.
Sundar Sarukkai is a visiting faculty member at the Center for Society and Policy at the Indian Institute of Science in Bangalore.
The harm emerging from waste produced by large and complex human systems has gradually become more apparent over recent human history: plastic and other trash dumped into rivers and oceans, carbon and other pollutants spewed into the atmosphere, pharmaceuticals dumped into sinks and toilets that later show up in fish stocks. The environmental movement of the late 20th century made people tangibly aware of these waste streams and — to a considerable degree — how their own lifestyles and actions were responsible. That was an important step toward policy and behavioral changes aimed at reducing waste and the harms associated with it.
This is not yet true for data and the digital world, where the concept of waste and responsibility for it remains outside most people’s awareness. All of us produce a great deal of digital waste as we go about our daily lives, but (like carbon emissions a hundred years ago) it’s largely invisible. Until, of course, the harms associated with that waste fall directly on the person who created it.
By digital waste, we mean data, whether raw or processed — the intangible aspect of the digital economy waste stream. (Not included here, though still important, are other forms of waste from the digital economy like carbon emissions from data centers or pollution from the manufacturing and poor disposal of electronic devices.) For example, it’s fun and probably helpful to others to post a video showing how you use some tools in your home workshop to adjust the security cameras on the outside of your house. But that same data stream can easily reveal where you live, when you are likely to be home and where the gaps are in your security system — as well as enabling inferences about your neighbors’ homes. This is a form of digital waste, and it’s a widespread consequence of our activities.
The waste stream from our digital lives has been accumulating for decades. Do you really believe, for example, that a quick search on the dark web wouldn’t reveal your social security number, your mother’s maiden name and the town where you were born and went to high school? Do you really believe that the passwords you keep reusing aren’t out there for hackers to find?
Digital waste can have personal consequences: your passwords or banking information leaking out into the dark corners of the internet. It can also accumulate into larger, societal-level harms: disinformation, surveillance, economic inequality and a lack of accountability. Remember that less than 20 years ago, it was a matter of faith that the digital revolution would enable transparency, opportunity, democratization and a rebalance of power away from incumbent institutions. It seems apparent that waste products have now overwhelmed the system and left those promises behind.
There are usually multiple ways to manage the waste that every ecosystem produces: toleration (trace amounts of toxins in drinking water), recycling or transformation (old tires, cardboard), pushing it into some other ecosystem where it is thought to do less damage (New York City’s garbage barges) or encapsulating it in isolated containers separate from any ecosystem at all for thousands of years (nuclear waste).
But there comes a moment in the evolution of almost any socio-technical ecosystem where a significant proportion of people realize that these strategies aren’t enough, and that the waste stream can’t simply be managed — it needs to be reduced. This happened for carbon in the atmosphere over the past half-century or so. “Don’t throw anything away,” read a 2007 Shell ad. “There is no away.” Whether or not it was sincere, it was a powerful message about the limits of ecosystem waste-processing that much of the world has begun to understand.
We’re about to hit such a moment for digital waste. Digital waste management services are overwhelmed. A poignant and direct example, just one of many, is the way in which criminal organizations have built a massive and sophisticated market for stolen data and cybersecurity exploits on the dark web.
The sense of urgency to do something about digital waste is palpable in Washington, Brussels, Palo Alto and just about every other center of political and technological power. News outlets are filled with calls for action, from national privacy legislation and breach-disclosure mandates to awareness-raising. Some are reasoned arguments, some are simply emotional appeals to break up Big Tech.
What’s clear is that there exists no coherent strategy for harm reduction that is larger than whack-a-mole. Platform firms like Facebook and Google are largely in a reactive mode, sometimes appearing to try to move in a responsible direction and sometimes trying to do just enough to take the acute pressure off. Platform firms are certainly making it up as they go along, whether that be in content moderation, data privacy or cybersecurity.
There’s no better example than the multiple de-platformings of Donald Trump in the wake of the January 6 assault on the Capitol. Kicking him off the platforms was hardly a decision made in accordance with a larger, coherent and consistently applied theory of harm reduction; harms continue propagating on the same platforms, and new ones will surely emerge. It was more like cleaning up an oil spill: an important action to take at the moment of crisis, but not something embedded in a larger theory of change.
One window into a potential long-term solution lies in the way we treat the very first step in personal digital security. Right now, the simple logic of digital authentication is that to prove to a bank or a hospital that you are who you say you are, you need to show some subset of three things (factors): something you know (a password), something you have (a one-time code sent to your phone) and something you are (your fingerprint). We all know just how sloppy and risky one-factor password-based authentication has become. Two-factor authentication is safer but more burdensome and still not close to foolproof.
Continuous behavior-based authentication (CBBA) takes the process one step further. CBBA is a set of technologies and processes that authenticates you not once, but continuously, on the basis of something you are.
For example, everyone has a slightly different cadence when typing on a keyboard, slightly different eye-motion responses to flashes of light on a screen and a slightly different gait when walking across a room. Even the sound of a deep breath, if measured precisely enough, can be an authenticating feature. Imagine combining continuous measurement of a number of these uniquely differentiating characteristics of a person into a single probability score that updates regularly. That probability score could be used to determine if a person has the right to withdraw money from the bank, access a medical record or vote in an election.
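One way to picture such a regularly updated score is a simple Bayesian update, sketched below. This is an illustration only, not a description of any real CBBA product: the function name `update_score`, the assumption that the signals are statistically independent, and the likelihood-ratio values are all hypothetical.

```python
import math

def update_score(prior, likelihood_ratios):
    """Fold new behavioral signals into a probability that the user is genuine.

    Each likelihood ratio says how much more likely the observed signal is
    for the genuine user than for an impostor (assumed independent signals).
    """
    log_odds = math.log(prior / (1.0 - prior))
    for ratio in likelihood_ratios:
        log_odds += math.log(ratio)
    return 1.0 / (1.0 + math.exp(-log_odds))

# Hypothetical ratios for typing cadence, gait and breathing sound.
score = update_score(0.5, [4.0, 2.5, 1.5])

# A later signal that looks unlike the user pulls the score back down.
lower_score = update_score(score, [0.1])
```

A ratio above 1 raises the score and a ratio below 1 lowers it, which is what lets the score drift continuously as new measurements arrive.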
The beauty of CBBA is that it runs entirely in the background of the user’s experience — you would never get a password prompt, never have to worry about where your phone is, never have to offer up an image of your face or the last four digits of your social security number.
CBBA has a lot of upsides beyond convenience. It would drastically reduce opportunities for fraud, make conventional phishing attacks nearly obsolete and reduce the value of stolen passwords to essentially zero. It would significantly rebalance the cybersecurity landscape by taking away many of the easy routes of attack for bad actors, and it would reduce the costs that legitimate actors today have to bear. Think of a world without password-recovery mechanisms and helplines devoted to that process. There’s a lot to like.
But CBBA also has risks and harms that are big enough to matter. Data about your location, keystroke dynamics, voice patterns or gait, along with how the combination of these and other things at any given moment compares to what it was in the past, carry the risk of harm. They become waste as soon as they have been used for authentication. And as with DNA records, it’s a long-term waste problem. You can change your password if it’s stolen, but you can’t change your gait or your voice patterns, at least not easily.
The problem with this long-term waste is bigger than just the possibility of impersonation or blackmail, since sophisticated CBBA systems can reduce those risks internally. What’s concerning is what that data can be used to infer about your health, mental state or other characteristics that you don’t intend to expose. This is particularly problematic when CBBA data is combined with other information sloshing around in the waste stream. If an attacker can see how your voice patterns and typing cadence have changed, and they combine that with data about your sleep patterns and what you buy at the grocery store, they’ll almost certainly be able to determine that you are suffering from an episode of severe depression. That can be used to cause great harm — and CBBA data is just one example.
So how do we reduce the waste stream to a manageable level consistent with the goal of maintaining a sustainable ecosystem? Here are four simple moves, using the CBBA example, that would contribute at a conceptual level.
First, stop collecting CBBA data at the moment the system reaches the threshold level of confidence needed for a particular authentication function. Second, delete old data that cannot be shown to enhance the system’s efficiency and accuracy. Third, at least initially, limit the number of players collecting any one type of CBBA data; a single sign-on service provider, or a small number of them, is probably safer. Fourth, develop business models around authentication services that support these changes in practice. These business models need to be sufficiently robust to disincentivize CBBA services from sharing collected data with third parties, for example to drive advertising.
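The first two moves can be sketched in a few lines. The code below is a minimal illustration under stated assumptions: the names `authenticate` and `prune`, the 0.99 threshold, the 90-day retention window and the `improves_accuracy` flag are all hypothetical, and a real system would need an actual procedure for showing that a record improves accuracy.

```python
def authenticate(signals, threshold=0.99, prior=0.5):
    """Consume behavioral signals only until confidence reaches the threshold."""
    odds = prior / (1.0 - prior)
    score = prior
    used = 0
    for ratio in signals:
        odds *= ratio          # each ratio: genuine-user vs impostor likelihood
        used += 1
        score = odds / (1.0 + odds)
        if score >= threshold:
            break              # move one: stop collecting once confident enough
    return score, used

def prune(records, max_age_days):
    """Move two: keep a record only if it is recent or shown to improve accuracy."""
    return [r for r in records
            if r["age_days"] <= max_age_days or r["improves_accuracy"]]

stream = iter([5.0, 5.0, 5.0, 5.0, 5.0])
score, used = authenticate(stream)
never_collected = list(stream)   # signals the system did not need to read

records = [
    {"id": "a", "age_days": 10, "improves_accuracy": False},
    {"id": "b", "age_days": 400, "improves_accuracy": False},
    {"id": "c", "age_days": 400, "improves_accuracy": True},
]
kept = prune(records, max_age_days=90)
```

The design point is that both functions reduce the waste stream at the source: signals left in `never_collected` never exist as stored data, and records dropped by `prune` cannot leak later.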
How do we convert those conceptual harm-reduction moves into actual practice? One potential answer takes cues from cap-and-trade systems for carbon reduction. What’s essential is a pricing mechanism that incentivizes actors in the market to compete on reducing harms associated with data waste.
The cap component is straightforward: The government would establish an overall ceiling for harms associated with digital waste, and that ceiling would decline by a set amount, say 1%, on an annual basis. This incentivizes innovation in aggregate. The trade component is how the system allocates the burden of harm-reduction to wherever it can be done most efficiently at a given moment. For example, the government would provide credits for algorithms that use the least data and allow the credits to be traded and thus sold to companies whose algorithms require more data. This incentivizes innovation on an individual basis and sets up competitive pressure with rewards for the most efficient ways of reducing harms.
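The arithmetic of the two components can be sketched as follows. Everything specific here is an assumption for illustration: the abstract "harm units," the reading of the 1% decline as compounding year over year, and the names `cap_for_year`, `credit_balances` and the firms `lean` and `heavy`.

```python
def cap_for_year(initial_cap, year, annual_decline=0.01):
    """Overall ceiling after `year` years of a compounding annual decline."""
    return initial_cap * (1.0 - annual_decline) ** year

def credit_balances(allowances, emissions):
    """Surplus (positive) or shortfall (negative) of credits for each firm."""
    return {firm: allowances[firm] - emissions[firm] for firm in allowances}

# Hypothetical firms, measured in abstract harm units.
allowances = {"lean": 100, "heavy": 100}
emissions = {"lean": 60, "heavy": 130}

balances = credit_balances(allowances, emissions)
# The lean firm can sell its surplus to the heavy firm; the cap holds overall
# as long as total emissions stay at or below total allowances.
cap_is_met = sum(emissions.values()) <= sum(allowances.values())

cap_next_year = cap_for_year(1000.0, 1)
```

In this toy case the data-light firm holds 40 spare credits and the data-heavy firm is 30 short, so a trade lets both comply while the total stays under the ceiling.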
Other ways of putting a price on the emission of digital waste are also fairly simple. Consider, for example, a Tobin tax, which was originally designed to disincentivize the rapid trading of currencies. A similar small tax could be placed on every new data input that a provider of CBBA wanted to add to its model. If the new data input were meaningfully beneficial in terms of improving the model’s performance, it would justify paying the tax. But if the data didn’t improve performance sufficiently, the tax would disincentivize collecting the data in the first place. This tax could also increase gradually over time on a predetermined schedule to boost investment and innovation in data harm reduction.
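The decision a provider would face under such a tax reduces to a comparison, sketched below. The names `tax_in_year` and `worth_collecting`, the 5% annual increase and the idea of expressing the performance benefit in the same monetary units as the tax are all hypothetical simplifications.

```python
def tax_in_year(base_tax, year, annual_increase=0.05):
    """Per-input tax after `year` years on a predetermined rising schedule."""
    return base_tax * (1.0 + annual_increase) ** year

def worth_collecting(benefit, base_tax, year):
    """A new data input is worth adding only if its benefit exceeds the tax."""
    return benefit > tax_in_year(base_tax, year)

# The same input, with a benefit valued at 1.2, clears the tax today
# but no longer does once the schedule has raised the tax for five years.
collect_now = worth_collecting(benefit=1.2, base_tax=1.0, year=0)
collect_later = worth_collecting(benefit=1.2, base_tax=1.0, year=5)
```

The rising schedule is what turns a one-time filter into ongoing pressure: inputs that are only marginally useful today stop being worth collecting later.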
There’s one conceptual roadblock that stands in the way of designing this market for harm reduction: defining a unit of digital harm. With carbon cap and trade, the unit is straightforward: tons of CO2 emitted. It’s relatively easily measured and directly connected to harms. It’s not comprehensive or perfect, of course (other greenhouse gases such as methane and nitrous oxide are also problematic), but it’s good enough.
What is analogous for the harms of data waste? We can say with certainty what it clearly is not: a certain amount of data, like a terabyte. Potential harms vary greatly depending on what is in that terabyte.
It might be possible to gain some insight into the expected negative value of different possible coercive harms associated with data waste by asking people what they would pay to reduce their exposure to those harms. Someone might be willing to pay $10 to avoid being manipulated into buying something, but $30 to avoid being manipulated into a political act. This process has serious drawbacks, probably the most important of which is that many harms come from complex interactions among data that aren’t easily understood by an average person.
A better way depends on an institutionalized process. Imagine a public-private partnership in the form of a Congressionally mandated commission charged with coming up with a proposed experimental unit for data harm, taking into account the best scientific knowledge to date. Key to the structure of the commission is how it is staffed and how it votes. The appointing bodies would need incentives to name people acting in good faith, and the commission would need a voting system that tilts toward the interests of the common good.
For example, let’s say a third of the commission is appointed by Congress; a third by NGOs, think tanks and universities; and a third by an industry association representing tech firms. The commission would be tasked with putting forward a recommendation on a unit of data harm by the end of a year’s work, and there would be penalties for all parties if they fail to agree. The penalty would be scaled to how much each party benefits from not dealing with data waste. And the unit that the commission agrees upon would be treated as an experiment and used for some defined period of time, while a second commission of new members is appointed to evaluate and refine it.
There has to be a credible threat that if a consensus on an experiment fails to emerge, the corporations benefiting from unmanaged data waste receive the harshest penalties. Is that credible? We think so because, at present, society seems ready to make such a commitment vis-à-vis the platform firms. So from the firms’ perspective, the choice looks like this: Participate constructively in a process to find a reasonable unit of digital harm, a unit that will cost them but improve the outcome for society, or block that process and suffer a much larger cost.
Reducing the harms of digital waste is a modest goal, not a call for revolution. It won’t satisfy the most radical critics of the attention economy, surveillance capitalism and others who have concluded that the exchange of personal data for services with indirect payment through advertising is a fundamentally flawed, deeply menacing or purely undemocratic way to run a business or an economy. A carbon cap-and-trade system doesn’t satisfy the harshest critics of the fossil fuel industry, either. The point is to set the world on a safer, better path.
For those who accept sustainability as a legitimate goal, there is an observable way to know if it’s working, using CBBA again as an example: Imagine that Google, Apple, Facebook and Amazon, along with new entrants, start providing CBBA single sign-on capabilities. The most important signal will be that the competition between them is significantly about how they reduce negative externalities, waste and the potential for harm, not just about making it convenient and cheap for users. Right now, the incentives barely point toward waste and harm reduction at all. Our experiment would change the balance of incentives, and that’s a modest but important definition of victory.
This argument has several other limitations: It’s a distinctively Western approach that’s based on an economic frame that sees waste through the conceptual lens of negative externalities and efficiency. Considering it harmful that a company, government or attacker is coercing individuals by making inferences out of their data waste may be a culturally distinct notion, not a universal one. Societies and cultures around the world certainly have different mindsets and practices around waste generally. That can be a rich source of additional ideas and opportunities for future work, not a reason to drop the idea.
Market design exercises are also not free of politics. But a public-private commission with voting rules aligned around the public good can provide some insulation, and there may be others to discover.
Another potential problem is inertia: fixed assets and sunk costs. But the digital economy is for the most part much less stuck in fixed assets than the fossil-fuel economy, for example. Data centers are expensive, but not like an offshore oil platform, and they amortize in something closer to 10 years than 50. Some forms of digital waste may have a very long half-life, but there are more degrees of freedom in the world of bits than in the world of molecules.
This might all feel abstract and distant to most individual users of the internet and digital services. But think of it as an important step toward something more evocative and personal. The medium-term goal should be to create a lifestyle around reducing digital waste, much as some places have done for the environment over the last 20 years: recycling, composting, renewable energy, electric transportation, reduced consumption. These are to some extent luxury choices, but the price of each of them is coming down, while awareness is going up.
When you add an element of intergenerational equity to the equation, there is no inherent reason why entrepreneurs won’t seek over time to structure digital harm-reduction similarly. No one knowingly poisons the groundwater their children drink. This new digital lifestyle will ultimately be about protecting the internet for our grandchildren to enjoy, and digital waste reduction is a good place to start.