Ethics at scale

Scale changes the problems of privacy, security, and honesty in fundamental ways.

By Mike Loukides

November 29, 2017

Lady Justice (source: Dun.can on Flickr)

For the past decade or more, the biggest story in the technology world has been scale: building systems that are larger, that can handle more customers, and that can deliver more results. How do we analyze the habits of tens, then thousands, then millions or even billions of users? How do we give hordes of readers individualized ads that make them want to click?

As technologists, we’ve never been good at talking about ethics, and we’ve rarely talked about the consequences of our systems as they’ve grown. Since the start of the 21st century, we’ve acquired the ability to gather and (more important) store data at global scale; we’ve developed the computational power and machine learning techniques to analyze that data; and we’ve spawned adversaries of all creeds who are successfully abusing the systems we have created. This “perfect storm” makes a conversation about ethics both necessary and unavoidable. While the ethical problems we face are superficially the same as always (privacy, security, honesty), scale changes these problems in fundamental ways. We need to understand how these problems change. Just as we’ve learned how to build systems that scale, we need to learn how to think about ethical issues at scale.

Let’s start with the well-known and well-reported story of the pregnant teenager who was outed to her parents by Target’s targeted marketing. Her data trail showed that she was buying products consistent with being pregnant, so Target sent her coupon circulars advertising the baby products she would eventually need. Her parents wondered why their daughter was suddenly receiving coupons for disposable diapers and stretch-mark cream, and drew some conclusions.

Many of us find that chilling. Why? Nothing happened that couldn’t have happened at any small town pharmacy. Any neighborhood pharmacist could notice that a girl had added some weight, and was looking at a different selection of products. The pharmacist could then draw some conclusions, and possibly make a call to her parents. The decision to call would depend on community values: in some cultures and communities, informing the parents would be the pharmacist’s responsibility, while others would value the girl’s privacy. But that’s not the question that’s important here, and it’s not why we find Target’s action disturbing.

The Target case is chilling because it isn’t a case about a single girl and a single pregnancy. It’s about privacy at scale. It’s a case about everyone who shops at any store larger than a neighborhood grocery. The analysis that led Target to send coupons for diapers is the same analysis they do to send coupons to you and me. Most of the time, another piece of junk mail goes into the trash, but that’s not always the case. If a non-smoker buys a pack of cigarettes, do their insurance rates go up? If an alcoholic buys a six-pack, who finds out? What can be gathered from our purchase histories and conveyed to others, and what are the consequences? And who is making decisions about how to use this data?

When nobody can presume that their actions are private, we’re in a different ethical world. The actions of a human pharmacist aren’t comparable to the behavior of systems that put everyone’s privacy at risk. Our obsession with scale amplifies problems that might be innocent enough if they could be addressed individually. An individual’s need for privacy may depend on context and personal choice; scale ignores both context and choice. Scale creates a different set of ethical problems—and it’s a set of problems we haven’t thought through.

There are several aspects of the Target case (and cases like it) that deserve thought. First, who is responsible? It is difficult, if not impossible, to talk about ethics without agents who are accountable for their decisions. A local pharmacist can make a decision, and can bear responsibility for that decision. In Target’s case, though, the circular was sent by a piece of software that had no idea that it was engaging in problematic behavior. It was doing what it was supposed to do: analyzing buying patterns and sending coupons. The word “idea” itself is revealing: software doesn’t have “ideas,” but we instinctively feel the need to assign agency to something, some actor that makes an informed decision. Is the programmer who built the system accountable for how it is used? Is the data scientist who created the model? It’s unlikely that either the programmer or the data scientist has any idea what the system is actually doing in the real world, and certainly they have no control over how it is deployed. Is the “management” that ordered the system and specified its behavior responsible? That sounds more concrete, but scratch the surface and you’ll find a murky collective; one might as well say “the stockholders.”

Second, exposing a pregnant teenage girl to her parents was clearly an “unforeseen consequence”: nobody designed the system to do that. Programmers, analysts, managers, and even stockholders certainly need to think more about the consequences of their work; all too often, unforeseen consequences could have been foreseen. However, I can’t be too hard on people for not imagining all possible consequences. The possible consequences of any action easily spin out to infinity, and expecting humans to anticipate them invites paralysis.

Third, collecting and using personal data isn’t entirely negative: collecting medical data from millions of patients can lead to new treatments, or to earlier ways of detecting serious diseases. What if the teenager’s buying patterns indicated that she was self-medicating for a serious medical condition, such as preeclampsia? Does that merit an automated intervention? There are good ways to use data at scale, and they can’t be cleanly separated from the bad ways.

I don’t want to presuppose any answer to these questions; ethics is ultimately about discussion and dialog, rather than any one person’s opinion. I do want to suggest, though, that scale changes the issues. We need to start thinking about ethics at scale.

Here’s another example: any decent thief can pick the lock on your house. We know how to think about that. But an attack against internet-enabled locks could potentially unlock all the locks, anywhere in the world, simultaneously. Is that somehow a different issue, and if so, how do we think about it? (While I was writing this, news came out of the first attack against Amazon’s Key service.)

Building an ethical argument around the legal system is dubious, but that may give us a way in. I doubt that you could sue the lock manufacturer if someone picked the lock on your front door. That falls into the “shit happens” category. You could possibly sue if the lock was faulty, or if it had a particularly shoddy design. But almost any lock can be picked by someone with the right tools and skills. However, I can easily imagine a class action lawsuit against a lock manufacturer whose locks were “picked” en masse because of a software vulnerability. Anthem Blue Cross has agreed to pay millions of dollars to settle lawsuits over a data breach; people are lining up to sue Equifax. Would a lock manufacturer be any different?

As in the Target case, we see that agency is obscure; we don’t know who’s responsible. We’re almost certainly dealing with unforeseen consequences, and on many levels. An attack against a smart lock could take advantage of vulnerabilities in the lock itself, the vendor’s data center, the homeowner’s cell phone, the locking app, or even the cell phone provider. The failure could be the consequence of an incredibly subtle bug, a forgotten security update, or a default password. Whatever the cause, the failure is amplified to internet scale. While it’s not clear who would bear responsibility if the world’s smart locks were hacked, it is clear that we need to start thinking about safety at a global scale, and that thought process is qualitatively different from thinking about individual locks.

A final example: “fake news” isn’t a new thing. We’ve all known cranks who believe crazy stuff and waste your time telling you how they were abducted by aliens. We’ve smiled at the grocery store tabloids. What’s scary now is fake news at scale: it’s not one crank wasting one person’s time, but one crank whose idea gets propagated to literally billions. Except that this news doesn’t come from a crank, but from a professional agent of a hostile government. And the “people” passing the news along aren’t people; they’re sock puppet identities, bots created purely for the purpose of propagating misinformation. And the scale at which this takes place transcends even the most powerful press.

As danah boyd said at her Strata NY keynote, this is no longer a simple social media issue; it’s a security issue. What happens when you poison the data streams that feed the “artificial intelligences” that tell us what to read? I have an ethical commitment to free speech; but when free speech becomes a computer security issue, it’s a different game. I can defend someone’s right to propagate absurd news stories without approving their conduct. But how do we think about intentionally propagating deceptive speech at scale? I won’t defend someone’s right to log into my computer systems and modify data without my permission; should I defend someone’s right to poison the data streams that determine which stories Google and Facebook send to their readers? What are the responsibilities of those who build and maintain those data streams? The ethics of scale around “fake news” certainly needs to account for the platforms (such as Facebook) that are, as Renee DiResta has said, “tailor-made for a small group of vocal people to amplify their voices.”

Whether the issue is privacy, safety, honesty, or any other issue, the ability of our systems to amplify problems to internet scale changes the problem itself. I could have come up with many examples. Banks routinely deny loans, and it’s certainly unethical to loan money to someone who won’t be able to repay, or to refuse loans for reasons unrelated to the applicant’s ability to pay, but what happens when loan applications are denied at scale? Are entire classes of people treated unfairly? Are loans routinely denied to people who come from certain neighborhoods, work at certain occupations, or have certain medical conditions? Informers used to identify opponents of a political regime one at a time; now, face recognition can potentially identify every attendee at a protest or a rally. These problems are superficially the same as they were decades ago—but when scaled, they change completely.

The ethics of scale differs at least in part because of the “fellow travelers” that we’ve seen: the problems of hidden agency and unforeseen consequences. The tools that we use to achieve scale, by nature, hide agency. Is a judge responsible for sentencing a prisoner, or is that responsibility given to an algorithm that hides control? Do responsible humans create advertising campaigns, or do we delegate those tasks to software? If an algorithm rejects a credit application, who ensures that the decision was fair and unbiased? We can’t address the ethics of scale without talking about the people—not the algorithms—responsible for decisions, and we are right to be wary of systems for which no one seems accountable.

The problem of unforeseen consequences is perhaps the greatest irony of the connected internet age. The internet itself is nothing but an unforeseen consequence. Back in the 1970s, it was an interesting DARPA-funded experiment. None of the internet’s inventors could have foreseen its future, and they would probably have designed it differently if they had. Back in the early 1990s, when the public internet was young, it was supposed to bring about world peace by facilitating communication and understanding; and just over a decade later, we proudly proclaimed that social media enabled the Arab Spring. Those of us who shared that naivete also share responsibility: a less naive culture might, in due time, have created a Facebook or a Twitter that wasn’t so vulnerable to “fake news.” Indeed, everything from the Morris worm and the first email spam to the Equifax attack is an unforeseen consequence.

It’s not possible to foresee all consequences, let alone eliminate them, and obsessing over those consequences may well paralyze us and prevent us from doing good. The novelty of any invention makes it even more difficult to predict how the consequences will play out; who would have thought that Mitt Romney’s remark about “binders full of women” would have started an internet meme? But thinking about ethics and participating in an ethical discussion about software at scale requires us to foresee some of those consequences, and think about their effects before all of us become the victims. And as time goes on, we need to become less naive. Once we’ve seen how the reaction to a chance remark can propagate like wildfire through social media, and even into Amazon product reviews, we should be aware of how our systems can be manipulated and gamed.

Immanuel Kant’s “categorical imperative” may help us to think about ethics at scale. “Act according to the principle which you would want to be a universal law” says that we should think carefully about the kind of world we are creating. Are we building systems optimized to maximize profit for a small group of stakeholders, or are we building a system that will be better for all of humanity? What are the consequences of our actions and creations to individuals, but also multiplied to all the inhabitants of our world? We need to look at bigger pictures, and we certainly should be more skeptical about our abilities than we were in the early days of the internet. The ACM’s Code of Ethics and Professional Conduct is a good starting point for a discussion, as are organizations such as Data & Society and NYU’s AI Now. Many colleges and universities now offer classes on data ethics. But many people in the software industry have yet to join the discussion.

We need to think boldly about the concrete, everyday problems we face, many of which are problems of our own making. We’re not talking about a hypothetical AI that might decide to turn the world into paper clips. We’re talking about systems that are already working among us, defining the world in which we live—and wasting our time on arcane hypothetical issues will only prevent us from solving the real problems. As Tim O’Reilly says, our Skynet moment has already happened: we live in a web of entangled partially intelligent systems, designed to maximize objective functions that were designed with no thought for our well-being.

It’s time to put those systems under our control. It’s time for the businesses, from Google and Facebook (and Target) to the newest startups, to realize that their future isn’t tied up in short-term profits, but in building a better world for their users. Their business opportunity doesn’t lie in building echo chambers or in placing personalized ads, but in helping to create a world that’s more fair and just, even at scale. To build better systems and businesses, we need to become better people. And to be better people, we must learn to think about ethics: not just personal ethics, but ethics at scale.
