Ethics in data project design: It’s about planning

The destination and rules of the road are clear; the route you choose to get there makes a huge difference.

By Anna Lauren Hoffmann
September 21, 2016
Lickskillet Road, unincorporated Boulder County. (Image copyright: Tim McGovern; used with permission.)

When I explain the value of ethics to students and professionals alike, I refer to it as an “orientation.” As any good designer, scientist, or researcher knows, how you orient yourself toward a problem can have a big impact on the sort of solution you develop, and on how you get there. As Ralph Waldo Emerson once wrote, “perception is not whimsical, but fatal.” Your particular perspective on, knowledge of, and approach to a problem shape your solution, opening up certain paths forward and forestalling others.

Data-driven approaches to business help optimize measurable outcomes, but the early planning of a project needs to account for the ethical (and, in many cases, the literal) landscape in order to avoid ethically treacherous territory. Two recent cases in the news illustrate this point and show the kind of preparation that makes it possible to move forward in a way that is both data-driven and ethical: The Princeton Review’s ZIP-code-based pricing scheme, which turned out to unfairly charge Asian-American families higher prices, and Amazon’s same-day-delivery areas, which neglected majority-Black neighborhoods.


You can approach a new project using a road trip analogy. The destination is straightforward: profit, revenue, or another measurable KPI. But the path you take to get there still needs to be determined. If my wife and I, for example, want to drive from our apartment in Oakland (Point A) to visit my wife’s sister in Los Angeles (Point B), we have to figure out how we’d like to approach the trip. If we’re concerned primarily with efficiency, certain questions immediately come to the fore, namely: what’s the fastest route to LA? Determining the fastest route requires us to pay attention to certain features of the possible trip, such as speed limits, easily accessible gas stations, and traffic conditions.

On the other hand, if my wife and I are interested in taking the most scenic route from Oakland to LA, a whole different set of concerns becomes salient. Gas stations are likely still relevant, but speed is less of a factor. We’ll also want to take into account things like notable landmarks and towns (and my tendency toward car sickness) along the way.

The destination is the same, and so are the laws we have to abide by. How we get from Point A to Point B, then, is heavily determined by how we orient ourselves toward the trip in the first place.

The same goes for research or design projects: how you orient yourself or your team toward solving certain problems or achieving certain goals will fundamentally shape the journey you take. If you’re interested in reaching a goal as quickly as possible (if your only concern is speed or turnaround time), one particular set of concerns is going to be salient. But if you’re interested in reaching a goal not only efficiently but ethically, then a different set of concerns will pop up.

Moreover, understanding ethics as a way of orienting yourself toward a problem helps differentiate ethical behavior from mere legal compliance. Just as the laws governing driving constrain my possible actions but don’t fully determine the contours of my road trip, the laws and regulations governing data collection, storage, and use don’t exhaust the decisions you and your team may face.

My students lovingly refer to this as the “Anna’s red flags” approach: teaching folks to see and address red flags they’d otherwise miss along the way. Of course, orienting yourself toward your work with an eye to ethics is only a starting point. Simply intending to do ethical data science isn’t enough; once you’ve established an orientation, you need information and specialized training.

In the case of The Princeton Review, it’s clear that the program designers suspected they could charge higher prices to customers in wealthier areas. Similarly, the city maps showing Amazon’s same-day delivery areas are immediately recognizable to residents of those cities as maps of where people of color live. An awareness of the distribution of wealth and race in the U.S. would have set off alarm bells in either case, but only if someone had asked at the very beginning of the project, “Are we about to build a proxy for race and class with this model?”

To return to the road trip analogy, since my wife and I are both Midwestern transplants, we don’t have deep background knowledge of California to help guide our journey. Instead, we have to ask for help from folks with more extensive expertise in the space between the Bay Area and Southern California.

Similarly, you may already have some ideas about data ethics, and you and your team may even have discussed the topic. But unless you have deep knowledge of the applied ethics of data or technology, it’s not enough to rely only on your own (necessarily limited) perspective to guide you. Instead, you need to be proactive: reach out to consult experts in the field, take advantage of training opportunities when available, and diversify the range of voices informing your work.
