Big ethics for big data

How businesses can confront the ethical issues tied to massive aggregation and data analysis.

By Howard Wen
June 10, 2012

As the collection, organization and retention of data have become commonplace in modern business, the ethical implications behind big data have also grown in importance. Who really owns this information? Who is ultimately responsible for maintaining it? What are the privacy issues and obligations? What uses of technology are ethical, or not, when it comes to big data?

These are the questions authors Kord Davis (@kordindex) and Doug Patterson (@dep923) address in “Ethics of Big Data.” In the following interview, the two share thoughts about the evolution of the term “big data,” ethics in the era of massive information gathering, and the new technologies that raise their concerns for the big data ecosystem.


How do you define “big data”?

Douglas Patterson: The line between big data and plain old data is something that moves with the development of the technology. The new developments in this space make old questions about privacy and other ethical issues far more pressing. What happens when it’s possible to know where just about everyone is or what just about everyone watches or reads? From the perspective of business models and processes, it is probably better to think about “big” in terms of impact than in terms of current trends in NoSQL platforms, etc.

One useful definition of big data — for those who, like us, don’t think it’s best to tie it to particular technologies — is that big data is data big enough to raise practical rather than merely theoretical concerns about the effectiveness of anonymization.

Kord Davis: The frequently cited characteristics “volume, velocity, and variety” are useful landmarks: persistent features such as the size of datasets, the speed at which they can be acquired and queried, and the wide range of formats and file types generating data.

The impact, however, is where ethical issues live. Big data is generating a “forcing function” in our lives through its sheer size and speed. Recently, CNN published a story similar to an example in our book. Twenty-five years ago, our video rental history was deemed private enough that Congress enacted a law to prevent it from being shared in hopes of reducing misuse of the information. Today, millions of people want to share that exact same information with each other. This is a direct example of how big data’s forcing function is literally influencing our values.

The influence is a two-way street. Much like the scientific principle that we can’t observe a system without changing it, big data can’t be used without an impact — it’s just too big and fast. Big data can amplify our values, making them much more powerful and influential, especially when they are collected and focused toward a specific desired outcome.

Big data tends to be a broad category. How do you narrow it down?

Douglas Patterson: One way is the anonymization of datasets before they’re released publicly, acted on to target advertising, etc. As the legal scholar Paul Ohm puts it, “data can be either useful or perfectly anonymous, but never both.”

So, suppose I know things about you in particular: where you’ve eaten, what you’ve watched. It’s very unlikely that I’m going to end up violating your privacy by releasing the “information” that there’s one particular person who likes carne asada and British sitcoms. But if I have that information about 100 million people, patterns emerge that do make it possible to tie data points to particular named, located individuals.
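The re-identification risk described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not a method described by the interviewees: the records, the field names (`zip`, `birth_year`, `likes`), and the helper `reidentify` are all hypothetical. It shows how a release with names removed can still be tied to named people when a combination of leftover attributes is unique and also appears in some other dataset that does carry names.

```python
from collections import Counter

# Hypothetical "anonymized" release: names removed, other attributes kept.
released = [
    {"zip": "97201", "birth_year": 1975, "likes": "carne asada"},
    {"zip": "97201", "birth_year": 1975, "likes": "British sitcoms"},
    {"zip": "97202", "birth_year": 1980, "likes": "carne asada"},
    {"zip": "97203", "birth_year": 1968, "likes": "British sitcoms"},
]

# Hypothetical auxiliary dataset that does include names.
public = [
    {"name": "Alice", "zip": "97202", "birth_year": 1980},
    {"name": "Bob", "zip": "97203", "birth_year": 1968},
    {"name": "Carol", "zip": "97201", "birth_year": 1975},
]

# Attributes shared by both datasets (assumed for this illustration).
QUASI_IDENTIFIERS = ("zip", "birth_year")


def key(record):
    """Return the tuple of shared attributes for one record."""
    return tuple(record[field] for field in QUASI_IDENTIFIERS)


def reidentify(released, public):
    """Link released records to names when the shared attributes are unique
    in both datasets; ambiguous combinations are left unmatched."""
    release_counts = Counter(key(r) for r in released)
    names_by_key = {}
    for person in public:
        names_by_key.setdefault(key(person), []).append(person["name"])

    matches = {}
    for record in released:
        k = key(record)
        candidates = names_by_key.get(k, [])
        if release_counts[k] == 1 and len(candidates) == 1:
            matches[candidates[0]] = record["likes"]
    return matches


print(reidentify(released, public))
```

In this toy data, two released records share Carol's attributes, so she stays hidden in a group of two, while the other two combinations are unique and link straight to a name. The more attributes a release carries per person, the more combinations become unique in this way.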

Kord Davis: Another approach is the balance between risk and innovation. Big data represents massive opportunities to benefit business, education, healthcare, government, manufacturing, and many other fields. The risks, however, to personal privacy, the ability to manage our individual reputations and online identities, and what it might mean to lose — or gain — ownership over our personal data are just now becoming topics of discussion, some parts of which naturally generate ethical questions. To take advantage of the benefits big data innovations offer, the practical risks of implementing them need to be understood.

How do ethics apply to big data?

Kord Davis: Big data itself, like all technology, is ethically neutral. The use of big data, however, is not. While the ethics involved are abstract concepts, they can have very real-world implications. The goal is to develop better ways and means to engage in intentional ethical inquiry to inform and align our actions with our values.

There are a significant number of efforts to create a digital “Bill of Rights” for the acceptable use of big data. The White House recently released a blueprint for a Consumer Privacy Bill of Rights. The values it supports include transparency, security, and accountability. The challenge is how to honor those values in everyday actions as we go about the business of doing our work.

Do you anticipate friction between data providers (people) and data aggregators (companies) down the line?

Douglas Patterson: Definitely. For example: you have an accident and you’re taken to the hospital unconscious for treatment. Lots of data is generated in the process, and let’s suppose it’s useful data for developing more effective treatments. Is it obvious that that’s your data? It was generated during your treatment, but also with equipment the hospital provided, based on know-how developed over decades in various businesses, universities, and government-linked institutions, all in the course of saving your life. In addition to generating profits, that same data may help save lives down the road. Creating the data was, so to speak, a mutual effort, so it’s not obvious that it’s your data. But it’s also not obvious that the hospital can just do whatever it wants with it. Maybe under the right circumstances, the data could be de-anonymized to reveal what sort of embarrassing thing you were doing when you got hurt, with great damage to your reputation. And giving or selling data down the line to aggregators and businesses that will attempt to profit from it is one thing the hospital might want to do with the data that you might want to prevent — especially if you don’t get a percentage.

Questions of ownership, questions about who gets to say what may and may not be done with data, are where the real and difficult issues arise.

Which data technologies raise ethical concerns?

Douglas Patterson: Geolocation is huge — think of the flap over the iPhone’s location logging a while back, or how much people differ over whether or not it’s creepy to check yourself or a friend into a location on Facebook or Foursquare. Medical data is going to become a bigger and bigger issue as that sector catches up.

Will lots of people wake up someday and ask for a “do over” on how much information they’ve been giving away via the “frictionless sharing” of social media? As a teacher, I was struck by how little concern my students had about this — contrasted with my parents, who find something pretty awful about broadcasting so much information. The trend seems to be in favor of certain ideas about privacy going the way of the top hat, but trends like that don’t always continue.

Kord Davis: The field of predictive analytics has been around for a long time, but the development of big data technologies has increased accessibility to large datasets and the ability to data mine and correlate data using commodity hardware and software. The potential benefits are massive. A promising example is that longitudinal studies in education can gather and process significantly more minute data characteristics, and we have no idea what we might learn. Which is precisely the point. Being able to assess a more refined population of cohorts may well turn out to unlock powerful ways to improve education. Similar conditions exist for healthcare, agriculture, and even for predicting weather more reliably and reducing damage from catastrophic natural weather events.

On the other hand, the availability of larger datasets and the ability to process and query against them make it very tempting for organizations to share and cross-correlate to gain deeper insights. If you think it’s difficult to identify values and align them with actions within a single organization, imagine how many organizations the trail of your data exhaust touches in a single day.

Even a simple, singular transaction, such as buying a pair of shoes online, touches your bank, the merchant card processor, the retail or wholesale vendor, the shoe manufacturer, the shipping company, your Internet service provider, the company that runs or manages the ecommerce engine that makes it possible, and every technology infrastructure organization that supports them. That’s a lot of opportunity for any single bit of your transaction to be stored, shared, or otherwise misused. Now imagine the data trail for paying your taxes. Or voting, if that ever becomes widely available.

What recent events point to the future impact of big data?

Douglas Patterson: For my money, the biggest impact is in the funding of just about everything on the web by either advertising dollars or investment dollars chasing advertising dollars. Remember when you used to have to pay for software? Now look at what Google will give you for free, all to get your data and show you ads. Or, think of the absolutely pervasive impact of Facebook on the lives of many of its users — there’s very little about my social life that hasn’t been affected by it.

Down the road there may be more Orwellian or “Minority Report” sorts of things to worry about — maybe we’re already dangerously close now. On the positive side again, there will doubtless be some amazing things in medicine that come out of big data. Its impact is only going to get bigger.

Kord Davis: Regime change efforts in the Middle East and the Occupy Movement all took advantage of big data technologies to coordinate and communicate. Each of those social movements shared a deep set of common values, and big data allowed them to coalesce at an unprecedented size, speed, and scale. If there was ever an argument for understanding more about our values and how they inform our actions, those examples are powerful reminders that big data can influence massive changes in our lives.

This interview was edited and condensed.
