The Strata Data Conference in San Francisco was filled with speakers talking about opportunity. But those opportunities were balanced against risks—risks that loom large as we discover more powerful ways to apply data using machine learning and artificial intelligence. It's a necessary tension we'll need to understand as we continue on the journey into the age of data.
Cloudera's merger with Hortonworks demonstrates some of the opportunities. They "drank their own champagne" (a metaphor preferable to eating dog food) by using machine learning to merge the two companies: clustering similar customers, predicting sales opportunities, and integrating the two teams.
In his keynote, program co-chair Ben Lorica gave excellent advice for organizations that are just starting on the road to machine learning: companies that have been successful with machine learning have either built on existing data products or services, or used machine learning to modernize existing applications. Companies that attempt to make a leap into the void, working with data and services they don’t understand well, will have a rough time. Machine learning grows out of your current data practices. It may be revolutionary, but if you haven’t prepared for the revolution by developing your data sources, learning how to clean your data, preparing for data governance, and more, you’ll inevitably fall behind. Fortunately, there are tools—both open source and commercial—to help in all these areas.
Some of the most important opportunities are for democratizing data: not just making data accessible, but making it usable by everyone in the organization, even those without programming skills. Jeremy Howard's session showed how a subject expert with no prior programming knowledge can build an AI application. Howard told me about a dermatologist who has built an application that classifies burns. (He also recommended against watching the demo before lunch.) Efforts like this are key to building AI systems that create a better world. Emergency responders need tools that assist them in the field: tools that can be built into their phones and that let them make decisions without waiting for an MD.
According to Mike Olson, the most important thing we've learned from cloud computing is that "easy seriously matters." Easy doesn't just mean you can pay for computing with your credit card, or add and subtract servers at a moment's notice. And it doesn't just mean providing good tools for analytics. Easy applies to every aspect of computing, particularly self-service data. Easy means making tools for building data pipelines that don't care where the data is physically located (in a data center or the cloud), that understand the regulations governing that data and how it is used, and that make data accessible without requiring programming skills. These are tools that anyone can use, including managers, executives, and sales and marketing folks, not just engineers and data analysts.
Moving data and computing to the cloud remains a tremendous opportunity. We're still in the early days of cloud computing: many companies that could move their data to the cloud haven't yet done so. Jordan Tigani of Google talked about the many opportunities the cloud represents, starting with decoupling data storage from computation, reducing administrative overhead, building real-time pipelines, eliminating silos, and enabling access for all users. All these benefits flow naturally from moving data to the cloud and relying on the scale of infrastructure that only cloud providers give you.
What about the risks? Several speakers, including Peter Singer and David Sanger, talked about the dangers of an increasingly militarized network. Singer said: "There is no silver bullet. There will continue to be marketing, politics, wars, all taking place online. We need new strategies for dealing with it." These dangers increase as our tools become more powerful; Singer said that we can look forward to "deep fakes" (fake videos), and Elizabeth Svoboda discussed how neuroscience is already used to construct political messages that trigger fear responses.
We also heard about progress toward meeting these challenges. Shafi Goldwasser challenged developers to create “Safe ML”: machine learning that can’t be abused. Machine learning needs to ensure privacy, both of the training data and of the model, and needs to be fair and invulnerable to tampering. The tools we need to create Safe ML have been under development among cryptographers for the past 30 years, well before modern machine learning became practical. The challenge facing machine learning developers is taking these tools (federated learning, secure multiparty computation, homomorphic encryption, and differential privacy) and putting them to use. Her points were echoed in several other sessions throughout the conference.
At the ethics summit, participants discussed the many problems in building software systems ethically. There are clearly dangers here: hardly a day goes by without news of data abuse. But perhaps the most interesting discussion was whether ethics is a zero-sum game or a business opportunity. Does treating customers fairly and respecting their individuality and their privacy represent an opportunity? There are a lot of things you can say about Amazon's business practices, but almost nobody criticizes the ease with which you can return merchandise. What other opportunities are there? Many customers have become cynical, and expect to be treated badly; too few companies have thought seriously about using data to make their customers' lives better. That may be changing.
These themes were echoed in the Future of the Firm track, which focused on rethinking the corporation for the digital era. The future isn’t just about “implementing AI,” but about building organizations that work better: that support their employees’ training needs, that listen to their employees on ethical issues, that take a human-centered approach to AI. The future of the firm is about taking advantage of data—but it’s about taking advantage of data to build a better future for customers, employees, and investors.
Putting data to work is an opportunity; we’ve been making that point since the first Strata conference. The risks of a hostile, militarized network are real. But the opportunities—for corporations, for employees, for customers—are far greater.