For the last few years, AI has been almost synonymous with deep learning (DL). We’ve seen AlphaGo touted as an example of deep learning. We’ve seen deep learning used for naming paint colors (not very successfully), imitating Rembrandt and other great painters, and many other applications. Deep learning has been successful in part because, as François Chollet tweeted, “you can achieve a surprising amount using only a small set of very basic techniques.” In other words, you can accomplish things with deep learning without becoming an AI expert. Deep learning’s apparent simplicity--the small number of basic techniques you need to know--makes it much easier to “democratize” AI, and to build a core of AI developers who don’t have Ph.D.s in applied math or computer science.
But having said that, there’s a deep problem with deep learning. As Ali Rahimi has argued, we can often get deep learning to work, but we aren’t close to understanding how, when, or why it works: “we’re equipping [new AI developers] with little more than folklore and pre-trained deep nets, then asking them to innovate. We can barely agree on the phenomena that we should be explaining away.” Deep learning’s successes are suggestive, but if we can’t figure out why it works, its value as a tool is limited. We can build an army of deep learning developers, but that won’t help much if all we can tell them is, “Here are some tools. Try random stuff. Good luck.”
However, nothing is as simple as it seems. The best applications we’ve seen to date have been hybrid systems. AlphaGo wasn’t a pure deep learning engine; it incorporated Monte Carlo Tree Search, and at least two deep neural networks. At O’Reilly’s New York AI Conference in 2017, Josh Tenenbaum and David Ferrucci sketched out systems they are working on, systems that combine deep learning with other ideas and methods. Tenenbaum is working with one-shot learning, imitating the human ability to learn based on a single experience, and Ferrucci is working on building cognitive models that enable machines to understand human language in a meaningful way, not just pattern matching. DeepStack’s poker playing system combines neural networks with counterfactual regret minimization and heuristic search.
Adding structure to improve models
The fundamental idea behind deep learning is very simple: deep learning systems are neural networks with several hidden layers. Each neuron is very simple: it takes a number of inputs from previous layers, combines them according to a set of weights, and produces an output that’s passed to the next layer. The network doesn’t really care whether it’s processing images, text, or telemetry. That simplicity, though, is a hint that we’re missing out on a lot of structure that’s inherent in data. Images and texts aren’t the same; they’re structured differently. Languages have a lot of internal structure. As the computational linguist Chris Manning says:
I think the current era where everyone touts this mantra of fast GPUs, massive data, and these great deep learning algorithms has ... sent computational linguistics off-track. Because it is the case that if you have huge computation and massive amounts of data, you can do a lot ... with a simple learning device. But those learners are extremely bad learners. Human beings are extremely good learners. What we want to do is build AI devices that are also extremely good learners. ... The way to achieve those learners is to put much more innate structures.
If we’re going to make AI applications that understand language as well as humans do, we will have to take advantage of the structures that are in language. From that standpoint, deep learning has been a fruitful dead end: it’s a shortcut that has prevented us from asking the really important questions about how knowledge is structured. Gary Marcus makes an argument that’s even more radical:
There is a whole world of possible innate mechanisms that AI researchers might profitably consider; simply presuming by default it is desirable to include little or no innate machinery seems, at best, close-minded. And, at worst, an unthinking commitment to relearning everything from scratch may be downright foolish, effectively putting each individual AI system in the position of having to recapitulate a large portion of a billion years of evolution.
Deep learning began with a model that was, at least in principle, based on the human brain: the interconnection of neurons, and the ancient notion that human brains start out as a blank slate. Marcus is arguing that humans are born with innate abilities which are still very poorly understood--for example, the ability to learn language, or the ability to form abstractions. For AI to progress beyond deep learning, he suggests that researchers must learn how to model these innate abilities.
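The neuron described at the start of this section (inputs combined according to a set of weights, with the output passed to the next layer) can be sketched in a few lines of Python. This is only a minimal illustration, not how production frameworks implement it: the function names, the choice of tanh as the nonlinearity, the bias terms, and all of the weight values are assumptions made for the example.

```python
import math


def neuron(inputs, weights, bias=0.0):
    """Combine inputs according to weights, then squash the result.

    tanh is an arbitrary choice of nonlinearity for this sketch.
    """
    total = sum(x * w for x, w in zip(inputs, weights)) + bias
    return math.tanh(total)


def layer(inputs, weight_rows, biases):
    """A layer is a list of neurons that all see the same inputs."""
    return [neuron(inputs, w, b) for w, b in zip(weight_rows, biases)]


def forward(inputs, layers):
    """Pass the outputs of each layer on as the inputs of the next."""
    activations = inputs
    for weight_rows, biases in layers:
        activations = layer(activations, weight_rows, biases)
    return activations


# Toy network: 3 inputs -> 2 hidden -> 2 hidden -> 1 output.
# Every weight and bias below is made up for illustration.
network = [
    ([[0.5, -0.2, 0.1], [0.3, 0.8, -0.5]], [0.0, 0.1]),
    ([[1.0, -1.0], [0.4, 0.6]], [0.0, 0.0]),
    ([[0.7, -0.3]], [0.0]),
]

output = forward([1.0, 2.0, 3.0], network)
print(output)
```

Nothing in this code knows whether the three input numbers came from an image, a sentence, or telemetry. That is the absence of built-in structure that Manning and Marcus are pointing to.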
There are other paths forward. Ben Recht has written a series of posts sketching out how one might approach problems that fall under reinforcement learning. He is also concerned with the possibility that deep learning, as practiced today, promises more than it can deliver:
If you read Hacker News, you’d think that deep reinforcement learning can be used to solve any problem. ... I personally get suspicious when audacious claims like this are thrown about in press releases, and I get even more suspicious when other researchers call into question their reproducibility.
Recht argues for taking a comprehensive view, and reviews the possibility of augmenting reinforcement learning with techniques from optimal control and dynamical systems. Doing so would allow RL models to benefit from research results and techniques already used in many real-world applications. He notes:
By throwing away models and knowledge, it is never clear if we can learn enough from a few instances and random seeds to generalize.
AI is more than machine learning
As Michael Jordan pointed out in a recent post, what is called AI is often machine learning (ML). As someone who organizes AI conferences, I can attest to this: many of the proposals we receive are for standard machine learning applications. The confusion was inevitable: when calling a research project “artificial intelligence” was hardly respectable, we used the term “machine learning.” ML became a shorthand for “the parts of AI that work.” These parts, up to and including deep learning, were basically large-scale data analysis. Now that the tides of buzz have shifted, and everyone wants AI, machine learning applications are AI again.
But a full-fledged AI application, such as an autonomous vehicle, requires much more than data analysis. It will require progress in many areas that go well beyond pattern recognition. To build autonomous vehicles and other true AI applications, we will need significant advances in sensors and other hardware; we will need to learn how to build software for “edge devices,” which includes understanding how to partition problems between the edge devices and some kind of “cloud”; we will need to develop infrastructure for simulation and distributed computation; and we will need to understand how to craft the user experience for truly intelligent devices.
Jordan highlights the need for further research in two important areas:
- Intelligence augmentation (IA): Tools that are designed to augment human intelligence and capabilities. These include search engines (which remember things we can’t), automated translation, and even aids for artists and musicians. These tools might involve high-level reasoning and thought, though current implementations don’t.
- Intelligent infrastructure (II): Jordan defines II as “a web of computation, data, and physical entities exists that makes human environments more supportive, interesting and safe.” This would include networks to share medical data safely, systems to make transportation safer (including smart cars and smart roads), and many other applications. Intelligent infrastructure is about managing flows of data in ways that support human life.
What's most important about Jordan's argument, though, is that we won't get either IA or II if we focus solely on human-imitative AI. Both IA and II are inherently multidisciplinary, and require going beyond the perspective of a single agent learning to map inputs to outputs. That perspective, and its current implementation with deep learning, will inevitably be part of the solution, but just as inevitably, it won’t be the whole solution.
Researchers from many institutions are building tools for creating the AI applications of the future. While there is still a lot of work to be done on deep learning, researchers are looking well beyond DL to build the next generation of AI systems. UC Berkeley's RISELab has sketched out a research agenda that involves systems, architectures, and security.
Ameet Talwalkar’s recent post lists a number of research directions that should benefit industrial machine learning platforms. Industrial machine learning systems will have to meet system requirements, such as memory limitations, power budgets, and hard real-time constraints; they must be easy to deploy and to update, particularly since data models tend to grow stale over time; and they must be safe. Humans must understand how applications make decisions, along with the likely consequences of those decisions. These applications must take ethics into account.
These are all requirements for Jordan’s intelligent infrastructure. Over the past few years, we’ve seen many examples of machine learning put to questionable purposes, ranging from setting bail and determining prison sentences to targeted advertising, emotional manipulation, and the spreading of misinformation. These examples point us to a different set of needs. The research agenda for AI needs to take into account fairness and bias, transparency, privacy, and user control over data and the models built from that data. These issues encompass everything from ethics to design: getting informed consent, and explaining what that consent means, is not a trivial design problem. We’re only starting to understand how these disciplines connect to research in artificial intelligence. Fortunately, we’re seeing increasing interest within the data community in connecting ethics to practice. Events like the Data For Good Exchange (D4GX), the Conference on Fairness, Accountability, and Transparency (FAT*), and others are devoted to data ethics.
Talwalkar notes that air travel didn’t become commonplace until nearly 50 years after the Wright Brothers. While they were the first to achieve flight, many more developments were needed to make flying safe, inexpensive, and convenient. We’re at a similar stage in the history of AI. We’ve made progress in a few basic areas, and what we ultimately build will no doubt be amazing. We’re currently laying the foundation for future generations of AI applications, but we aren’t there yet.
Related resources:
- “Toward the Jet Age of machine learning”
- “Open-endedness: The last grand challenge you’ve never heard of”
- “Language understanding remains one of AI’s grand challenges”: David Ferrucci on the evolution of AI systems for language understanding
- “The machine learning paradox”
- “We need to build machine learning tools to augment machine learning engineers”
- “Building and deploying large-scale machine learning pipelines”: Ben Recht on why we need primitives, pipeline synthesis tools, and most importantly, error analysis and verification
- “How to train and deploy deep learning at scale”: Ameet Talwalkar on large-scale machine learning