Exploring the permeable border between neuroscience, cognitive science, and AI with self-styled “neurotechnologist” Adam Marblestone.
Adam Marblestone is the director of scientific architecting within the Synthetic Neurobiology Group at MIT Media Lab. Prior to that, he explored the design of scalable biological interfaces and the principles behind cognition in the cortex at Harvard.
The worlds of AI and neuroscience are converging due to the increasing sophistication of AI models and the “shedding of assumptions” within the neuroscience community about what the brain can do and how it does it.
Creating computers that can perform feats of unsupervised learning might require us to first create a learning system that contains a number of self-supervising learning functions, dictated by biases baked into the system. This is similar to the way that children appear to have a bias in their brains for spotting hands, which lets them ultimately learn more complex visual elements.
The brain may use a fundamentally different model of memory than that implemented in neural network approaches like the Neural Turing Machine.
Jack: Why are the two fields, AI and neuroscience, coming closer together?
Adam: One obvious trend is the progress in neural network-based AI, whose models have become increasingly sophisticated.
But a perhaps less-obvious trend comes from the removal of assumptions researchers have had about how the brain might possibly work. Go back to the 1980s and think about the mechanisms of training neural networks, like backpropagation. When that stuff came out, the neuroscientists said, “this is totally unlike what the brain does.” For example, it requires one variable that codes for the activation of the neuron, and another variable that codes for the error signal that goes to that neuron, so it requires information to be flowing in multiple directions. They said, “neurons don’t do that, they have only the axon as an output, which goes in only one direction.” So, there were a lot of assumptions that restricted what kinds of models neuroscientists considered to be “biologically plausible.”
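To make the "two variables" point concrete, here is a minimal sketch of backpropagation in a tiny two-layer network, in plain Python. The network size, the function names (forward, backward, train_step), the starting weights, and the squared-error loss are all illustrative assumptions, not anything specified in the interview. The thing to notice is that every hidden unit ends up with both an activation (computed on the forward pass) and an error signal (computed on the backward pass, flowing against the direction of the forward connections).

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def forward(x, w_hidden, w_out):
    """Forward pass: activations flow from input to output."""
    hidden = [sigmoid(sum(w * xi for w, xi in zip(row, x))) for row in w_hidden]
    output = sum(w * h for w, h in zip(w_out, hidden))  # linear output unit
    return hidden, output

def backward(hidden, output, target, w_out):
    """Backward pass: error signals flow from the output back to hidden units."""
    # Derivative of the loss 0.5 * (output - target)^2 with respect to output.
    output_error = output - target
    # Each hidden unit's error uses the weight of its *outgoing* connection,
    # which is the information flow neuroscientists objected to.
    hidden_errors = [output_error * w * h * (1.0 - h)
                     for w, h in zip(w_out, hidden)]
    return output_error, hidden_errors

def train_step(x, target, w_hidden, w_out, lr=0.5):
    """One gradient-descent step; returns updated weights and the loss before the update."""
    hidden, output = forward(x, w_hidden, w_out)
    output_error, hidden_errors = backward(hidden, output, target, w_out)
    new_w_out = [w - lr * output_error * h for w, h in zip(w_out, hidden)]
    new_w_hidden = [[w - lr * err * xi for w, xi in zip(row, x)]
                    for row, err in zip(w_hidden, hidden_errors)]
    loss = 0.5 * (output - target) ** 2
    return new_w_hidden, new_w_out, loss

# Arbitrary illustrative starting weights and a single training example.
w_hidden = [[0.1, -0.2], [0.4, 0.3]]
w_out = [0.2, -0.1]
x, target = [1.0, 0.5], 1.0

losses = []
for _ in range(50):
    w_hidden, w_out, loss = train_step(x, target, w_hidden, w_out)
    losses.append(loss)
```

Over the 50 steps the recorded loss should shrink, which is all this toy is meant to show: learning here depends on a second, backward-flowing quantity per unit.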
Since then, empirical neuroscience has gone through a series of intellectual changes. In the 1990s it moved into “molecular neuroscience,” revealing a vast amount of complexity in the molecular machinery at the synapse and in the cell, and in the 2000s it turned to “circuit neuroscience,” where a wide variety of cell types and complex modes of interaction between neurons came to light.
There has been this huge deepening since then in what people understand about the actual physical structure of the neuron itself and the complexity of the neural circuitry, which has caused people to go beyond what you could call a “cartoon model” of neurons. Before you have detailed knowledge of the biophysics of neurons, you’re tempted to think of them as perfect little spheres that compute a result by taking a sum of their inputs and applying a simple threshold, but it turns out they have a huge amount of internal state. They compute molecularly, they have molecular states, gene expression, multiple electrical subcompartments in the dendritic tree, and they can execute very specific internal learning rules. So, the individual neuron is much more complex than we thought.
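For reference, the "cartoon model" described here can be written in a few lines. This is a minimal sketch in Python; the function name cartoon_neuron and the example weights and threshold are illustrative assumptions.

```python
def cartoon_neuron(inputs, weights, threshold):
    """Fire (1) if the weighted sum of the inputs reaches the threshold, else stay silent (0)."""
    total = sum(w * x for w, x in zip(weights, inputs))
    return 1 if total >= threshold else 0

# With these made-up settings the unit behaves like a logical AND of two inputs.
both_active = cartoon_neuron([1, 1], [1.0, 1.0], threshold=1.5)
one_active = cartoon_neuron([1, 0], [1.0, 1.0], threshold=1.5)
```

Note what is missing: the function keeps no internal state, so the same inputs always give the same output. That absence is exactly what the biophysical picture above calls into question.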
This new window on the complexity of neurons and circuits means that traditional arguments against some sort of generic and powerful optimization or learning process, like backpropagation, should go away. You can think of what’s happening in deep learning as being founded on this almost universal algorithm of backpropagation, or gradient descent, for optimization, whether it be supervised or unsupervised or reinforcement based—backpropagation is being taken as a black box and applied to a wide range of architectures.
So, there’s a universal and powerful way of learning, which is backpropagation, and neuroscientists had thought the brain “can’t possibly” be doing that. What computational neuroscientists postulated instead were relatively random networks, with local learning rules, which are weaker, and these models were inspired by statistical physics as much as by what really worked computationally.
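As one example of what a "local learning rule" can look like, here is a minimal sketch of a plain Hebbian update in Python. The Hebbian rule is chosen here only as a familiar illustration; the interview does not name a specific rule, and the function name hebbian_update, the learning rate, and the activity values are assumptions.

```python
def hebbian_update(weights, pre, post, lr=0.1):
    """Strengthen each synapse in proportion to pre- and postsynaptic activity.

    Only quantities available at the synapse itself are used: there is no
    error signal arriving from elsewhere in the network.
    """
    return [w + lr * x * post for w, x in zip(weights, pre)]

weights = [0.0, 0.0, 0.0]
pre = [1.0, 0.0, 1.0]   # made-up presynaptic activity
post = 1.0              # made-up postsynaptic activity
for _ in range(5):
    weights = hebbian_update(weights, pre, post)
```

Synapses whose inputs were active alongside the output grow stronger, and the silent one is left unchanged. Nothing in the update knows whether the change helped with any task, which is one sense in which such rules are weaker than gradient-based optimization.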
In summary, it turns out that neurons and circuits are more complex than we thought, and my take is that this means that traditional arguments against the brain doing learning algorithms at least as powerful and generic as backpropagation were wrong. In the past, and even now, models from theoretical neuroscience haven’t been based on the ground truth circuitry, because we didn’t know what the ground truth circuitry actually looked like in detail. They’ve tended to assume random circuitry, and they’ve tended to assume relatively simple learning rules that are not that powerful compared to what machine learning has chosen to make use of. As we start to question those assumptions and find that these circuits in the brain are nonrandom, quite complicated, capable of processing information with internal states, and so on, this opens up the possibility that the brain can do at a minimum the types of things machine learning was doing in terms of optimization, if not even more clever things.
Jack: What clues does the brain hold for things like unsupervised learning?
Adam: One assumption that people tend to make is that the cerebral cortex is some kind of unsupervised learning system. I think this may not be quite the right way to look at it.
It’s not necessarily a clue from looking at the brain circuitry, but it’s a clue from how biology tends to work—unsupervised learning, as such, may not be quite the right thing to go for.
A way to think about it is that, with the way people talk about unsupervised learning now, you have an algorithm that can extract structure from any arbitrary data stream whatsoever, whether it be stock prices or weather forecasting or anything, and that algorithm is going to be doing the heavy lifting. In contrast, in the brain, with billions of years of evolution, what you have is the opportunity to build in things specifically for the inputs that matter for humans or for a particular biological organism. The data streams are not generic or arbitrary, and neither should the brain’s algorithms be equipotent for arbitrary types of data. The brain is allowed to make some specific assumptions, built in by evolution, about the world it is going to grow up in.
Certain problems are probably hard for an unsupervised learner to solve. Shimon Ullman’s work at MIT has given a fantastic example of this. One of the problems they talk about is detecting hands (a person’s hands) and the direction of gaze of a person’s face. You can imagine human beings have to learn certain very specific things, in a certain order, to grow up successfully in the human social environment. In order to learn language or complicated behavior, you need to learn from other humans, and to do that, you need to learn where their hands are and whether they are looking at you right now. You can imagine a baby has to solve not just a general unsupervised learning task but a series of very specific ones: find the faces, find the hands, figure out where the eyes are looking, find out if my mother is talking to me, and so on.
If we take the example of hands, you need a simple classification algorithm where you’re trying in more or less an unsupervised way to find where the hands are. What Shimon Ullman found is that it’s not easy to find the hands if you are just doing unsupervised clustering, but it becomes possible with some prior knowledge that could have been built into the brain by evolution, such as the fact that hands tend to approach objects in a certain way and that there’s a particular pattern of motion that is characteristic of hands. It turns out that this motion could be detected by a specific calculation of so-called “optical flow.” So, this is not generic unsupervised learning, but highly specific, prebiased learning. You can imagine this could be built into retinal areas to help the baby form an inductive bias.
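The general idea of a built-in cue supplying its own training labels can be sketched in a few lines of Python. To be clear, this is not Shimon Ullman's algorithm and it does not compute optical flow: the motion cue is replaced by a single made-up number per image patch, the data are synthetic, and every name here (innate_motion_cue, make_patches, train_appearance_classifier) is hypothetical. The sketch only shows how a simple innate detector could label examples for a classifier that then works from appearance alone.

```python
import random

def innate_motion_cue(motion):
    """Hypothetical built-in detector: fires when a patch moves toward an object."""
    return 1 if motion > 0.5 else 0

def make_patches(n, seed=0):
    """Synthetic data: hand-like patches both look different and move differently."""
    rng = random.Random(seed)
    patches = []
    for i in range(n):
        is_hand = i % 2 == 0
        center = (1.0, 0.0) if is_hand else (0.0, 1.0)
        appearance = [c + rng.uniform(-0.2, 0.2) for c in center]
        motion = 0.9 if is_hand else 0.1  # stand-in for an optical-flow measurement
        patches.append({"appearance": appearance, "motion": motion})
    return patches

def classify(weights, bias, appearance):
    """Decide 'hand' (1) or 'not hand' (0) from appearance only."""
    s = sum(w * a for w, a in zip(weights, appearance)) + bias
    return 1 if s > 0 else 0

def train_appearance_classifier(patches, epochs=20, lr=0.1):
    """Perceptron on appearance features; labels come from the motion cue, not a teacher."""
    weights = [0.0] * len(patches[0]["appearance"])
    bias = 0.0
    for _ in range(epochs):
        for p in patches:
            label = innate_motion_cue(p["motion"])  # self-generated label
            err = label - classify(weights, bias, p["appearance"])
            weights = [w + lr * err * a for w, a in zip(weights, p["appearance"])]
            bias += lr * err
    return weights, bias

patches = make_patches(40)
weights, bias = train_appearance_classifier(patches)

# After training, a static patch (no motion information) can still be classified.
static_hand = classify(weights, bias, [1.0, 0.0])
static_other = classify(weights, bias, [0.0, 1.0])
```

No external teacher ever provides a label. The motion cue does the supervising, and once the appearance classifier is trained it no longer needs the cue.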
So, it is possible that you may have a bunch of supervised learning tasks that are self-supervised using clever tricks or cues built in by evolution. You start by having the brain create some simple biases, like identifying hands. This is a little different from the way the AI people have done unsupervised learning, where it is often meant to be totally independent of the precise problem you are solving. This is not to say that we don’t have highly generic learning processes in the brain; some of them may well be supervised by very generic kinds of signals like “prediction of the next input.” But this is different from what one might call “unsupervised pattern classification” and is likely to include much more specialized mechanisms being unfolded in a specific order to bootstrap off of one another during development.
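A generic signal like “prediction of the next input” can be illustrated with a very small Python sketch: the data stream supplies its own targets, because each input is the "correct answer" for the one before it. The counting model, the function names (learn_next_input_predictor, predict_next), and the toy stream are assumptions chosen for brevity, not a claim about how the brain or any particular AI system does this.

```python
from collections import Counter, defaultdict

def learn_next_input_predictor(stream):
    """Count which input tends to follow which; the next input is the teaching signal."""
    counts = defaultdict(Counter)
    for current, nxt in zip(stream, stream[1:]):
        counts[current][nxt] += 1
    return counts

def predict_next(counts, current):
    """Return the most frequently observed successor, or None if `current` was never followed by anything."""
    if current not in counts:
        return None
    return counts[current].most_common(1)[0][0]

stream = list("abcabcabcabd")  # made-up input stream
model = learn_next_input_predictor(stream)

after_a = predict_next(model, "a")
after_b = predict_next(model, "b")
after_d = predict_next(model, "d")
```

Here "a" is always followed by "b", "b" is usually followed by "c", and "d" has no observed successor, so the three predictions are "b", "c", and None respectively.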
Jack: What sort of fruitful overlaps between AI and neuroscience do you see developing in the coming years?
Adam: There are two interesting ones that come to mind, although many others exist and more will likely emerge. One is finding out if the brain does, in fact, do backpropagation, or something quite like backpropagation. We know a lot of the brain is driven by reward signals, dopamine, and so forth, but nobody knows whether the brain really does this generic kind of powerful multilayer optimization that backpropagation enables. Does the architecture of the cortex support backpropagation? If so, then where are the error signals? Researchers like Jim DiCarlo at MIT are starting to look for them.
Concretely, some scientists, David Sussillo, for example, take an artificial neural network of a kind meant to be relatively well mapped to the brain, and train it under some assumptions about how a particular area of the brain works, using backpropagation. Then, they analyze the neurons in the brain and look for the same dynamics. They are asking: does the brain do the kinds of things it would do if it were optimized by learning for a particular task, given a particular kind of basic network structure? It remains to be seen how far this can go. There’s a very interesting question of how far you can push that kind of research, because if you can push it far, it would bolster this idea at some fundamental level that the brain is doing optimization. As we look at the connections between neuroscience and AI, empirically you might start to see that although one is optimized by learning processes and built-in heuristics shaped by evolution, and the other by a programmer running TensorFlow or something, they are converging on similar dynamics. They are both optimized to do the same thing. That would be encouraging progress.
The second really interesting area has to do with these memory systems. If you look at deep learning, memory becomes fundamental to questions of variable binding, of how planning is done, and of how to simulate future states of the world and the effects of actions. All these tasks require selective access to different types of memories, basically pulling up the right memories at the right times.
How the hippocampus, through its interaction with the cortex, actually stores different kinds of memories is therefore a very interesting question. There’s a lot of very interesting biophysics going on in the hippocampus, and some of it may be relevant to these memory computations. There are waves of oscillations called theta oscillations, and the neurons are also coordinating their firing with this global oscillation, via relative timings of the firings. The neurons are potentially even compressing series of events that occur into specific subcycles of these waves, possibly leading to a way to represent temporal linkages between events, or, in other words, “first this happened, then that happened,” or “if this happened, it is likely that will happen next.”
If we had a better understanding of how the brain represents sequences of events and actions, then from my limited understanding, it seems it would be a key insight for resolving a lot of the issues in AI right now, like hierarchical reinforcement learning. That’s because we carry out actions that occur on multiple timescales. If you want to travel from Boston to San Francisco, you have a whole series of things you have to do. A human would think: I should go to the airport, but first I need to get in the car, but before that I need to move my muscles so I can walk to the car. How is that kind of flexible interface between actions at different levels of abstraction represented, and how do we interpolate between long-range plans and what we need to do now to get them done? If we understood the representation of temporal sequences of events and actions in memory in the brain, that might be helpful, but of course we can’t know for sure!
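The Boston-to-San Francisco example can be written down as a toy data structure to show what "actions at different levels of abstraction" means. This is a minimal sketch in Python with a hand-written hierarchy: the step names beyond those mentioned above (for instance "drive to the airport" and "fly to San Francisco"), the PLAN_LIBRARY dictionary, and the functions expand and next_primitive are all hypothetical. A hierarchical reinforcement learning system would have to learn such a structure rather than be handed it, and the sketch says nothing about how the brain represents it.

```python
# Hypothetical plan library: each abstract action maps to an ordered list of sub-actions.
# Anything that is not a key is treated as a primitive action.
PLAN_LIBRARY = {
    "travel from Boston to San Francisco": ["go to the airport", "fly to San Francisco"],
    "go to the airport": ["walk to the car", "drive to the airport"],
    "walk to the car": ["move muscles"],
}

def expand(action, library):
    """Recursively expand an abstract action into its primitive actions, in order."""
    if action not in library:
        return [action]
    steps = []
    for sub_action in library[action]:
        steps.extend(expand(sub_action, library))
    return steps

def next_primitive(goal, library):
    """Answer 'what do I need to do right now?' for a long-range goal."""
    return expand(goal, library)[0]

full_plan = expand("travel from Boston to San Francisco", PLAN_LIBRARY)
do_now = next_primitive("travel from Boston to San Francisco", PLAN_LIBRARY)
```

The long-range goal expands to "move muscles", then "drive to the airport", then "fly to San Francisco", and the immediate action is "move muscles". The open question raised above is how a flexible version of this mapping, spanning multiple timescales, is represented and learned.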