Adapting ideas from neuroscience for AI
Inspiration from the brain is extremely relevant to AI; it’s time we pushed it further.
A better understanding of the reasons why neurons spike could lead to smart AI systems that can store more information more efficiently, according to Geoff Hinton, who is often referred to as the “godfather” of deep learning.
Geoff Hinton is an emeritus distinguished professor at the University of Toronto and an engineering fellow at Google. He is one of the pioneers of neural networks, and was part of the small group of academics that nursed the technology through a period of tepid interest, funding, and development.
Jack Clark: Why should we look at the brain when developing AI systems, and what aspects should we focus on?
Geoff Hinton: The main reason is that it’s the thing that works. It’s the only thing we know that’s really smart and has general purpose intelligence. The second reason is that, for many years, a subset of people thought you should look at the brain to try to make AI work better, and they didn’t really get very far—they made a push in the 80s, but then it got stalled, and they were kind of laughed at by everyone in AI saying “you don’t look at a bumblebee to design a 747.” But it turned out the inspiration they got from looking at the brain was extremely relevant, and without that, they probably wouldn’t have gone in that direction. It’s not just that we have an example of something that’s intelligent; we also have an example of a methodology that worked, and I think we should push it further.
JC: Today, aspects of modern classifiers like neural nets look vaguely similar to what we know about the brain’s visual system. We’re also developing memory systems that are inspired by the hippocampus. Are there other areas we can look to the brain and start taking elements from, like spiking neurons?
GH: We don’t really know why neurons spike. One theory is that they want to be noisy so as to regularize, because we have many more parameters than we have data points. The idea of dropout [a technique developed to help prevent overfitting] is that if you have noisy activations, you can afford to use a much bigger model. That might be why they spike, but we don’t know. Another reason why they might spike is so they can use the analog dimension of time, to code a real value at the time of the spike. This theory has been around for 50 years, but no one knows if it’s right. In certain subsystems, neurons definitely do that, like in judging the relative time of arrival of a signal to two ears so you can get the direction.
Another area is in the kinds of memory. Synapses adapt at many different timescales and in complicated ways. At present, in most artificial neural nets, we just have a timescale for adaptation of the synapses and then a timescale for activation of the neurons. We don’t have all these intermediate timescales of synaptic adaptation, and I think those are going to be very important for short-term memory, partly because it gives you a much better short-term memory capacity.
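The dropout idea mentioned in this answer can be illustrated with a minimal sketch. It is only an illustration under stated assumptions, not code from the interview: the function name `dropout`, its parameters, and the 50% drop rate are chosen for the example, and it uses the common "inverted" variant that rescales surviving activations during training so their expected value is unchanged.

```python
import random


def dropout(activations, drop_prob, rng, training=True):
    """Zero each activation with probability drop_prob (illustrative helper).

    Survivors are divided by the keep probability ("inverted" dropout),
    so the expected value of each activation stays the same.
    """
    if not 0.0 <= drop_prob < 1.0:
        raise ValueError("drop_prob must be in [0, 1)")
    if not training or drop_prob == 0.0:
        # At test time the activations pass through without noise.
        return list(activations)
    keep_prob = 1.0 - drop_prob
    return [a / keep_prob if rng.random() < keep_prob else 0.0
            for a in activations]


rng = random.Random(0)            # seeded so the run is repeatable
hidden = [1.0] * 10000            # stand-in for a layer of hidden activations
noisy = dropout(hidden, 0.5, rng)
mean_noisy = sum(noisy) / len(noisy)
clean = dropout(hidden, 0.5, rng, training=False)
print("mean activation with noise:", mean_noisy)
```

Each unit is either silenced or scaled up, so individual activations are very noisy while the layer's average stays close to its noise-free value. That is the sense in which noise can act as a regularizer.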
JC: Are there any barriers to our ability to understand the brain that could slow down the rate at which we can develop ideas in AI inspired by it?
GH: I think if you stick an electrode in a cell and record from it, or put an electrode near a cell and record from it, or near a bunch of cells and try to record from half a dozen of them, then you won’t understand things that might easily be understood by optical dyes, which let you know what a million cells are doing. There’s going to be all sorts of things in the Obama initiative for brain science to give us new techniques that will allow us to see (and make obvious) things that would have been very hard to establish. We don’t know what they’re going to be, but I suspect that will lead to some interesting things.
JC: So, if we had a sufficiently large neural network, would that be able to match a human on any given task or are there missing components we need?
GH: It depends on what particular task you’re talking about. If you take something like speech recognition, I’d be very surprised if a really big network exactly matched a human being; I think it’s either going to be worse or it’s going to be better. Human beings aren’t the limit. I think actually in speech recognition, I wouldn’t be at all surprised if, in 10 years’ time, neural nets can do it better than people. For other areas, like reasoning and learning from a very small number of examples, it may take longer to develop systems that match or surpass people.
JC: One problem modern reinforcement learning systems seem to have is knowing what parts of a problem to devote attention to exploring, so you don’t have to waste your time on less interesting parts of the image.
GH: This is exactly the same in vision. People make very intelligent fixations. Almost all of the optical array never gets processed at high resolution, whereas in computer vision, people typically just take the whole array at low-resolution, medium-resolution, high-resolution, and try to combine the information, so it’s just the same problem in us. How do you intelligently focus on things? We’re going to have to deal with the same problem in language. This is an essential problem, and we haven’t solved it yet.
JC: You recently gave a lecture on a paper you published about short-term changes of weights within neural networks. Can you explain this paper and why you think it is important?
GH: In recurrent neural networks, if they’re processing a sentence, they have to remember stuff about what has happened so far in the sentence, and all of that memory is in the activations in the hidden neurons. That means those neurons are having to be used to remember stuff, so they’re not really available for doing current processing.
A good example of this is if you have an embedded sentence—like if someone said, “John didn’t like Bill because he was rude to Mary, because Bill was rude to Mary”—you process the beginning of the sentence, then you use exactly the same knowledge processing to process “because Bill was rude to Mary.” Ideally, you want to use the same neurons and the same connections and the same weights for the connections for this processing. That’s what true recursion would be, and that means you have to take what you have so far in a sentence and you have to put it aside somewhere. The question is: how do you put it aside? In a computer, it’s easy because you have random access memory, so you just copy it into some other bit of memory to free up the memory. In the brain, I don’t think we copy neural activity patterns; what I think we do is have rapid changes to synapse strength so we can recreate the memories when we need them, and we can recreate them when the context is such that it would be appropriate.
I have a recent paper with Jimmy Ba and some people at DeepMind showing how we can make that work. I think that’s an example of where the fact that synapses are changing on multiple timescales can be useful. I first thought about this in 1973 and made a little model that could do true recursion on a very simple problem. A year ago, I went back to that at DeepMind and got it working within the framework so it learns everything. Back when I first thought about it, computers had 64k of memory and we didn’t know how to train big neural nets.
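The general idea of storing recent patterns in rapidly changing weights, rather than copying activations elsewhere, can be sketched as a toy associative memory. This is a hedged illustration and not the method from the paper: the class name `FastWeightMemory`, the `decay` and `learning_rate` values, and the four-element patterns are all assumptions made for the example. Each stored pattern adds an outer product to a weight matrix that decays over time, and a partial cue recreates the pattern it matches.

```python
class FastWeightMemory:
    """Toy decaying Hebbian memory (hypothetical, for illustration only)."""

    def __init__(self, size, decay=0.9, learning_rate=0.5):
        self.decay = decay
        self.learning_rate = learning_rate
        self.weights = [[0.0] * size for _ in range(size)]

    def store(self, pattern):
        # Old memories fade by `decay`; the new pattern is added as an
        # outer product of the pattern with itself.
        for i, p_i in enumerate(pattern):
            row = self.weights[i]
            for j, p_j in enumerate(pattern):
                row[j] = self.decay * row[j] + self.learning_rate * p_i * p_j

    def recall(self, cue):
        # Multiplying the weights by a partial cue recreates whichever
        # stored pattern the cue overlaps with.
        return [sum(w * c for w, c in zip(row, cue)) for row in self.weights]


memory = FastWeightMemory(size=4)
older = [1.0, 1.0, -1.0, -1.0]
newer = [1.0, -1.0, 1.0, -1.0]    # orthogonal to `older`
memory.store(older)
memory.store(newer)

recalled_older = memory.recall([1.0, 1.0, 0.0, 0.0])   # cue overlaps `older` only
recalled_newer = memory.recall([1.0, 0.0, 1.0, 0.0])   # cue overlaps `newer` only
print(recalled_older)
print(recalled_newer)
```

In this toy setup the older pattern is recalled with a smaller magnitude than the newer one, because its contribution to the weights has decayed once. The weights therefore act as a short-term store that operates on a timescale between neural activity and slow learning.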
JC: Do you think AI agents need to be embodied in some form, either in a robot or sufficiently rich simulation, to become truly intelligent?
GH: I think there are two aspects, one is the philosophical aspect and the other is the practical aspect. Philosophically, I see no reason why they have to be embodied, because I think you can read Wikipedia and understand how the world works. But as a practical matter, I think embodiment is a big help. There’s a Marx phrase: “If you want to understand how the world works, try and change it.” Just looking is not as efficient a way of understanding things as acting. So, the philosophical question is: is action essential? If action is essential to understanding the world, then astrophysics is in trouble. So, no, I don’t think embodiment is necessary.
JC: If you’re able to replicate some of the properties of spiking neurons and combine that with systems that can form temporary memories, then what will you be able to build?
GH: I think it might just make all the stuff we have today work better. So, for natural language understanding, I think having an associative memory with fast changes in the weights would be helpful, and for these feedforward nets, I think coincidence detectors are much better at filtering out clutter in the background, so they’ll be much better at focusing on the signal and filtering out the noise. This could also help with learning from small data sets.