We now combine the methods described above into a single prediction. Intuitively this seems like a good idea, but how do we do it in practice? The first thought that comes to mind is to average the predictions. This might give decent results, but there is no reason to assume that every method's predictions deserve equal weight: one method may well be more accurate than the others.
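To make the plain average concrete, here is a minimal sketch using NumPy. The function name `average_predictions` and the three prediction lists are hypothetical, made up purely for illustration; they stand in for whatever the individual methods actually output.

```python
import numpy as np

def average_predictions(predictions):
    """Return the element-wise mean of several models' predictions.

    `predictions` is a sequence of equal-length 1-D sequences, one per model.
    """
    stacked = np.stack([np.asarray(p, dtype=float) for p in predictions])
    return stacked.mean(axis=0)

# Hypothetical predictions from three models for the same four items.
preds_a = [3.0, 4.0, 2.0, 5.0]
preds_b = [3.5, 3.5, 2.5, 4.5]
preds_c = [2.5, 4.5, 1.5, 4.0]

blended = average_predictions([preds_a, preds_b, preds_c])
print(blended)  # each model contributes equally to every blended value
```

Every model gets the same say here, which is exactly the limitation discussed above.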
We can instead try a weighted average, multiplying each prediction by its own weight before summing them. How do we find the best weights, though? We learn them from the data, of course!
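One possible way to learn the weights is ordinary least squares: treat each method's predictions as a feature column and fit the weights that minimize the squared error against the known targets. This is an assumption made for illustration, not necessarily the fitting procedure used here, and the helper names `fit_blend_weights` and `blend` as well as the synthetic data are hypothetical.

```python
import numpy as np

def fit_blend_weights(predictions, targets):
    """Fit one weight per model by least squares (an assumed choice of method)."""
    X = np.column_stack(predictions)  # shape: (n_samples, n_models)
    y = np.asarray(targets, dtype=float)
    weights, *_ = np.linalg.lstsq(X, y, rcond=None)
    return weights

def blend(predictions, weights):
    """Weighted sum of the models' predictions."""
    return np.column_stack(predictions) @ weights

# Synthetic example: the target is built as 0.7 * a + 0.3 * b,
# so the fit should recover roughly those two weights.
rng = np.random.default_rng(0)
a = rng.normal(size=50)
b = rng.normal(size=50)
y = 0.7 * a + 0.3 * b

weights = fit_blend_weights([a, b], y)
print(weights)
```

In practice the weights should be fitted on data that the individual methods were not trained on; otherwise the blend tends to favor whichever method overfits the most.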