This section covers two topics that form the mathematical and computational underpinnings of much of what we've covered in this book. The goal is to help you frame novel problems in a way that makes theoretical sense and that can realistically be solved with a computer.
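Maximum likelihood estimation (MLE) is a very general way to frame a large class of problems in data science: you assume that your data X was generated by a probability distribution of some known functional form with unknown parameters θ, and you look for the θ under which the observed data is most probable.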
A large fraction of machine learning classification and regression models fall under this umbrella. They differ widely in the functional form they assume, but they all assume one, at least implicitly. Mathematically, the process of “training the model” really reduces to finding the best θ.
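To make "training is finding θ" concrete, here is a minimal sketch for about the simplest model available: a coin that lands heads with unknown probability θ. The data, the function names `likelihood` and `fit_theta`, and the brute-force grid search are all illustrative assumptions rather than anything prescribed here, and the sketch treats the flips as independent of one another.

```python
import math


def likelihood(theta, flips):
    """Probability of the observed flips if each lands heads with probability theta.

    Assumes the flips are independent, so the per-flip probabilities multiply.
    """
    return math.prod(theta if flip else 1.0 - theta for flip in flips)


def fit_theta(flips, grid_size=1000):
    """Brute-force MLE: try evenly spaced theta values in (0, 1) and keep the best."""
    candidates = [i / grid_size for i in range(1, grid_size)]
    return max(candidates, key=lambda theta: likelihood(theta, flips))


# Made-up data for illustration: 1 = heads, 0 = tails (7 heads, 3 tails).
flips = [1, 1, 0, 1, 1, 0, 1, 1, 0, 1]
theta_hat = fit_theta(flips)
print(theta_hat)
```

For this data the search settles on the observed fraction of heads, 0.7, which is the textbook answer for this model. Real models replace the grid search with calculus or a numerical optimizer, but the framing is the same.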
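In MLE problems, we almost always assume that the different data points in X are independent of each other. That is, if there are N data points, then we assume that the probability of the whole dataset is the product of the probabilities of the individual data points.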
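In symbols, the independence assumption can be sketched as follows. The notation is an assumption made here for illustration: x_1 through x_N stand for the individual data points in X, and P(X | θ) for the probability of the data given the parameters.

```latex
P(X \mid \theta) = \prod_{i=1}^{N} P(x_i \mid \theta)
```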
In practice, it is often easier to find θ that maximizes the log of the probability, rather ...