CHAPTER 5 Common Predictive Modeling Techniques
The key part of data mining, and in my opinion the most fun, comes after the data has been prepared, when the data scientist is able to conduct a “model tournament.” Because most modern data mining problems are large and complex, the best practice is to try a number of different modeling techniques or algorithms, and to make a number of attempts within each algorithm using different settings or parameters. So many trials are needed because arriving at the best answer requires an element of brute force. The number of iterations needed to reach the best model can vary widely depending on the domain and the specific modeling challenge.
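To make the idea of a model tournament concrete, here is a minimal sketch in plain Python. Everything in it is an assumption chosen for illustration rather than a method taken from this chapter: the synthetic data set, the two candidate techniques (a majority-class baseline and k-nearest neighbors), the settings tried for k, and the helper names such as `make_data` and `run_tournament`. It fits every candidate on a training set, scores each on a separate validation set, and keeps the best performer.

```python
import random

def make_data(n, seed):
    """Hypothetical synthetic data: label is 1 when x1 + x2 > 1, with 10% label noise."""
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        x = (rng.random(), rng.random())
        y = int(x[0] + x[1] > 1.0)
        if rng.random() < 0.1:  # flip some labels so no model can be perfect
            y = 1 - y
        rows.append((x, y))
    return rows

def fit_majority(train):
    """Baseline candidate: always predict the most common training label."""
    ones = sum(y for _, y in train)
    label = int(ones * 2 >= len(train))
    return lambda x: label

def fit_knn(train, k):
    """k-nearest-neighbors candidate; k is the setting varied in the tournament."""
    def predict(x):
        nearest = sorted(
            train,
            key=lambda r: (r[0][0] - x[0]) ** 2 + (r[0][1] - x[1]) ** 2,
        )[:k]
        votes = sum(y for _, y in nearest)
        return int(votes * 2 > k)  # k is odd here, so ties cannot occur
    return predict

def accuracy(model, rows):
    """Share of rows whose label the model predicts correctly."""
    return sum(model(x) == y for x, y in rows) / len(rows)

def run_tournament(train, valid):
    """Fit every candidate on train, score each on valid, return the winner."""
    candidates = {"majority": fit_majority(train)}
    for k in (1, 5, 15):  # several attempts within one algorithm
        candidates[f"knn(k={k})"] = fit_knn(train, k)
    scores = {name: accuracy(m, valid) for name, m in candidates.items()}
    winner = max(scores, key=scores.get)
    return winner, scores

train = make_data(300, seed=1)
valid = make_data(150, seed=2)
winner, scores = run_tournament(train, valid)
for name, score in sorted(scores.items(), key=lambda kv: -kv[1]):
    print(f"{name:12s} {score:.3f}")
print("winner:", winner)
```

Scoring on data the models did not see during fitting is the design choice that matters here: comparing candidates on their own training data would tend to reward the most flexible model rather than the one that generalizes best. A real tournament would typically include more algorithms and more settings, and might use cross-validation in place of a single holdout set.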
It is important to note that while this chapter focuses on predictive modeling, or supervised learning, many of these techniques can also be used in unsupervised approaches to identify the hidden structure of a set of data. Supervised learning has a target variable, and the methods attempt to correctly classify or predict as many of the observations as possible. Because the target is known, supervised learning has clear measures for assessing model quality. In contrast, unsupervised learning lacks a target and therefore lacks a strongly objective measure of model quality.
Each of the following sections covers a different modeling technique and follows the same general progression: a brief history of the technique, a simple example or story to illustrate how the method can be used, followed by a high-level ...