© Ramcharan Kakarla, Sundar Krishnan and Sridhar Alla 2021
R. Kakarla et al., Applied Data Science Using PySpark, https://doi.org/10.1007/978-1-4842-6500-0_4

4. Variable Selection

Ramcharan Kakarla (1), Sundar Krishnan (1) and Sridhar Alla (2)
(1) Philadelphia, PA, USA
(2) New Jersey, NJ, USA
 
Variable selection is an art. The goal of this chapter is to help you understand the different techniques you can use to select the best features in your dataset. Variable selection is one of the key processes in data science. To put it in simple terms, let us say you are the coach of a soccer team. You want to pick the best team to win the World Cup. You need to have the best player in each position (best features), and you don’t want too many players who play ...
