Machine Learning with Spark - Second Edition by Nick Pentreath, Manpreet Singh Ghotra, Rajdeep Dua

Extracting useful features from your data

Once the data has been cleaned, we can get down to extracting the actual features on which our machine learning model will be trained.

Features refer to the variables that we use to train our model. Each row of data contains information that we would like to extract into a training example.

Almost all machine learning models ultimately work on numerical representations in the form of a vector; hence, we need to convert raw data into numbers.
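As a minimal sketch of what converting raw data into numbers can look like, the plain-Python example below turns a few text records into numerical vectors. The record layout (`user_id|age|occupation`), the sample values, and the helper names `build_index` and `to_vector` are all assumptions made for illustration, not taken from the book. The string field is encoded with one slot per distinct value, which is one common choice and not necessarily the approach the authors use.

```python
# Hypothetical raw records in the assumed form "user_id|age|occupation".
raw_records = [
    "1|24|technician",
    "2|53|writer",
    "3|23|technician",
]


def build_index(values):
    """Map each distinct string value to a fixed column position."""
    return {v: i for i, v in enumerate(sorted(set(values)))}


def to_vector(record, occupation_index):
    """Turn one raw record into a list of floats."""
    _user_id, age, occupation = record.split("|")
    # Numerical field: use the number directly.
    vector = [float(age)]
    # String field: one slot per known value, with 1.0 marking the value present.
    slots = [0.0] * len(occupation_index)
    slots[occupation_index[occupation]] = 1.0
    return vector + slots


occupation_index = build_index(r.split("|")[2] for r in raw_records)
vectors = [to_vector(r, occupation_index) for r in raw_records]
```

Every record ends up as a vector of the same length, which is what a model expects of its training examples. In a Spark job, a function such as `to_vector` would typically be applied to each record of a distributed dataset rather than to a local list.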

Features broadly fall into a few categories, which are as follows:

  • Numerical features: These features are typically real or integer numbers, for example, the user age that we used in an example earlier.
  • Categorical ...