July 2018
Intermediate to advanced
334 pages
8h 20m
English
There is a purpose to preprocessing. In most data analytics tasks, the question worth asking is whether our data is usable as it stands. Usually it is not: most real-world datasets need preprocessing, a massaging step that gives the data a new, usable form.
With the spam and ham datasets, we identify two important preprocessing steps:
Next, we write Scala code for two regular expressions. Both are fairly simple and catch only a small subset of spam, but they are a start.
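This excerpt does not show the two regular expressions themselves, so the following is only a sketch of the idea under stated assumptions. It shows how simple regexes can normalize a message before classification. Everything in it is a hypothetical stand-in, not the book's code: the object `SpamRegexSketch`, the patterns `moneyAmount` and `nonAlphanumeric`, the placeholder word `moneytoken`, and the helper `normalize`. It relies only on the standard `scala.util.matching.Regex` API (`.r`, `replaceAllIn`, `findFirstIn`).

```scala
import scala.util.matching.Regex

// Hypothetical preprocessing helpers; the names and patterns are illustrative,
// not the expressions used in the book.
object SpamRegexSketch {
  // Money-style amounts such as "$5" or "$1,000.00", a common marker in spam.
  val moneyAmount: Regex = """\$\d[\d,]*(\.\d+)?""".r

  // Any single character that is not a letter, digit, or whitespace (punctuation, symbols).
  val nonAlphanumeric: Regex = "[^a-zA-Z0-9\\s]".r

  // Replace money amounts with a placeholder word, lowercase the text,
  // turn punctuation into spaces, and collapse runs of whitespace.
  def normalize(line: String): String = {
    val tagged = moneyAmount.replaceAllIn(line, " moneytoken ")
    nonAlphanumeric
      .replaceAllIn(tagged.toLowerCase, " ")
      .split("\\s+")
      .filter(_.nonEmpty)
      .mkString(" ")
  }
}
```

For example, `SpamRegexSketch.normalize("WIN $1,000.00 NOW!!!")` yields `"win moneytoken now"`, and `SpamRegexSketch.normalize("Hello, Bob.")` yields `"hello bob"`. The money pattern runs first because the punctuation pattern would otherwise strip the `$` and `,` it depends on.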
Then we load our datasets into Spark. Naturally, we want a ham ...