February 2022
Intermediate to advanced
344 pages
9h 41m
English
This chapter covers
In the previous chapter, you started with a cleaned-up version of the DC taxi data set and applied a data-driven sampling procedure to identify the right fraction of the data set to allocate to a held-out test subset. You also analyzed the results of the sampling experiments and then launched a PySpark job to generate three separate subsets of data: training, validation, and test.
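As a reminder of what a three-way split involves, here is a minimal plain-Python sketch. It is not the PySpark job from the previous chapter. The function name `three_way_split`, the fixed seed, and the example fractions (10% test, 20% validation) are hypothetical placeholders, not the fraction identified by the sampling experiments. The sketch shuffles rows reproducibly and slices them into disjoint training, validation, and test lists.

```python
import random
from typing import List, Sequence, Tuple, TypeVar

T = TypeVar("T")


def three_way_split(
    rows: Sequence[T],
    test_fraction: float,
    validation_fraction: float,
    seed: int = 42,  # hypothetical seed; any fixed value makes the split repeatable
) -> Tuple[List[T], List[T], List[T]]:
    """Shuffle rows reproducibly and split them into train, validation, and test lists."""
    if test_fraction < 0 or validation_fraction < 0 or test_fraction + validation_fraction >= 1:
        raise ValueError("fractions must be non-negative and sum to less than 1")
    shuffled = list(rows)
    # A dedicated Random instance with a fixed seed yields the same order on every run.
    random.Random(seed).shuffle(shuffled)
    n_test = int(len(shuffled) * test_fraction)
    n_val = int(len(shuffled) * validation_fraction)
    test = shuffled[:n_test]                       # held-out test subset
    validation = shuffled[n_test:n_test + n_val]   # validation subset
    train = shuffled[n_test + n_val:]              # everything else is training data
    return train, validation, test


# Hypothetical fractions, for illustration only.
train, validation, test = three_way_split(range(1000), test_fraction=0.1, validation_fraction=0.2)
print(len(train), len(validation), len(test))  # 700 200 100
```

This in-memory version only works for data that fits on one machine. For a data set the size of DC taxi, PySpark's `DataFrame.randomSplit(weights, seed)` performs a comparable split across a cluster, although the resulting subset sizes are only approximately proportional to the weights.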
This chapter takes you on a temporary detour from the DC taxi data set to prepare you to write scalable machine learning code ...