Transforming Your Data

Real-world datasets vary widely: variables can be textual, numerical, or categorical, and observations can be missing, erroneous, or anomalous (outliers). To perform a proper data analysis, we need to understand how to parse data correctly, clean it, and build an output matrix suited to machine learning. To extract knowledge from data, the reader must be able to create an observation matrix using different data analysis and cleaning techniques.
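As a rough illustration of the parse, clean, and build-a-matrix sequence described above, here is a minimal sketch in plain Python. It uses only the standard library and does not use Cloud Dataprep or Cloud Dataflow. The sample data, the column names (`age`, `city`), the function names, and the cleaning choices (a median-absolute-deviation outlier rule and median imputation) are all assumptions made for this example, not the method this chapter prescribes.

```python
import csv
import io
import statistics

# Hypothetical raw data: one numeric column, one categorical column,
# one missing age, and one implausible age (an outlier).
RAW_CSV = """age,city
34,Rome
29,Paris
,Rome
31,Paris
400,Rome
"""


def parse_rows(text):
    """Parse CSV text into dicts; an empty 'age' field becomes None."""
    rows = []
    for record in csv.DictReader(io.StringIO(text)):
        raw_age = record["age"].strip()
        rows.append({"age": float(raw_age) if raw_age else None,
                     "city": record["city"].strip()})
    return rows


def clean_rows(rows, max_deviations=3.0):
    """Drop outlying ages, then fill missing ages with the median of the rest."""
    known = [r["age"] for r in rows if r["age"] is not None]
    center = statistics.median(known)
    # Median absolute deviation: a spread estimate that outliers barely affect.
    mad = statistics.median([abs(a - center) for a in known])

    def is_outlier(age):
        return (age is not None and mad > 0
                and abs(age - center) > max_deviations * mad)

    kept = [dict(r) for r in rows if not is_outlier(r["age"])]
    fill = statistics.median([r["age"] for r in kept if r["age"] is not None])
    for r in kept:
        if r["age"] is None:
            r["age"] = fill
    return kept


def build_matrix(rows):
    """Return (matrix, column_names) with the categorical column one-hot encoded."""
    cities = sorted({r["city"] for r in rows})
    columns = ["age"] + [f"city={c}" for c in cities]
    matrix = [[r["age"]] + [1.0 if r["city"] == c else 0.0 for c in cities]
              for r in rows]
    return matrix, columns


matrix, columns = build_matrix(clean_rows(parse_rows(RAW_CSV)))
print(columns)
for row in matrix:
    print(row)
```

In this sketch each row of `matrix` is one observation and each column one numeric feature, which is the shape most machine learning libraries expect. Whether to drop or cap outliers, and how to impute missing values, depends on the dataset; the rules used here are only one reasonable choice.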

In this chapter, we'll present Cloud Dataprep, a service for preprocessing data, extracting features, and cleaning up records. We'll also cover Cloud Dataflow, a service for implementing streaming and batch processing. We'll go into some practical details ...
