5
Quantifying and Improving Data Properties
Procuring data for machine learning systems is a long process. So far, we have focused on collecting data from source systems and cleaning noise from it. Noise, however, is not the only problem we can encounter in data. Missing values and random attributes are examples of data properties that can cause problems for machine learning systems. Even the length of the input data can be problematic if it falls outside the expected range.
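To make two of these properties concrete, the following is a minimal sketch of how missing values and unexpected input lengths could be quantified. It assumes the data is held as plain Python lists; the function names and the toy data are hypothetical illustrations, not code from the book.

```python
from math import isnan


def is_missing(value):
    """Treat None and float NaN as missing (an assumption of this sketch)."""
    return value is None or (isinstance(value, float) and isnan(value))


def missing_ratio_per_column(rows):
    """Return the fraction of missing values in each column.

    Assumes a non-empty list of rows that all have the same length.
    """
    n_rows = len(rows)
    n_cols = len(rows[0])
    return [
        sum(is_missing(row[c]) for row in rows) / n_rows
        for c in range(n_cols)
    ]


def rows_with_unexpected_length(rows, min_len, max_len):
    """Return indices of rows whose length falls outside [min_len, max_len]."""
    return [
        i for i, row in enumerate(rows)
        if not (min_len <= len(row) <= max_len)
    ]


# Hypothetical toy data: four rows with three attributes each.
rows = [
    [1.0, None, 3.0],
    [4.0, 5.0, float("nan")],
    [7.0, None, 9.0],
    [1.0, 2.0, 3.0],
]
ratios = missing_ratio_per_column(rows)

# Hypothetical raw inputs of varying length, checked against a range of 3 to 4.
raw_inputs = [[1, 2, 3], [1, 2], [1, 2, 3, 4, 5]]
outliers = rows_with_unexpected_length(raw_inputs, min_len=3, max_len=4)
```

A column with a high missing ratio, or an input flagged by the length check, is a candidate for closer inspection before training. The thresholds to act on depend on the data and the model, so they are left out of this sketch.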
In this chapter, we will dive deeper into the properties of data and how to improve them. In contrast to the previous chapter, we will work on feature vectors rather than raw data. Feature vectors are already a transformation of the data, and therefore we can change ...