6

Processing Data in Machine Learning Systems

We talked about data in Chapter 3, where we introduced the types of data that are used in machine learning systems. In this chapter, we’ll dive deeper into the ways in which data and algorithms are entangled. We’ll talk about data in generic terms and explain what kind of data machine learning systems need. I’ll explain that all kinds of data are used in numerical form, either as feature vectors or as more complex feature matrices. Then, I’ll explain the need to transform unstructured data (for example, text) into structured data. This chapter lays the foundations for diving deeper into each type of data, which is the content of the next few chapters.
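To make the idea of numerical form concrete, here is a minimal sketch of how a few short texts could be turned into feature vectors by counting words (a simple bag-of-words encoding). The example sentences and the helper names `build_vocabulary` and `to_feature_vector` are assumptions made for illustration; they are not taken from the chapter, and real systems typically use more careful tokenization.

```python
from collections import Counter


def build_vocabulary(documents):
    """Collect the sorted set of lowercase words across all documents."""
    words = set()
    for doc in documents:
        words.update(doc.lower().split())
    return sorted(words)


def to_feature_vector(document, vocabulary):
    """Count how often each vocabulary word occurs in the document."""
    counts = Counter(document.lower().split())
    return [counts[word] for word in vocabulary]


# Hypothetical unstructured input: three short pieces of text.
documents = ["the build failed", "the build passed", "tests failed"]

# Each column of the matrix corresponds to one word in the vocabulary.
vocabulary = build_vocabulary(documents)

# One feature vector per document; stacked together they form a feature matrix.
feature_matrix = [to_feature_vector(doc, vocabulary) for doc in documents]
```

Each row of `feature_matrix` is the feature vector of one text, and every row has the same length as `vocabulary`, which is what gives the data a structured, tabular shape.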
