Chapter 5: Data Engineering
Data engineering, in general, refers to the management and organization of data and data flows across an organization. It involves data gathering, processing, versioning, governance, and analytics. It is a broad topic that revolves around developing and maintaining data processing platforms, data lakes, data marts, data warehouses, and data streams, and it is an important practice that contributes to the success of big data and machine learning (ML) projects. In this chapter, you will learn about the ML-specific topics of data engineering.
Many ML tutorials and books start with a clean dataset and a CSV file to build your model against. The real world is different. Data comes in many shapes and ...