2 Getting started with the data set

This chapter covers

  • Introducing a use case for machine learning
  • Starting with object storage for serverless machine learning
  • Using crawlers to automatically discover structured data schemas
  • Migrating to column-oriented data storage for more efficient analytics
  • Experimenting with PySpark extract-transform-load (ETL) jobs

In the previous chapter, you learned about serverless machine learning platforms and some of the reasons they can help you build a successful machine learning system. In this chapter, you will get started with a pragmatic, real-world use case for a serverless machine learning platform. You will then download a data set of a few years’ worth of taxi rides from Washington, DC, to build ...
