2 Getting started with the data set

This chapter covers

Introducing a use case for machine learning
Starting with object storage for serverless machine learning
Using crawlers to automatically discover structured data schemas
Migrating to column-oriented data storage for more efficient analytics
Experimenting with PySpark extract-transform-load (ETL) jobs

In the previous chapter, you learned about serverless machine learning platforms and some of the reasons they can help you build a successful machine learning system. In this chapter, you will get started with a pragmatic, real-world use case for a serverless machine learning platform. Next, you will download a data set of a few years’ worth of taxi rides from Washington, DC, to build ...

MLOps Engineering at Scale by Carl Osipov
