P. Singh, Machine Learning with PySpark, https://doi.org/10.1007/978-1-4842-7777-5_2

2. Manage Data with PySpark

Pramod Singh¹

(1) Bangalore, Karnataka, India

In the previous chapter, we looked at the core strengths of the Spark framework and the different ways to use it. This chapter focuses on how to use PySpark to handle data. The same steps apply when dealing with a huge set of data points; for demonstration purposes, however, we will work with a relatively small sample of data. Data ingestion, cleaning, and processing are critical steps in any data pipeline before the data can be used for Machine Learning ...
