Raju Kumar Mishra, PySpark Recipes, https://doi.org/10.1007/978-1-4842-3141-8_6

6. I/O in PySpark

Raju Kumar Mishra¹

(1) Bangalore, Karnataka, India

File input/output (I/O) operations are an integral part of many software activities and for data

A data scientist deals with many types of files, including text files, comma-separated values (CSV) files, and JavaScript Object Notation (JSON) files. The Hadoop Distributed File System (HDFS) is a widely used distributed file system for storing large datasets across the machines of a cluster.

This chapter covers the following recipes:

Recipe 6-1. Read a simple text file
Recipe 6-2. Write an RDD to a simple text file
Recipe 6-3. Read a directory
Recipe 6-4. Read data from HDFS
Recipe 6-5. Save an RDD to HDFS
Recipe 6-6. Read data from a sequential ...


PySpark Recipes: A Problem-Solution Approach with PySpark2 by Raju Kumar Mishra
