©  Raju Kumar Mishra 2018
Raju Kumar Mishra, PySpark Recipes, https://doi.org/10.1007/978-1-4842-3141-8_6

6. I/O in PySpark

Raju Kumar Mishra
Bangalore, Karnataka, India

File input/output (I/O) operations are an integral part of many software activities, and of data science in particular.

A data scientist deals with many types of files, including plain text files, comma-separated values (CSV) files, JavaScript Object Notation (JSON) files, and many more. Data may also live in the Hadoop Distributed File System (HDFS), a widely used distributed file system that PySpark can read from and write to directly.

This chapter covers the following recipes:
  • Recipe 6-1. Read a simple text file

  • Recipe 6-2. Write an RDD to a simple text file

  • Recipe 6-3. Read a directory

  • Recipe 6-4. Read data from HDFS

  • Recipe 6-5. Save an RDD to HDFS

  • Recipe 6-6. Read data from a sequential ...
