© Raju Kumar Mishra 2018
Raju Kumar Mishra, PySpark Recipes, https://doi.org/10.1007/978-1-4842-3141-8_6

6. I/O in PySpark

Raju Kumar Mishra, Bangalore, Karnataka, India

File input/output (I/O) operations are an integral part of many software activities and for data

A data scientist works with many file types, including plain text, comma-separated values (CSV), and JavaScript Object Notation (JSON). The Hadoop Distributed File System (HDFS) is a distributed file system designed to store large datasets across a cluster of machines.

This chapter covers the following recipes:
  • Recipe 6-1. Read a simple text file
  • Recipe 6-2. Write an RDD to a simple text file
  • Recipe 6-3. Read a directory
  • Recipe 6-4. Read data from HDFS
  • Recipe 6-5. Save an RDD to HDFS
  • Recipe 6-6. Read data from a sequential ...
