Harvesting and storing data

Before delving into persistent database storage such as MongoDB, we will look at two widely used file formats: CSV (short for comma-separated values) and JSON (short for JavaScript Object Notation). The enduring popularity of these two formats comes down to a few key reasons: they are human readable, simple, relatively lightweight, and easy to use.

Persisting data in CSV

The CSV format is lightweight, human readable, and easy to use. It stores each record as a line of delimited text, with one column per field, which gives it an inherent tabular schema.

Python offers a robust csv library that can read the rows of a CSV file into Python dictionaries. For the purposes of our program, we have written a Python class that persists data in CSV format ...
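As a rough illustration of what the csv library provides, the following minimal sketch writes a few rows to a CSV file and reads them back as dictionaries. It is not the class described above: the function names `save_rows` and `load_rows`, the field names, and the sample rows are all hypothetical and chosen only for this example.

```python
import csv
import os
import tempfile

# Hypothetical field names and sample data, used only for illustration.
FIELDS = ["id", "user", "text"]
SAMPLE_ROWS = [
    {"id": 1, "user": "alice", "text": "hello, world"},
    {"id": 2, "user": "bob", "text": "spark and python"},
]


def save_rows(path, fieldnames, rows):
    """Write a list of dictionaries to a CSV file with a header row."""
    # newline="" lets the csv module control line endings itself.
    with open(path, "w", newline="", encoding="utf-8") as f:
        writer = csv.DictWriter(f, fieldnames=fieldnames)
        writer.writeheader()
        writer.writerows(rows)


def load_rows(path):
    """Read a CSV file back as a list of dictionaries keyed by the header."""
    with open(path, newline="", encoding="utf-8") as f:
        return list(csv.DictReader(f))


def round_trip_demo():
    """Save the sample rows to a temporary file and load them again."""
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "sample.csv")
        save_rows(path, FIELDS, SAMPLE_ROWS)
        return load_rows(path)


if __name__ == "__main__":
    for row in round_trip_demo():
        print(row)
```

Note that CSV carries no type information, so every value comes back as a string (the integer `id` is read as text), and a field containing the delimiter, such as "hello, world", is quoted automatically on write.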
