Chapter 3. Data serialization—working with text and beyond

This chapter covers

  • Working with text, XML, and JSON
  • Understanding SequenceFile, Avro, Protocol Buffers, and Parquet
  • Working with custom data formats

MapReduce offers straightforward, well-documented support for working with simple data formats such as log files. But the use of MapReduce has evolved beyond log files to more sophisticated data-serialization formats (such as text, XML, and JSON), to the point where its documentation and built-in support run dry. The goal of this chapter is to document how you can work with common data-serialization formats, and to examine more structured serialization formats and compare their fitness for use with MapReduce.

Imagine that you want to work with ...

