Hadoop in Practice, Second Edition by Alex Holmes

Chapter 3. Data serialization—working with text and beyond

This chapter covers

  • Working with text, XML, and JSON
  • Understanding SequenceFile, Avro, Protocol Buffers, and Parquet
  • Working with custom data formats

MapReduce offers straightforward, well-documented support for working with simple data formats such as log files. But the use of MapReduce has evolved beyond log files to more sophisticated data-serialization formats, such as text, XML, and JSON, to the point where its documentation and built-in support run dry. The goal of this chapter is to document how you can work with common data-serialization formats, as well as to examine more structured serialization formats and compare their fitness for use with MapReduce.

Imagine that you want to work with ...
