O'Reilly logo

Optimizing Hadoop for MapReduce by Khaled Tannir

Stay ahead with the world's most comprehensive technology and business learning platform.

With Safari, you learn the way you learn best. Get unlimited access to videos, live online training, learning paths, books, tutorials, and more.

Start Free Trial

No credit card required

Using appropriate Writable types

Hadoop uses custom datatype serialization/RPC mechanism and defines its own box type classes. These classes are used to manipulate strings (Text), integers (IntWritable), and so on, and they implement the Writable class, which defines a deserialization protocol.

Therefore, all values in Hadoop are Writable type objects and all keys are instances of WritableComparable, which defines a sort order, because they need to be compared.

Writable objects are mutable and considerably more compact as no meta info needs to be stored (class name, fields, super classes, and so on), and straightforward random access gives higher performance. As binary Writable types will take up less space, this will reduce the size of intermediate ...

With Safari, you learn the way you learn best. Get unlimited access to videos, live online training, learning paths, books, interactive tutorials, and more.

Start Free Trial

No credit card required