Using appropriate Writable types
Hadoop uses custom datatype serialization/RPC mechanism and defines its own box type classes. These classes are used to manipulate strings (
Text), integers (
IntWritable), and so on, and they implement the
Writable class, which defines a deserialization protocol.
values in Hadoop are
Writable type objects and all
keys are instances of
WritableComparable, which defines a sort order, because they need to be compared.
Writable objects are mutable and considerably more compact as no meta info needs to be stored (class name, fields, super classes, and so on), and straightforward random access gives higher performance. As binary
Writable types will take up less space, this will reduce the size of intermediate ...