O'Reilly logo

Apache Hive Essentials by Dayong Du

Stay ahead with the world's most comprehensive technology and business learning platform.

With Safari, you learn the way you learn best. Get unlimited access to videos, live online training, learning paths, books, tutorials, and more.

Start Free Trial

No credit card required

SerDe

SerDe stands for Serialization and Deserialization. It is the technology used to process records and map them to column data types in Hive tables. To explain the scenario of using SerDe, we need to understand how Hive reads and writes data first.

The process to read data is as follows.

  1. Data is read from HDFS.
  2. Data is processed by the INPUTFORMAT implementation, which defines the input data split and key/value records. In Hive, we can use CREATE TABLE ... STORED AS <FILE_FORMAT> (see Chapter 9Performance Considerations) to specify which INPUTFORMAT it reads from.
  3. The Java Deserializer class defined in SerDe is called to format the data into a record that maps to column and data types in a table.

For an example of reading data, we ...

With Safari, you learn the way you learn best. Get unlimited access to videos, live online training, learning paths, books, interactive tutorials, and more.

Start Free Trial

No credit card required