Serialization and deserialization formats and data types

Serialization and deserialization formats are popularly known as SerDes. A SerDe lets Hive read or write data in a particular format: it parses the structured or unstructured bytes stored in HDFS according to the schema definition of the Hive table. Hive provides a set of built-in SerDes and also lets users create custom SerDes for their own data definitions. The built-in SerDes include the following:

  • LazySimpleSerDe
  • RegexSerDe
  • AvroSerDe
  • OrcSerde
  • ParquetHiveSerDe
  • JSONSerDe
  • CSVSerDe

How to do it…

You can use different types of SerDes for reading or writing the data in a particular format.
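As a minimal sketch of how a SerDe is chosen, the following HiveQL names one explicitly with the ROW FORMAT SERDE clause. The table name and columns are hypothetical; the class shown, org.apache.hadoop.hive.serde2.OpenCSVSerde, is the CSV SerDe that ships with Hive 0.14 and later, and the properties are set to its usual defaults.

```sql
-- Hypothetical table and column names, for illustration only.
-- OpenCSVSerde reads every column as STRING, so the columns are declared that way.
CREATE TABLE sales_csv (
  order_id STRING,
  customer STRING,
  amount   STRING
)
ROW FORMAT SERDE 'org.apache.hadoop.hive.serde2.OpenCSVSerde'
WITH SERDEPROPERTIES (
  "separatorChar" = ",",   -- field delimiter
  "quoteChar"     = "\"",  -- character that wraps quoted fields
  "escapeChar"    = "\\"   -- character that escapes the quote character
)
STORED AS TEXTFILE;
```

Swapping the class name in the ROW FORMAT SERDE clause is generally all it takes to read the same kind of text files through a different SerDe, although each SerDe accepts its own set of SERDEPROPERTIES.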

LazySimpleSerDe

This is the default SerDe in Hive. When a user creates a table in Hive without ...
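One way to see the default in practice is to create a text table that names no SerDe and then inspect it. This is a sketch under stated assumptions: the table name and columns are hypothetical, and the table is stored as a text file.

```sql
-- Hypothetical table; no ROW FORMAT SERDE clause is given.
CREATE TABLE employees_text (
  id   INT,
  name STRING
)
STORED AS TEXTFILE;

-- The "SerDe Library" row of the output should name
-- org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe.
DESCRIBE FORMATTED employees_text;
```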
