Processing JSON data in Hive using JSON SerDe

JSON is a very common format for data communication and storage. Its key-value structure gives great flexibility in handling data. In this recipe, we are going to take a look at how to process data stored in the JSON format in Hive. Hive's default text row format does not parse JSON records into columns, so we will be using a JSON SerDe. SerDe is short for serializer/deserializer: a component that tells Hive how to read records from storage into rows and how to write rows back out.
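To make the serializer/deserializer idea concrete, here is a minimal conceptual sketch in Python. It is not Hive's actual SerDe interface (real Hive SerDes are Java classes); the column list, the sample record, and the function names `deserialize` and `serialize` are all assumptions made up for illustration.

```python
import json

# Hypothetical table schema: column names in their declared order.
COLUMNS = ["id", "name", "city"]

def deserialize(line, columns=COLUMNS):
    """Turn one JSON text record into a row (a list of column values)."""
    record = json.loads(line)
    # In this sketch, a key missing from the record becomes None.
    return [record.get(col) for col in columns]

def serialize(row, columns=COLUMNS):
    """Turn a row back into one JSON text record."""
    return json.dumps(dict(zip(columns, row)))

# Made-up sample record, one JSON object per line.
line = '{"id": 1, "name": "Asha", "city": "Pune"}'
row = deserialize(line)
print(row)
print(serialize(row))
```

The point of the sketch is the division of labor: the deserializer maps a stored record onto the table's columns when reading, and the serializer does the reverse when writing.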

Getting ready

To perform this recipe, you should have a running Hadoop cluster with a recent version of Hive installed on it. Here, I am using Hive 1.2.1. Apart from Hive, we also need a JSON SerDe.

There are various JSON SerDe binaries available ...
