Processing JSON data in Hive using JSON SerDe
These days, JSON is a very common format for data exchange and storage. Its key-value structure gives great flexibility in handling data. In this recipe, we are going to take a look at how to process data stored in the JSON format in Hive. Hive's built-in functions, such as get_json_object and json_tuple, can extract fields from JSON strings stored in a column, but to map each JSON record directly onto table columns we will be using a JSON SerDe. A SerDe (short for serializer/deserializer) is a Java class that tells Hive how to read data (deserialize) and how to write data (serialize).
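To make the read/write split concrete, here is a minimal conceptual sketch in Python. It is not Hive's API. Real Hive SerDes are Java classes, and the class name ToyJsonSerDe and its methods are hypothetical. The sketch assumes one JSON object per line and a fixed list of column names, as a table definition would supply. Deserializing turns a line into a row in column order. Serializing turns a row back into a JSON line.

```python
import json
from typing import Any, Dict, List, Optional


class ToyJsonSerDe:
    """Hypothetical, simplified stand-in for a Hive JSON SerDe (illustration only)."""

    def __init__(self, columns: List[str]) -> None:
        # Column names in table order, as a table definition would declare them.
        self.columns = columns

    def deserialize(self, line: str) -> List[Optional[Any]]:
        # Read path: parse one JSON object and return its values in column order.
        record: Dict[str, Any] = json.loads(line)
        # Keys absent from the record become None (roughly Hive's NULL);
        # keys not declared as columns are simply ignored.
        return [record.get(col) for col in self.columns]

    def serialize(self, row: List[Optional[Any]]) -> str:
        # Write path: pair each value with its column name and emit one JSON object.
        return json.dumps(dict(zip(self.columns, row)))


serde = ToyJsonSerDe(["name", "age", "skills"])
row = serde.deserialize('{"name": "Alice", "age": 30, "skills": ["hive", "pig"]}')
print(row)                   # ['Alice', 30, ['hive', 'pig']]
print(serde.serialize(row))  # {"name": "Alice", "age": 30, "skills": ["hive", "pig"]}
```

The design point is that the table schema (the column list) drives both directions. This is why a JSON SerDe lets the same files be queried as ordinary columns.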
Getting ready
To perform this recipe, you should have a running Hadoop cluster with a recent version of Hive installed on it. Here, I am using Hive 1.2.1. Apart from Hive, we also need a JSON SerDe.
There are various JSON SerDe binaries available ...