Processing XML data in Hive using XML SerDe
XML has long been one of the most important data formats for data transfer and storage. Parsing and then processing XML data is always tricky, as parsing XML is one of the costliest operations. Hive does not have any built-in support for XML data processing, but many organizations and individuals have made open source contributions in the form of XML SerDes.
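To make the idea concrete, here is a minimal Python sketch of what an XML SerDe does conceptually when it deserializes data: it takes one XML record and turns it into a row of named columns. This is not Hive code and not the API of any particular SerDe. The sample record, the `COLUMN_PATHS` mapping, and the `deserialize` function are all hypothetical names chosen for illustration, and mapping each column to a path expression is an assumption about how such a SerDe might be configured.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample record, for illustration only.
RECORD = '<employee id="7"><name>Asha</name><dept>Sales</dept></employee>'

# Hypothetical column-to-path mapping (an assumption, not a real SerDe config).
COLUMN_PATHS = {
    "id": "@id",     # attribute on the record's root element
    "name": "name",  # text of a child element
    "dept": "dept",
}


def deserialize(record_xml, column_paths):
    """Parse one XML record and return a dict of column name -> string value."""
    root = ET.fromstring(record_xml)
    row = {}
    for column, path in column_paths.items():
        if path.startswith("@"):
            # Attribute lookup; returns None if the attribute is absent.
            row[column] = root.get(path[1:])
        else:
            # Child element text; returns None if the element is absent.
            row[column] = root.findtext(path)
    return row


if __name__ == "__main__":
    print(deserialize(RECORD, COLUMN_PATHS))
```

Every record has to be parsed into a tree before any column can be read, which is where the parsing cost mentioned above comes from.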
Getting ready
To perform this recipe, you should have a running Hadoop cluster with the latest version of Hive installed on it. Here, I am using Hive 1.2.1. Apart from Hive, we also need an XML SerDe.
Open source developers have made various XML SerDes available. Out of these, XML SerDe ...