Working with Hive
Hive is a data warehousing infrastructure built on Hadoop that provides SQL-like capabilities for working with data stored in Hadoop. In its infancy, Hadoop was limited to MapReduce as a compute platform, which was a very engineer-centric programming paradigm. In 2008, engineers at Facebook were writing fairly complex MapReduce jobs, but realised that this approach would not scale and that it would be difficult to get the best value from the available talent. Relying on a team that could write MapReduce jobs and be called upon whenever needed was considered a poor strategy, so the team decided to bring SQL to Hadoop (Hive) for two major reasons:
- An SQL-based declarative language while allowing engineers to plug their own scripts and programs when SQL ...