Apache Hive Cookbook

Name: Apache Hive Cookbook
ISBN: 9781782161080

by Hanish Bansal, Saurabh Chauhan, Shrey Mehrotra

April 2016

Beginner

268 pages

5h 32m

English

Packt Publishing

Table of Contents
Apache Hive Cookbook
Credits
About the Authors
About the Reviewer
www.PacktPub.com
  eBooks, discount offers, and more; Why Subscribe?
Preface
What this book covers
What you need for this book
Who this book is for
Sections
  Getting ready; How to do it…; How it works…; There's more…; See also
Conventions
Reader feedback
Customer support
  Downloading the example code; Downloading the color images of this book; Errata; Piracy; Questions
1. Developing Hive
Introduction
Deploying Hive on a Hadoop cluster
  Getting ready; How to do it…; How it works…
Deploying Hive Metastore
  Getting ready; How to do it…
Installing Hive
  Getting ready; How to do it…; Hive with an embedded metastore; Hive with a local metastore; Hive with a remote metastore
Configuring HCatalog
  Getting ready; How to do it…
Understanding different components of Hive
  HiveServer; Hive metastore; How to do it…; HiveServer2; How to do it…; Hive clients; Hive CLI; Getting ready; How to do it…; Beeline; Getting ready; How to do it…
Compiling Hive from source
  Getting ready; How to do it…
Hive packages
  Getting ready; How to do it…
Debugging Hive
  Getting ready; How to do it…
Running Hive
  Getting ready; How to do it…
Changing configurations at runtime
  How to do it…
2. Services in Hive
Introducing HiveServer2
  How to do it…; How it works…; See also
Understanding HiveServer2 properties
  How to do it…; How it works…; See also
Configuring HiveServer2 high availability
  Getting ready; How to do it…; How it works…; See also
Using HiveServer2 clients
  Getting ready; How to do it…; Beeline; Beeline command options; JDBC; JDBC client sample code using Eclipse; Running the JDBC sample code from the command-line; JDBC datatypes; Other clients
Introducing the Hive metastore service
  How to do it…; How it works…
Configuring high availability of metastore service
  How to do it…
Introducing Hue
  Getting ready; How to do it…; Prepare dependencies; Downloading and installing Hue; Configuring Hive with Hue; Starting Hue; Accessing Hive with Hue
3. Understanding the Hive Data Model
Introduction
  Introducing data types; Primitive data types; Complex data types
Using numeric data types
  How to do it…
Using string data types
  How to do it…; How it works…
Using Date/Time data types
  How to do it…
Using miscellaneous data types
  How to do it…
Using complex data types
  How to do it…
Using operators
  Using relational operators; How to do it…; Using arithmetic operators; How to do it…; Using logical operators; How to do it…; Using complex operators; How to do it…
Partitioning
  Getting ready; How to do it…
Partitioning a managed table
  How to do it…; Adding new partitions; Renaming partitions; Exchanging partitions; Dropping the partitions; Loading data in a managed partitioned table
Partitioning an external table
  How to do it…
Bucketing
  Getting ready; How to do it…; How it works…
4. Hive Data Definition Language
Introduction
Creating a database schema
  Getting ready; How to do it…
Dropping a database schema
  Getting ready; How to do it…
Altering a database schema
  Getting ready; How to do it…
Using a database schema
  Getting ready; How to do it…
Showing database schemas
  Getting ready; How to do it…
Describing a database schema
  Getting ready; How to do it…
Creating tables
  How to do it…; Create table LIKE; How it works
Dropping tables
  Getting ready; How to do it…
Truncating tables
  Getting ready; How to do it…
Renaming tables
  Getting ready; How to do it…
Altering table properties
  Getting ready; How to do it…
Creating views
  Getting ready; How to do it…
Dropping views
  Getting ready; How to do it…
Altering the view properties
  Getting ready; How to do it…
Altering the view as select
  Getting ready; How to do it…
Showing tables
  Getting ready; How to do it…
Showing partitions
  Getting ready; How to do it…
Show the table properties
  Getting ready; How to do it…
Showing create table
  Getting ready; How to do it…
HCatalog
  Getting ready; How to do it…; HCatalog DMLs
WebHCat
  Getting ready; How to do it…; See also…
5. Hive Data Manipulation Language
Introduction
Loading files into tables
  Getting ready; How to do it…; How it works…
Inserting data into Hive tables from queries
  Getting ready; How to do it…; How it works…
Inserting data into dynamic partitions
  Getting ready; How to do it…; How it works…; There's more…
Writing data into files from queries
  Getting ready; How to do it…
Enabling transactions in Hive
  Getting ready; How to do it…
Inserting values into tables from SQL
  Getting ready; How to do it…; How it works…; There's more…
Updating data
  Getting ready; How to do it…; How it works…; There's more…
Deleting data
  Getting ready; How to do it…; How it works…
6. Hive Extensibility Features
Introduction
Serialization and deserialization formats and data types
  How to do it…; LazySimpleSerDe; RegexSerDe; JSONSerDe; CSVSerDe; There's more…; See also
Exploring views
  How to do it…; How it works…
Exploring indexes
  How to do it…
Hive partitioning
  How to do it…; Static partitioning; Dynamic partitioning
Creating buckets in Hive
  How to do it…; Metastore view of bucketing
Analytics functions in Hive
  How to do it…; See also
Windowing in Hive
  How to do it…; LEAD; LAG; FIRST_VALUE; LAST_VALUE; See also
File formats
  How to do it…
7. Joins and Join Optimization
Understanding the joins concept
  Getting ready; How to do it…; How it works…
Using a left/right/full outer join
  How to do it…; How it works…
Using a left semi join
  How to do it…; How it works…
Using a cross join
  How to do it…; How it works…
Using a map-side join
  How to do it…; How it works…
Using a bucket map join
  Getting ready; How to do it…; How it works…
Using a bucket sort merge map join
  Getting ready; How to do it…; How it works…
Using a skew join
  How to do it…; How it works…
8. Statistics in Hive
Bringing statistics in to Hive
  How to do it…
Table and partition statistics in Hive
  Getting ready; How to do it…; Statistics for a partitioned table
Column statistics in Hive
  How to do it…; How it works…
Top K statistics in Hive
  How to do it…
9. Functions in Hive
Using built-in functions
  How to do it…; Mathematical functions; Collection functions; Type conversion functions; Date functions; String functions; How it works…; Mathematical functions; Collection functions; Type conversion functions; Date functions; String functions; There's more; Conditional functions; Miscellaneous functions; See also
Using the built-in User-defined Aggregation Function (UDAF)
  How to do it…; How it works…; See more
Using the built-in User Defined Table Function (UDTF)
  How to do it…; How it works…; See also
Creating custom User-Defined Functions (UDF)
  How to do it…; How it works…
10. Hive Tuning
Enabling predicate pushdown optimizations in Hive
  Getting ready; How to do it…; How it works…
Optimizations to reduce the number of map
  Getting ready; How to do it…
Sampling
  Getting ready; Sampling bucketed table; Block sampling; Length literal; Row count; How to do it…; How it works…
11. Hive Security
Securing Hadoop
  How to do it…; How it works…; Giving read and write access to user mike; Revoking the access of the user mike; See also
Authorizing Hive
  How to do it…; Default authorization–legacy mode; Storage-based authorization; SQL standards-based authorization; There's more
Configuring the SQL standards-based authorization
  Getting Started; How to do it…; To list out all existing roles; Creating a role; Deleting a role; Showing list of current roles; Setting a role; Granting a role; Revoking a role; Checking roles of a user/role; Checking principles of a role; Granting privileges; Revoking privileges; Checking privileges of a user or role; See also
Authenticating Hive
  How to do it…; Anonymous with SASL (default no authentication); Anonymous without SASL; Kerberos; Configuring the JDBC client for Kerberos authentication; Access Hive using the Beeline client; Access Hive using the Hive JDBC client in Java; LDAP; Pluggable Authentication Modules; Custom
12. Hive Integration with Other Frameworks
Working with Apache Spark
  Getting ready; How to do it…; How it works…
Working with Accumulo
  Getting ready; How to do it…; How it works…
Working with HBase
  Getting ready; How to do it…; How it works…
Working with Google Drill
  Getting ready; How to do it…; How it works…
Index

Content preview from Apache Hive Cookbook

Preface

Hive is an open source big data framework in the Hadoop ecosystem. It provides an SQL-like interface for querying data stored in HDFS. Under the hood, it runs MapReduce programs that correspond to the SQL query. Hive was initially developed by Facebook and later added to the Hadoop ecosystem.
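
To make the idea of a query running as MapReduce more concrete, here is a minimal conceptual sketch in plain Python. It imitates how a query such as `SELECT dept, COUNT(*) FROM employees GROUP BY dept` could be split into map, shuffle, and reduce steps. The table, its rows, and the function names are hypothetical, and this is not how Hive builds its execution plans internally; it only illustrates the general pattern.

```python
from collections import defaultdict

# Hypothetical rows of an "employees" table: (name, dept)
rows = [("asha", "eng"), ("ben", "ops"), ("chen", "eng"), ("dev", "eng")]


def map_phase(records):
    """Emit one (key, 1) pair per row, keyed by the GROUP BY column."""
    for _name, dept in records:
        yield dept, 1


def shuffle(pairs):
    """Group all emitted values by key, as the framework does between phases."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(groups):
    """Aggregate the values of each key; summing the 1s gives COUNT(*)."""
    return {key: sum(values) for key, values in groups.items()}


counts = reduce_phase(shuffle(map_phase(rows)))
print(counts)  # {'eng': 3, 'ops': 1}
```

The point of the sketch is the division of labor: the map step only looks at one row at a time, and the reduce step only sees the values that share a key, which is what lets such a query be spread across many machines.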

Hive is currently the preferred framework for querying data in Hadoop. Because most historical data is stored in RDBMS data stores such as Oracle and Teradata, developers find it convenient to run similar SQL statements in Hive to query data.

Along with simple SQL statements, Hive supports a wide variety of windowing and analytical functions, including rank, row number, dense rank, lead, and lag.
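
The functions named above can be tried out without a Hadoop cluster. The sketch below uses Python's built-in `sqlite3` module as a stand-in, on the assumption that the bundled SQLite is version 3.25 or later (the first release with window functions). The `sales` table and its rows are made up for illustration, and Hive's own syntax and behavior may differ in details, so treat this as a demonstration of the SQL semantics rather than of Hive itself.

```python
import sqlite3

# In-memory database standing in for a Hive table (hypothetical data).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (rep TEXT, region TEXT, amount INTEGER)")
conn.executemany(
    "INSERT INTO sales VALUES (?, ?, ?)",
    [
        ("asha", "east", 300),
        ("ben", "east", 300),
        ("chen", "east", 100),
        ("dev", "west", 200),
        ("eli", "west", 50),
    ],
)

# by_amount leaves ties in place so RANK and DENSE_RANK can differ;
# by_amount_rep breaks ties by rep so ROW_NUMBER, LAG, LEAD are deterministic.
query = """
SELECT rep, region, amount,
       RANK()       OVER by_amount     AS rnk,
       DENSE_RANK() OVER by_amount     AS dense_rnk,
       ROW_NUMBER() OVER by_amount_rep AS row_num,
       LAG(amount)  OVER by_amount_rep AS prev_amount,
       LEAD(amount) OVER by_amount_rep AS next_amount
FROM sales
WINDOW by_amount     AS (PARTITION BY region ORDER BY amount DESC),
       by_amount_rep AS (PARTITION BY region ORDER BY amount DESC, rep)
ORDER BY region, row_num
"""

result = conn.execute(query).fetchall()
for row in result:
    print(row)
```

In the `east` region the two tied reps both get rank 1, and the next rep gets rank 3 but dense rank 2, which is the practical difference between the two ranking functions. `LAG` and `LEAD` return `None` (SQL `NULL`) at the edges of each partition.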

Hive is considered the de facto big data warehouse solution. ...