Apache Hive Cookbook

Book description

Easy, hands-on recipes to help you understand Hive and its integration with frameworks that are widely used in today's big data world.

About This Book

  • Get a complete reference covering a range of Hive topics.
  • Get to know recipes for the latest developments in Hive, including CRUD operations.
  • Understand Hive internals and the integration of Hive with different frameworks used in today's world.

Who This Book Is For

This book is intended for those who want to get started with Hive or who have a basic understanding of the Hive framework. Prior knowledge of basic SQL commands is also required.

What You Will Learn

  • Learn the features and offerings of the latest Hive release
  • Understand the workings and structure of the Hive internals
  • Gain insight into the latest developments in the Hive framework
  • Grasp the concepts of the Hive Data Model
  • Master key concepts such as partitions, buckets, and statistics
  • Know how to integrate Hive with other frameworks such as Spark and Accumulo

In Detail

Hive was developed at Facebook and later open sourced through the Apache community. It provides a SQL-like interface for running queries on Big Data frameworks. Its SQL-like syntax, known as HiveQL, includes a broad set of SQL capabilities, such as analytical functions, which are in high demand in today's Big Data world.

This book provides easy installation steps for the different types of metastores supported by Hive, along with simple recipes for configuring Hive clients and services. You will also learn about Hive optimizations, including partitioning and bucketing. The book also explains the source code of the latest Hive version.

Hive Query Language is also used by other frameworks, including Spark. Toward the end, the book covers the integration of Hive with these frameworks.

Style and approach

Starting with the basics and covering the core concepts alongside practical usage, this book is a complete guide to learning and exploring what Hive offers.

Table of contents

  1. Apache Hive Cookbook
    1. Table of Contents
    2. Apache Hive Cookbook
    3. Credits
    4. About the Authors
    5. About the Reviewer
    6. www.PacktPub.com
      1. eBooks, discount offers, and more
        1. Why Subscribe?
    7. Preface
      1. What this book covers
      2. What you need for this book
      3. Who this book is for
      4. Sections
        1. Getting ready
        2. How to do it…
        3. How it works…
        4. There's more…
        5. See also
      5. Conventions
      6. Reader feedback
      7. Customer support
        1. Downloading the example code
        2. Downloading the color images of this book
        3. Errata
        4. Piracy
        5. Questions
    8. 1. Developing Hive
      1. Introduction
      2. Deploying Hive on a Hadoop cluster
        1. Getting ready
        2. How to do it...
        3. How it works…
      3. Deploying Hive Metastore
        1. Getting ready
        2. How to do it…
      4. Installing Hive
        1. Getting ready
        2. How to do it…
          1. Hive with an embedded metastore
          2. Hive with a local metastore
          3. Hive with a remote metastore
      5. Configuring HCatalog
        1. Getting ready
        2. How to do it...
      6. Understanding different components of Hive
        1. HiveServer
          1. Hive metastore
        2. How to do it...
        3. HiveServer2
        4. How to do it...
        5. Hive clients
          1. Hive CLI
        6. Getting ready
        7. How to do it...
          1. Beeline
        8. Getting ready
        9. How to do it...
      7. Compiling Hive from source
        1. Getting ready
        2. How to do it...
      8. Hive packages
        1. Getting ready
        2. How to do it...
      9. Debugging Hive
        1. Getting ready
        2. How to do it...
      10. Running Hive
        1. Getting ready
        2. How to do it...
      11. Changing configurations at runtime
        1. How to do it...
    9. 2. Services in Hive
      1. Introducing HiveServer2
        1. How to do it…
        2. How it works…
        3. See also
      2. Understanding HiveServer2 properties
        1. How to do it…
        2. How it works…
        3. See also
      3. Configuring HiveServer2 high availability
        1. Getting ready
        2. How to do it…
        3. How it works…
        4. See also
      4. Using HiveServer2 clients
        1. Getting ready
        2. How to do it…
          1. Beeline
            1. Beeline command options
          2. JDBC
            1. JDBC client sample code using Eclipse
            2. Running the JDBC sample code from the command-line
            3. JDBC datatypes
          3. Other clients
      5. Introducing the Hive metastore service
        1. How to do it…
        2. How it works…
      6. Configuring high availability of metastore service
        1. How to do it…
      7. Introducing Hue
        1. Getting ready
        2. How to do it…
          1. Prepare dependencies
          2. Downloading and installing Hue
          3. Configuring Hive with Hue
          4. Starting Hue
          5. Accessing Hive with Hue
    10. 3. Understanding the Hive Data Model
      1. Introduction
        1. Introducing data types
          1. Primitive data types
          2. Complex data types
      2. Using numeric data types
        1. How to do it…
      3. Using string data types
        1. How to do it…
        2. How it works…
      4. Using Date/Time data types
        1. How to do it…
      5. Using miscellaneous data types
        1. How to do it…
      6. Using complex data types
        1. How to do it…
      7. Using operators
        1. Using relational operators
        2. How to do it…
        3. Using arithmetic operators
          1. How to do it…
        4. Using logical operators
        5. How to do it…
        6. Using complex operators
        7. How to do it…
      8. Partitioning
        1. Getting ready
        2. How to do it…
      9. Partitioning a managed table
        1. How to do it…
          1. Adding new partitions
          2. Renaming partitions
          3. Exchanging partitions
          4. Dropping the partitions
          5. Loading data in a managed partitioned table
      10. Partitioning an external table
        1. How to do it…
      11. Bucketing
        1. Getting ready
        2. How to do it…
        3. How it works…
    11. 4. Hive Data Definition Language
      1. Introduction
      2. Creating a database schema
        1. Getting ready
        2. How to do it…
      3. Dropping a database schema
        1. Getting ready
        2. How to do it…
      4. Altering a database schema
        1. Getting ready
        2. How to do it…
      5. Using a database schema
        1. Getting ready
        2. How to do it…
      6. Showing database schemas
        1. Getting ready
        2. How to do it…
      7. Describing a database schema
        1. Getting ready
        2. How to do it…
      8. Creating tables
        1. How to do it…
          1. Create table LIKE
        2. How it works
      9. Dropping tables
        1. Getting ready
        2. How to do it…
      10. Truncating tables
        1. Getting ready
        2. How to do it…
      11. Renaming tables
        1. Getting ready
        2. How to do it…
      12. Altering table properties
        1. Getting ready
        2. How to do it…
      13. Creating views
        1. Getting ready
        2. How to do it…
      14. Dropping views
        1. Getting ready
        2. How to do it…
      15. Altering the view properties
        1. Getting ready
        2. How to do it…
      16. Altering the view as select
        1. Getting ready
        2. How to do it…
      17. Showing tables
        1. Getting ready
        2. How to do it…
      18. Showing partitions
        1. Getting ready
        2. How to do it…
      19. Show the table properties
        1. Getting ready
        2. How to do it…
      20. Showing create table
        1. Getting ready
        2. How to do it…
      21. HCatalog
        1. Getting ready
        2. How to do it…
          1. HCatalog DMLs
      22. WebHCat
        1. Getting ready
        2. How to do it…
        3. See also…
    12. 5. Hive Data Manipulation Language
      1. Introduction
      2. Loading files into tables
        1. Getting ready
        2. How to do it…
        3. How it works…
      3. Inserting data into Hive tables from queries
        1. Getting ready
        2. How to do it…
        3. How it works…
      4. Inserting data into dynamic partitions
        1. Getting ready
        2. How to do it...
        3. How it works…
        4. There's more…
      5. Writing data into files from queries
        1. Getting ready
        2. How to do it…
      6. Enabling transactions in Hive
        1. Getting ready
        2. How to do it…
      7. Inserting values into tables from SQL
        1. Getting ready
        2. How to do it…
        3. How it works…
        4. There's more…
      8. Updating data
        1. Getting ready
        2. How to do it...
        3. How it works…
        4. There's more…
      9. Deleting data
        1. Getting ready
        2. How to do it...
        3. How it works…
    13. 6. Hive Extensibility Features
      1. Introduction
      2. Serialization and deserialization formats and data types
        1. How to do it…
          1. LazySimpleSerDe
          2. RegexSerDe
          3. JSONSerDe
          4. CSVSerDe
        2. There's more…
        3. See also
      3. Exploring views
        1. How to do it…
        2. How it works…
      4. Exploring indexes
        1. How to do it…
      5. Hive partitioning
        1. How to do it…
          1. Static partitioning
          2. Dynamic partitioning
      6. Creating buckets in Hive
        1. How to do it…
          1. Metastore view of bucketing
      7. Analytics functions in Hive
        1. How to do it…
        2. See also
      8. Windowing in Hive
        1. How to do it…
          1. LEAD
          2. LAG
          3. FIRST_VALUE
            1. LAST_VALUE
        2. See also
      9. File formats
        1. How to do it…
    14. 7. Joins and Join Optimization
      1. Understanding the joins concept
        1. Getting ready
        2. How to do it…
        3. How it works…
      2. Using a left/right/full outer join
        1. How to do it…
        2. How it works…
      3. Using a left semi join
        1. How to do it…
        2. How it works…
      4. Using a cross join
        1. How to do it…
        2. How it works…
      5. Using a map-side join
        1. How to do it…
        2. How it works…
      6. Using a bucket map join
        1. Getting ready
        2. How to do it…
        3. How it works…
      7. Using a bucket sort merge map join
        1. Getting ready
        2. How to do it…
        3. How it works…
      8. Using a skew join
        1. How to do it…
        2. How it works…
    15. 8. Statistics in Hive
      1. Bringing statistics in to Hive
        1. How to do it…
      2. Table and partition statistics in Hive
        1. Getting ready
        2. How to do it…
          1. Statistics for a partitioned table
      3. Column statistics in Hive
        1. How to do it…
        2. How it works…
      4. Top K statistics in Hive
        1. How to do it…
    16. 9. Functions in Hive
      1. Using built-in functions
        1. How to do it…
          1. Mathematical functions
          2. Collection functions
          3. Type conversion functions
          4. Date functions
          5. String functions
        2. How it works…
          1. Mathematical functions
          2. Collection functions
          3. Type conversion functions
          4. Date functions
          5. String functions
        3. There's more
          1. Conditional functions
          2. Miscellaneous functions
        4. See also
      2. Using the built-in User-defined Aggregation Function (UDAF)
        1. How to do it…
        2. How it works…
        3. See more
      3. Using the built-in User Defined Table Function (UDTF)
        1. How to do it…
        2. How it works…
          1. See also
      4. Creating custom User-Defined Functions (UDF)
        1. How to do it…
        2. How it works…
    17. 10. Hive Tuning
      1. Enabling predicate pushdown optimizations in Hive
        1. Getting ready
        2. How to do it…
        3. How it works…
      2. Optimizations to reduce the number of map
        1. Getting ready
        2. How to do it…
      3. Sampling
        1. Getting ready
        2. Sampling bucketed table
        3. Block sampling
        4. Length literal
        5. Row count
        6. How to do it…
        7. How it works…
    18. 11. Hive Security
      1. Securing Hadoop
        1. How to do it…
        2. How it works…
          1. Giving read and write access to user mike
          2. Revoking the access of the user mike
        3. See also
      2. Authorizing Hive
        1. How to do it…
          1. Default authorization–legacy mode
          2. Storage-based authorization
          3. SQL standards-based authorization
        2. There's more
      3. Configuring the SQL standards-based authorization
        1. Getting Started
        2. How to do it…
          1. To list out all existing roles
          2. Creating a role
          3. Deleting a role
          4. Showing list of current roles
          5. Setting a role
          6. Granting a role
          7. Revoking a role
          8. Checking roles of a user/role
          9. Checking principals of a role
          10. Granting privileges
          11. Revoking privileges
          12. Checking privileges of a user or role
        3. See also
      4. Authenticating Hive
        1. How to do it…
          1. Anonymous with SASL (default no authentication)
          2. Anonymous without SASL
          3. Kerberos
          4. Configuring the JDBC client for Kerberos authentication
          5. Access Hive using the Beeline client
          6. Access Hive using the Hive JDBC client in Java
          7. LDAP
          8. Pluggable Authentication Modules
          9. Custom
    19. 12. Hive Integration with Other Frameworks
      1. Working with Apache Spark
        1. Getting ready
        2. How to do it…
        3. How it works…
      2. Working with Accumulo
        1. Getting ready
        2. How to do it…
        3. How it works…
      3. Working with HBase
        1. Getting ready
        2. How to do it…
        3. How it works…
      4. Working with Google Drill
        1. Getting ready
        2. How to do it…
        3. How it works…
    20. Index

Product information

  • Title: Apache Hive Cookbook
  • Author(s): Hanish Bansal, Saurabh Chauhan, Shrey Mehrotra
  • Release date: April 2016
  • Publisher(s): Packt Publishing
  • ISBN: 9781782161080