From 0 to 1: Hive for Processing Big Data

Video description

End-to-End Hive: HQL, Partitioning, Bucketing, UDFs, Windowing, Optimization, Map Joins, Indexes

About This Video

  • Analytical Processing: Joins, Subqueries, Views, Table Generating Functions, Explode, Lateral View, Windowing and more
  • Tuning Hive for better functionality: Partitioning, Bucketing, Join Optimizations, Map Side Joins, Indexes; writing custom User Defined Functions in Java (UDF, UDAF, GenericUDF, GenericUDTF); custom functions in Python; implementation of MapReduce for Select, Group By and Join

In Detail

Hive is like a new friend with an old face (SQL). This course is an end-to-end, practical guide to using Hive for Big Data processing. Let's parse that.

A new friend with an old face: Hive helps you leverage the power of distributed computing and Hadoop for analytical processing. Its interface is like an old friend: the very SQL-like HiveQL. This course will fill in all the gaps between SQL and what you need to use Hive.

End-to-End: The course is an end-to-end guide for using Hive. Whether you are an analyst who wants to process data or an engineer who needs to build custom functionality or optimize performance, everything you'll need is right here. New to SQL? No need to look elsewhere. The course has a primer on all the basic SQL constructs.

Practical: Everything is taught using real-life examples, working queries and code.

Table of contents

  1. Chapter 1 : You, Us & This Course
    1. You, Us & This Course
  2. Chapter 2 : Introducing Hive
    1. Hive: An Open-Source Data Warehouse
    2. Hive and Hadoop
    3. Hive vs Traditional Relational DBMS
    4. HiveQL and SQL
  3. Chapter 3 : Hadoop and Hive Install
    1. Hadoop Install Modes
    2. Hadoop Install Step 1: Standalone Mode
    3. Hadoop Install Step 2: Pseudo-Distributed Mode
    4. Hive install
    5. Code-Along: Getting started
  4. Chapter 4 : Hadoop and HDFS Overview
    1. What is Hadoop?
    2. HDFS or the Hadoop Distributed File System
  5. Chapter 5 : Hive Basics
    1. Primitive Datatypes
    2. Collections_Arrays_Maps
    3. Structs and Unions
    4. Create Table
    5. Insert Into Table
    6. Insert into Table 2
    7. Alter Table
    8. HDFS
    9. HDFS CLI - Interacting with HDFS
    10. Code-Along: Create Table
    11. Code-Along: Hive CLI
  6. Chapter 6 : Built-in Functions
    1. Three types of Hive functions
    2. The Case-When statement, the Size function, the Cast function
    3. The Explode function
    4. Code-Along: Hive Built-in functions
  7. Chapter 7 : Sub-Queries
    1. Quirky Sub-Queries
    2. More on subqueries: Exists and In
    3. Inserting via subqueries
    4. Code-Along: Use Subqueries to work with Collection Datatypes
    5. Views
  8. Chapter 8 : Partitioning
    1. Indices
    2. Partitioning Introduced
    3. The Rationale for Partitioning
    4. How Tables are partitioned
    5. Using Partitioned Tables
    6. Dynamic Partitioning: Inserting data into partitioned tables
    7. Code-Along: Partitioning
  9. Chapter 9 : Bucketing
    1. Introducing Bucketing
    2. The Advantages of Bucketing
    3. How Tables are bucketed
    4. Using Bucketed Tables
    5. Sampling
  10. Chapter 10 : Windowing
    1. Windowing Introduced
    2. Windowing - A Simple Example: Cumulative Sum
    3. Windowing - A More Involved Example: Partitioning
    4. Windowing - Special Aggregation Functions
  11. Chapter 11 : Understanding MapReduce
    1. The basic philosophy underlying MapReduce
    2. MapReduce - Visualized and Explained
    3. MapReduce - Digging a little deeper at every step
  12. Chapter 12 : MapReduce logic for queries: Behind the scenes
    1. MapReduce Overview: Basic Select-From-Where
    2. MapReduce Overview: Group-By and Having
    3. MapReduce Overview: Joins
  13. Chapter 13 : Join Optimizations in Hive
    1. Improving Join performance with tables of different sizes
    2. The Where clause in Joins
    3. The Left Semi Join
    4. Map Side Joins: The Inner Join
    5. Map Side Joins: The Left, Right and Full Outer Joins
    6. Map Side Joins: The Bucketed Map Join and the Sorted Merge Join
  14. Chapter 14 : Custom Functions in Python
    1. Custom functions in Python
    2. Code-Along: Custom Function in Python
  15. Chapter 15 : Custom functions in Java
    1. Introducing UDFs - you're not limited by what Hive offers
    2. The Simple UDF: The standard function for primitive types
    3. The Simple UDF: Java implementation for replacetext()
    4. Generic UDFs, the Object Inspector and DeferredObjects
    5. The Generic UDF: Java implementation for containsstring()
    6. The UDAF: Custom aggregate functions can get pretty complex
    7. The UDAF: Java implementation for max()
    8. The UDAF: Java implementation for Standard Deviation
    9. The Generic UDTF: Custom table generating functions
    10. The Generic UDTF: Java implementation for namesplit()
  16. Chapter 16 : SQL Primer - Select Statements
    1. Select Statements
    2. Select Statements 2
    3. Operator Functions
  17. Chapter 17 : SQL Primer - Group By, Order by and Having
    1. Aggregation Operators Introduced
    2. The Group by Clause
    3. More Group by Examples
    4. Order by
    5. Having
  18. Chapter 18 : SQL Primer - Joins
    1. Introduction to SQL Joins
    2. Cross Joins and Cartesian Joins
    3. Inner Joins
    4. Left Outer Joins
    5. Right, Full Outer Joins, Natural Joins, Self Joins
  19. Chapter 19 : Appendix
    1. [For Linux/Mac OS Shell Newbies] Path and other Environment Variables
    2. Setting up a Virtual Linux Instance - For Windows Users

Product information

  • Title: From 0 to 1: Hive for Processing Big Data
  • Author(s): Loonycorn
  • Release date: December 2017
  • Publisher(s): Packt Publishing
  • ISBN: 9781788995054