Video description
End-to-End Hive: HQL, Partitioning, Bucketing, UDFs, Windowing, Optimization, Map Joins, Indexes
About This Video
- Analytical Processing: Joins, Subqueries, Views, Table Generating Functions, Explode, Lateral View, Windowing and more
- Tuning Hive for better functionality: Partitioning, Bucketing, Join Optimizations, Map Side Joins, Indexes, writing custom user-defined functions in Java (UDF, UDAF, GenericUDF, GenericUDTF), custom functions in Python, and the MapReduce implementation of Select, Group By and Join
In Detail
Hive is like a new friend with an old face (SQL). This course is an end-to-end, practical guide to using Hive for Big Data processing. Let's parse that.
A new friend with an old face: Hive helps you leverage the power of distributed computing and Hadoop for analytical processing. Its interface is like an old friend: the very SQL-like HiveQL. This course will fill in all the gaps between SQL and what you need to use Hive.
End-to-End: The course is an end-to-end guide for using Hive. Whether you are an analyst who wants to process data or an engineer who needs to build custom functionality or optimize performance, everything you'll need is right here. New to SQL? No need to look elsewhere: the course has a primer on all the basic SQL constructs.
Practical: Everything is taught using real-life examples, working queries and code.
Table of contents
-
Chapter 1 : You, Us & This Course
- You, Us & This Course 00:02:03
-
Chapter 2 : Introducing Hive
- Hive: An Open-Source Data Warehouse 00:12:59
- Hive and Hadoop 00:09:19
- Hive vs Traditional Relational DBMS 00:13:52
- HiveQL and SQL 00:07:21
-
Chapter 3 : Hadoop and Hive Install
- Hadoop Install Modes 00:08:33
- Hadoop Install Step 1: Standalone Mode 00:15:47
- Hadoop Install Step 2: Pseudo-Distributed Mode 00:11:45
- Hive install 00:12:05
- Code-Along: Getting started 00:06:25
-
Chapter 4 : Hadoop and HDFS Overview
- What is Hadoop? 00:07:25
- HDFS or the Hadoop Distributed File System 00:11:01
-
Chapter 5 : Hive Basics
- Primitive Datatypes 00:17:08
- Collections: Arrays and Maps 00:09:29
- Structs and Unions 00:05:58
- Create Table 00:13:15
- Insert Into Table 00:12:05
- Insert into Table 2 00:06:51
- Alter Table 00:07:22
- HDFS 00:09:25
- HDFS CLI - Interacting with HDFS 00:10:59
- Code-Along: Create Table 00:09:54
- Code-Along: Hive CLI 00:03:07
-
Chapter 6 : Built-in Functions
- Three types of Hive functions 00:06:46
- The Case-When statement, the Size function, the Cast function 00:10:10
- The Explode function 00:13:07
- Code-Along: Hive Built-in functions 00:04:28
-
Chapter 7 : Sub-Queries
- Quirky Sub-Queries 00:07:14
- More on subqueries: Exists and In 00:15:14
- Inserting via subqueries 00:05:23
- Code-Along: Use Subqueries to work with Collection Datatypes 00:05:57
- Views 00:12:18
-
Chapter 8 : Partitioning
- Indices 00:06:41
- Partitioning Introduced 00:06:37
- The Rationale for Partitioning 00:06:16
- How Tables are partitioned 00:09:53
- Using Partitioned Tables 00:05:27
- Dynamic Partitioning: Inserting data into partitioned tables 00:12:44
- Code-Along: Partitioning 00:04:04
-
Chapter 9 : Bucketing
- Introducing Bucketing 00:11:57
- The Advantages of Bucketing 00:04:55
- How Tables are bucketed 00:12:37
- Using Bucketed Tables 00:07:22
- Sampling 00:11:13
-
Chapter 10 : Windowing
- Windowing Introduced 00:12:59
- Windowing - A Simple Example: Cumulative Sum 00:09:39
- Windowing - A More Involved Example: Partitioning 00:11:55
- Windowing - Special Aggregation Functions 00:15:08
-
Chapter 11 : Understanding MapReduce
-
Chapter 12 : MapReduce logic for queries: Behind the scenes
- MapReduce Overview: Basic Select-From-Where 00:11:34
- MapReduce Overview: Group-By and Having 00:09:12
- MapReduce Overview: Joins 00:14:17
-
Chapter 13 : Join Optimizations in Hive
- Improving Join performance with tables of different sizes 00:13:12
- The Where clause in Joins 00:04:53
- The Left Semi Join 00:12:11
- Map Side Joins: The Inner Join 00:09:42
- Map Side Joins: The Left, Right and Full Outer Joins 00:11:36
- Map Side Joins: The Bucketed Map Join and the Sorted Merge Join 00:07:52
-
Chapter 14 : Custom Functions in Python
- Custom functions in Python 00:10:40
- Code-Along: Custom Function in Python 00:05:45
-
Chapter 15 : Custom functions in Java
- Introducing UDFs - you're not limited by what Hive offers 00:04:38
- The Simple UDF: The standard function for primitive types 00:07:04
- The Simple UDF: Java implementation for replacetext() 00:08:35
- Generic UDFs, the Object Inspector and DeferredObjects 00:13:51
- The Generic UDF: Java implementation for containsstring() 00:09:11
- The UDAF: Custom aggregate functions can get pretty complex 00:14:09
- The UDAF: Java implementation for max() 00:09:21
- The UDAF: Java implementation for Standard Deviation 00:10:48
- The Generic UDTF: Custom table generating functions 00:07:38
- The Generic UDTF: Java implementation for namesplit() 00:10:21
-
Chapter 16 : SQL Primer - Select Statements
- Select Statements 00:11:47
- Select Statements 2 00:14:12
- Operator Functions 00:06:55
-
Chapter 17 : SQL Primer - Group By, Order by and Having
- Aggregation Operators Introduced 00:18:16
- The Group by Clause 00:17:20
- More Group by Examples 00:19:47
- Order by 00:16:15
- Having 00:19:52
-
Chapter 18 : SQL Primer - Joins
- Introduction to SQL Joins 00:09:54
- Cross Joins and Cartesian Joins 00:17:03
- Inner Joins 00:19:53
- Left Outer Joins 00:15:31
- Right, Full Outer Joins, Natural Joins, Self Joins 00:16:08
-
Chapter 19 : Appendix
Product information
- Title: From 0 to 1: Hive for Processing Big Data
- Author(s):
- Release date: December 2017
- Publisher(s): Packt Publishing
- ISBN: 9781788995054