Learning Path: Understanding Tool Integration for Big Data Architecture

Video description

In this Learning Path, you’ll learn how to integrate Hadoop components to implement big data solutions for a variety of use cases, including clickstream analytics, time series problems, transferring data between Hadoop and relational databases, and applications in the finance sector.

Table of contents

  1. Introduction to Clickstream Case Study
  2. Requirements
  3. Data Modeling
  4. Data Ingest
  5. Data Processing Engines - Part 1
  6. Data Processing Engines - Part 2
  7. Data Processing Patterns
  8. Orchestration
  9. Putting It All Together
  10. Demo
  11. Q&A
  12. Introduction
    1. Introduction to Time Series Problems
  13. Kafka
    1. Kafka Architecture and Deployment
    2. Kafka Usage
  14. Spark
    1. Introduction to Spark
    2. Spark Architecture
  15. Spark Streaming
    1. Spark Streaming: Windows Slides
    2. Spark Streaming: Ingestion Sources Using Kafka
    3. Spark Streaming: Operations on the Stream
  16. Cassandra
    1. Introduction to Cassandra
    2. Cassandra Basic Architecture
    3. Replication, High Availability and Multi Datacenter
    4. Cassandra Weather Website Example
    5. Cassandra Query Language (CQL)
    6. Cassandra Partitions Clustering
    7. Cassandra Read and Write Path
    8. Working with Cassandra
    9. Cassandra Drivers and Access Patterns
  17. Spark and Cassandra
    1. Spark and Cassandra Architecture
    2. Analyzing Cassandra Data Spark SQL
    3. Spark and Cassandra DataStax Enterprise
  18. Real World Use Cases
    1. Real World Use Cases: Streaming Problems
    2. Real World Use Cases: In-place Analytic Problems
  19. Introduction
    1. Course Introduction
    2. About The Author
    3. What Is Big Data
    4. Historical Approaches
    5. Modern-Day Approach
    6. What Is Hadoop
    7. Hadoop Core Vs Ecosystem
    8. Hadoopable Problems
  20. Hadoop Basics
    1. HDFS And Yarn
    2. Hive And Pig Interface Introduction
    3. Introduction To Spark
    4. Hadoop In The Cloud (Amazon Web Services Intro)
    5. Installing Hadoop Into EMR Part - 1
    6. Installing Hadoop Into EMR Part - 2
    7. Installing Cloudera Quickstart VM
    8. Web GUIs
  21. Hadoop Distributed Filesystem (HDFS)
    1. HDFS Architecture
    2. HDFS File Write Walkthrough
    3. Secondary Name Node
    4. Basic HDFS Commands
    5. Using HDFS Commands Part - 1
    6. Using HDFS Commands Part - 2
    7. HA And Federation Basics
    8. HDFS Access Controls (Or Lack Thereof)
  22. Yarn
    1. Yarn Purpose
    2. Yarn Architecture
    3. Yarn With Spark
  23. MapReduce
    1. MapReduce Explained
    2. MapReduce Architecture
    3. MapReduce Code Walkthrough
    4. MapReduce Details Walkthrough
    5. Running MapReduce Job
  24. HDFS Data Import And Export
    1. Import/Export Options
    2. Flume Introduction
    3. Using Flume
    4. Sqoop Introduction
    5. Using Sqoop
    6. HDFS Interaction Tools
    7. Oozie Introduction
  25. Spark Basics
    1. Spark Value Propositions
    2. Spark Run Modes (Yarn, Standalone, Mesos)
    3. RDDs And Dataframes
    4. Hands On Spark Part - 1
    5. Hands On Spark Part - 2
    6. Running Spark Part - 1
    7. Running Spark Part - 2
    8. Optimizing And Debugging Spark
    9. Spark Libraries Overview
  26. Spark Built-In Libraries
    1. Spark SQL
    2. Spark SQL Usage
    3. MLlib Basics
    4. Common MLlib Usage Part - 1
    5. Common MLlib Usage Part - 2
    6. Spark Streaming
    7. GraphX
  27. Hive And Pig
    1. Hive Vs Pig
    2. Hive Basics
    3. Analysis With Hive
    4. Pig Basics
    5. ETL And Analytics With Pig
  28. Hadoop In The Cloud
    1. Hadoop/Cloud Use Cases
    2. Elastic MapReduce (EMR)
  29. Ecosystem
    1. HBase Basics
    2. Enterprise Integration
  30. Wrap Up
    1. Wrap Up
  31. Introduction to Sqoop
    1. Introduction
    2. About The Author
    3. Use Case #1: ELT
    4. Use Case #2: ETL From DWH
    5. Use Case #3: Data Analysis
    6. Use Case #4: Data Archival
    7. Use Case #5: Move Reports To Hadoop
    8. Use Case #6: Data Consolidation
  32. Importing Data To Hadoop From A Relational Database
    1. Command Line Basics: Importing Data Using Sqoop
    2. Importing Data With Column Filters, Row Filters, And Free Text Queries
    3. Parallel Imports
    4. Import Data Directory To HIVE Tables
    5. Incremental Data Import Overview
    6. Incremental Data Import And Using Sqoop Stored Jobs
  33. Sqoop Hands-On: Exporting Data From Hadoop To A Relational Database
    1. Exporting Data Back To A Relational Database Using Sqoop
    2. Exporting data from Hadoop back to RDBMS
  34. Advanced topics
    1. Introduction to Sqoop2 Server
  35. Course summary
    1. Wrap Up
    2. Continuous curation of event data for a customer event hub - Arvind Prabhakar (StreamSets)
    3. Big data governance - Steven Totman (Cloudera), Mark Donsky (Cloudera), Kristi Cunningham (Capital One), Ben Harden (CapTech Consulting)
    4. Preventing a big data security breach - Sam Heywood (Cloudera), Nick Curcuru (MasterCard Advisors), Ritu Kama (Intel)
    5. Hive + Amazon EMR + S3 = Elastic big data SQL analytics processing in the cloud, a real-world case study - Jaipaul Agonus (FINRA)

Product information

  • Title: Learning Path: Understanding Tool Integration for Big Data Architecture
  • Author(s): O'Reilly Media, Inc.
  • Release date: December 2016
  • Publisher(s): O'Reilly Media, Inc.
  • ISBN: 9781491978634