Book description
A comprehensive guide to designing, building and executing effective Big Data strategies using Hadoop
About This Book
- Get an in-depth view of the Apache Hadoop ecosystem and an overview of the architectural patterns pertaining to the popular Big Data platform
- Conquer different data processing and analytics challenges using a multitude of tools such as Apache Spark, Elasticsearch, Tableau and more
- A comprehensive, step-by-step guide that will teach you everything you need to know to be an expert Hadoop architect
Who This Book Is For
This book is for Big Data professionals who want to fast-track their career in the Hadoop industry and become expert Big Data architects. Project managers and mainframe professionals looking to build a career in Big Data and Hadoop will also find this book useful. Some understanding of Hadoop is required to get the best out of this book.
What You Will Learn
- Build an efficient enterprise Big Data strategy centered around Apache Hadoop
- Gain a thorough understanding of using Hadoop with various Big Data frameworks such as Apache Spark, Elasticsearch and more
- Set up and deploy your Big Data environment on premises or on the cloud with Apache Ambari
- Design effective streaming data pipelines and build your own enterprise search solutions
- Use historical data to build your analytics solutions and visualize the results using popular tools such as Apache Superset
- Plan, set up and administer your Hadoop cluster efficiently
In Detail
The complex structure of data these days requires sophisticated solutions for data transformation to make the information more accessible to users. This book empowers you to build such solutions with relative ease with the help of Apache Hadoop, along with a host of other Big Data tools.
This book will give you a complete understanding of data lifecycle management with Hadoop, followed by modeling of structured and unstructured data in Hadoop. It will also show you how to design real-time streaming pipelines by leveraging tools such as Apache Spark, and build efficient enterprise search solutions using Elasticsearch. You will learn to build enterprise-grade analytics solutions on Hadoop, and how to visualize your data using tools such as Apache Superset. This book also covers techniques for deploying your Big Data solutions on the cloud with Apache Ambari, as well as expert techniques for managing and administering your Hadoop cluster.
By the end of this book, you will have all the knowledge you need to build expert Big Data systems.
Style and approach
A comprehensive guide with a perfect blend of theory, examples and implementation of real-world use cases
Table of contents
- Title Page
- Copyright and Credits
- Packt Upsell
- Contributors
- Preface
- Enterprise Data Architecture Principles
- Hadoop Life Cycle Management
- Hadoop Design Consideration
- Data Movement Techniques
- Data Modeling in Hadoop
- Apache Hive
- Supported datatypes
- How Hive works
- Hive architecture
- Hive data model management
- JSON documents using Hive
- Apache HBase
- Differences between HDFS and HBase
- Differences between Hive and HBase
- Key features of HBase
- HBase data model
- Difference between RDBMS table and column-oriented data store
- HBase architecture
- Example 4 – loading data from MySQL table to HBase table
- Example 5 – incrementally loading data from MySQL table to HBase table
- Example 6 – Load the MySQL customer changed data into the HBase table
- Example 7 – Hive HBase integration
- Summary
- Designing Real-Time Streaming Data Pipelines
- Large-Scale Data Processing Frameworks
- Building Enterprise Search Platform
- Designing Data Visualization Solutions
- Data visualization
- Practical data visualization in Hadoop
- Apache Druid
- MySQL database
- Apache Superset
- Apache Superset with RDBMS
- Summary
- Developing Applications Using the Cloud
- Production Hadoop Cluster Deployment
- Apache Ambari architecture
- Setting up a Hadoop cluster with Ambari
- Server configurations
- Preparing the server 
- Installing the Ambari server 
- Preparing the Hadoop cluster
- Creating the Hadoop cluster 
- Ambari web interface
- The Ambari home page
- The cluster install wizard
- Naming your cluster
- Selecting the Hadoop version 
- Selecting a server 
- Setting up the node
- Selecting services
- Service placement on nodes
- Selecting slave and client nodes 
- Customizing services
- Reviewing the services
- Installing the services on the nodes
- Installation summary
- The cluster dashboard
- Hadoop clusters
- Summary
Product information
- Title: Modern Big Data Processing with Hadoop
- Author(s):
- Release date: March 2018
- Publisher(s): Packt Publishing
- ISBN: 9781787122765