Book description
Learn how to build and deploy a modern big data architecture to empower your business
In Detail
Traditional relational databases are ill-suited to the challenges presented by Big Data. A Hadoop-based architecture offers a radical solution, as it is designed specifically to handle huge sets of unstructured data.
This book takes you through building a modern data lake architecture using HDInsight, a Hadoop-based service that allows you to manage high-volume, high-velocity data in the Microsoft Azure cloud. Through a wealth of practical examples, you'll find tips and techniques to provision your own HDInsight cluster and to ingest, organize, transform, and analyze data.
As you work through HDInsight, you'll explore the wider Hadoop ecosystem with plenty of working examples of Hadoop technologies, including Hive, Pig, MapReduce, HBase, and Storm, and of analytics solutions, including Excel Power Query, Power Map, and Power BI.
What You Will Learn
- Explore core features of Hadoop, including HDFS2 and YARN, the new resource manager for Hadoop
- Build your HDInsight cluster in minutes and learn how to administer it using Azure PowerShell
- Discover what's new in Hadoop 2.X and the reference architecture for a modern data lake based on Hadoop
- Find out more about a data lake vision and its core capabilities
- Ingest and organize your data into HDInsight
- Use open source software, including Hive, Pig, and MapReduce, to transform data and make it available to decision makers
- Get to grips with architectural considerations for scalability, maintainability, and security
Table of contents
- HDInsight Essentials Second Edition
- Table of Contents
- HDInsight Essentials Second Edition
- Credits
- About the Author
- About the Reviewers
- www.PacktPub.com
- Preface
- 1. Hadoop and HDInsight in a Heartbeat
- 2. Enterprise Data Lake using HDInsight
- 3. HDInsight Service on Azure
- 4. Administering Your HDInsight Cluster
- 5. Ingest and Organize Data Lake
  - End-to-end Data Lake solution
  - Ingesting to Data Lake using HDFS command
  - Loading data to Azure Blob storage using Azure PowerShell
  - Loading files to Data Lake using GUI tools
  - Using Sqoop to move data from RDBMS to Data Lake
  - Organizing your Data Lake in HDFS
  - Managing file metadata using HCatalog
  - Summary
- 6. Transform Data in the Data Lake
- 7. Analyze and Report from Data Lake
- 8. HDInsight 3.1 New Features
- 9. Strategy for a Successful Data Lake Implementation
- Index
Product information
- Title: HDInsight Essentials - Second Edition
- Author(s):
- Release date: January 2015
- Publisher(s): Packt Publishing
- ISBN: 9781784399429