Book description
This is the Rough Cut version of the printed book.
The world of data is changing rapidly. The growing demands of end users (consumerization of IT) and the availability of new types of data (the data explosion: 85% of this new data comes from new data types such as sensors, RFIDs, web logs, high-definition video streaming, and oil and gas exploration) are causing a widening gap between our ability to store vast amounts of data and our ability to get meaningful insight from it and drive decision making. This data explosion, combined with the fact that the cost of storage has practically gone to zero, has landed us in a world where we need the ability to store all this data and get insight into it. Doing so helps companies make better business decisions by enabling data scientists and other users to analyze huge volumes of transaction data as well as other data sources that may be left untapped by traditional business intelligence (BI) programs.
On the analytics front, there is also a shift from traditional BI to predictive analytics. Traditional BI helps customers understand what has happened in the past (the rear-view mirror), whereas predictive analytics allows customers to understand what is likely to happen in the future (a forward-looking view). Predictive analytics has been effective in areas such as fraud detection, sales targeting, customer churn analysis, and ad placement to increase revenue.
This book covers in detail how to store vast amounts of data (big data) on Hadoop on Windows (on the Windows Azure platform) and how to get insight into it with familiar Microsoft BI tools.
It addresses questions such as, "What is Big Data and how can Hadoop be used by an organization to tap into it? What are some of the important tools and technologies around the Hadoop ecosystem and Microsoft's partnership with Hortonworks?"
From this book you will learn:
• Ease of installation, configuration, and monitoring of a Hadoop (HDInsight) cluster on a cloud platform
• Distributed storage and processing of unstructured data or big data
• Programming to do big data analytics with MapReduce, Hive, and Pig
• Integration of Hadoop with Microsoft BI (MSBI) tools
• Analysis and creation of visualization reports with Microsoft Power BI
Table of contents
- About This E-Book
- Title Page
- Copyright Page
- Contents at a Glance
- Table of Contents
- About the Authors
- Dedications
- Acknowledgments
- We Want to Hear from You!
- Reader Services
- Introduction
- Part I: Understanding Big Data, Hadoop 1.0, and 2.0
- Hour 1. Introduction of Big Data, NoSQL, and Business Value Proposition
- Hour 2. Introduction to Hadoop, Its Architecture, Ecosystem, and Microsoft Offerings
- Hour 3. Hadoop Distributed File System Versions 1.0 and 2.0
- Hour 4. The MapReduce Job Framework and Job Execution Pipeline
- Hour 5. MapReduce—Advanced Concepts and YARN
- Part II: Getting Started with HDInsight and Understanding Its Different Components
- Hour 6. Getting Started with HDInsight, Provisioning Your HDInsight Service Cluster, and Automating HDInsight Cluster Provisioning
- Hour 7. Exploring Typical Components of HDFS Cluster
- Hour 8. Storing Data in Microsoft Azure Storage Blob
- Hour 9. Working with Microsoft Azure HDInsight Emulator
- Part III: Programming MapReduce and HDInsight Script Action
- Hour 10. Programming MapReduce Jobs
- Hour 11. Customizing the HDInsight Cluster with Script Action
- Part IV: Querying and Processing Big Data in HDInsight
- Hour 12. Getting Started with Apache Hive and Apache Tez in HDInsight
- Hour 13. Programming with Apache Hive, Apache Tez in HDInsight, and Apache HCatalog
- Hour 14. Consuming HDInsight Data from Microsoft BI Tools over Hive ODBC Driver: Part 1
- Hour 15. Consuming HDInsight Data from Microsoft BI Tools over Hive ODBC Driver: Part 2
- Hour 16. Integrating HDInsight with SQL Server Integration Services
- Hour 17. Using Pig for Data Processing
- Hour 18. Using Sqoop for Data Movement Between RDBMS and HDInsight
- Part V: Managing Workflow and Performing Statistical Computing
- Hour 19. Using Oozie Workflows and Job Orchestration with HDInsight
- Hour 20. Performing Statistical Computing with R
- Part VI: Performing Interactive Analytics and Machine Learning
- Hour 21. Performing Big Data Analytics with Spark
- Hour 22. Microsoft Azure Machine Learning
- Part VII: Performing Real-time Analytics
- Hour 23. Performing Stream Analytics with Storm
- Hour 24. Introduction to Apache HBase on HDInsight
- Part VIII: Bonus Chapters
- Hour 25. Getting Started with Apache HBase on HDInsight
- Hour 26. Integration of Enterprise Data Warehouse with Hadoop and the Microsoft Analytics Platform System
- Index
- Code Snippets
Product information
- Title: Sams Teach Yourself: Big Data Analytics with Microsoft HDInsight in 24 Hours, Big Data, Hadoop, and Microsoft Azure for Better Business Intelligence
- Author(s):
- Release date: October 2015
- Publisher(s): Sams
- ISBN: 9780134035314