
Mastering Azure Analytics, 1st Edition

Name: Mastering Azure Analytics, 1st Edition
Author: Zoiner Tejada
ISBN: 9781491956656

by Zoiner Tejada

April 2017

Beginner to intermediate

409 pages

10h 24m

English

O'Reilly Media, Inc.


Foreword
Preface
Conventions Used in This Book; Using Code Examples; O’Reilly Safari; How to Contact Us; Acknowledgments
1. Enterprise Analytics Fundamentals
The Analytics Data Pipeline; Data Lakes; Lambda Architecture; Kappa Architecture; Choosing Between Lambda and Kappa; The Azure Analytics Pipeline; Introducing the Analytics Scenarios; Example Code and Example Data Sets; What You Will Need; Broadband Internet Connectivity; Azure Subscription; Visual Studio 2015 with Update 1; Azure SDK 2.8 or Later; Summary
2. Getting Data into Azure
Ingest Loading Layer; Bulk Data Loading; Disk Shipping; End User Tools; Network-Oriented Approaches; Stream Loading; Stream Loading with Event Hubs; Summary
3. Storing Ingested Data in Azure
File-Oriented Storage; Blob Storage; Azure Data Lake Store; HDFS; Queue-Oriented Storage; Blue Yonder Scenario: Smart Buildings; Event Hubs; IoT Hub; Summary
4. Real-Time Processing in Azure
Stream Processing; Consuming Messages from Event Hubs; Tuple-at-a-Time Processing in Azure; Introducing HDInsight; Storm on HDInsight; EventProcessorHost; Azure Machine Learning; Summary
5. Real-Time Micro-Batch Processing in Azure
Micro-Batch Processing in Azure; Spark Streaming on HDInsight; Storm on HDInsight; Azure Stream Analytics; Summary
6. Batch Processing in Azure
Batch Processing with MapReduce on HDInsight; Apache Hadoop MapReduce; Batch Processing with Hive on HDInsight; Internal and External Tables; Partitioning Tables; Views; Indexes; Databases; Using Hive on HDInsight; Storage on HDInsight; Batch Processing Blue Yonder Airports Data; Creating an External Table; Creating an Internal Table; Batch Processing with Pig on HDInsight; Batch Processing with Spark on HDInsight; Batch Processing Blue Yonder Airports Data; Creating an External Table; Batch Processing with SQL Data Warehouse; Using SQL Data Warehouse; Batch Processing Blue Yonder Airports Data; Storing the Credentials to Azure Storage; Batch Processing with Data Lake Analytics; Using Data Lake Analytics; Batch Processing Blue Yonder Airports Data; Processing with U-SQL; Batch Processing with Azure Batch; Orchestrating Batch Processing Pipelines with Azure Data Factory; Summary
7. Interactive Querying in Azure
Interactive Querying with Azure SQL Data Warehouse; Partitions and Distributions; Indexes; Interactive Exploration of the Blue Yonder Airports Data; Interactive Querying with Hive and Tez; Indexes; Partitions; Interactive Exploration of the Blue Yonder Airports Data; Interactive Querying with Spark SQL; Indexes; Partitions; Interactive Exploration of the Blue Yonder Airports Data; Interactive Querying with U-SQL; Interactive Exploration of the Blue Yonder Airports Data; Summary
8. Hot and Cold Path Serving Layer in Azure
Azure Redis Cache; Redis in the Speed Serving Layer; Document DB; Document DB in the Speed Serving Layer; Document DB in the Batch Serving Layer; SQL Database; SQL Database in the Speed Serving Layer; SQL Database in the Batch Serving Layer; SQL Data Warehouse; HBase on HDInsight; Azure Search; Summary

9. Intelligence and Machine Learning
Azure Machine Learning; R Server on HDInsight; SQL R Services; Microsoft Cognitive Services; Summary
10. Managing Metadata in Azure
Managing Metadata with Azure Data Catalog; Data Catalog in the Blue Yonder Airports Scenario; Add an Azure Data Lake Store Asset; Add Azure Storage Blobs; Add a SQL Data Warehouse; Summary
11. Protecting Your Data in Azure
Identity and Access Management; Data Protection; Auditing; Summary
12. Performing Analytics
Analytics with Power BI; Real-Time Power BI in the Blue Yonder Scenario; Batch Analytics Reporting with Power BI in the Blue Yonder Scenario; A Look Ahead; Real Time; Lower Batch Latencies; IoT; Security; More Linux
Index


Chapter 1. Enterprise Analytics Fundamentals

In this chapter we’ll review the fundamentals of enterprise analytic architectures. We will introduce the analytics data pipeline, a fundamental process that takes data from its source through several steps until it is available to analytics clients. Then we will introduce the concept of a data lake, as well as two different pipeline architectures: lambda architecture and kappa architecture. The particular steps in the typical data processing pipeline (as well as considerations around the handling of “hot” and “cold” data) are detailed and serve as a framework for the rest of the book. We conclude the chapter by introducing our case study scenarios, along with their respective data sets, which provide a more real-world context for performing big data analytics on Azure.

The Analytics Data Pipeline

Data does not end up nicely formatted for analytics on its own; it takes a series of steps that involve collecting the data from the source, massaging the data to get it into the forms appropriate to the analytics desired (sometimes referred to as data wrangling or data munging), and ultimately pushing the prepared results to the location from which they can be consumed. This series of steps can be thought of as a pipeline.
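The three steps described here (collect, massage, push to a consumable location) can be illustrated with a minimal in-memory sketch. This is not the book's implementation and uses no Azure service; the sample records, the airport/count fields, and the function names `collect`, `massage`, `publish`, and `run_pipeline` are all hypothetical, chosen only to show how each stage hands its output to the next.

```python
from __future__ import annotations  # allows list[str]/dict[str, int] hints on older Python 3

from typing import Iterable

# Hypothetical raw records, as they might arrive from a source system.
RAW_EVENTS = [
    "2017-04-01T10:00:00Z,SEA, 12 ",
    "2017-04-01T10:05:00Z,sfo,7",
    "malformed line",
    "2017-04-01T10:10:00Z,SEA,3",
]


def collect(source: Iterable[str]) -> list[str]:
    """Collect step: pull raw lines from the source without interpreting them."""
    return list(source)


def massage(lines: list[str]) -> list[dict]:
    """Wrangling step: parse, normalize, and drop records that cannot be parsed."""
    records = []
    for line in lines:
        parts = [p.strip() for p in line.split(",")]
        if len(parts) != 3:
            continue  # skip malformed input instead of failing the whole run
        timestamp, airport, count = parts
        try:
            records.append(
                {"timestamp": timestamp, "airport": airport.upper(), "count": int(count)}
            )
        except ValueError:
            continue  # non-numeric count: treat the record as unusable
    return records


def publish(records: list[dict]) -> dict[str, int]:
    """Push step: aggregate into the shape that analytics clients will query."""
    totals: dict[str, int] = {}
    for record in records:
        totals[record["airport"]] = totals.get(record["airport"], 0) + record["count"]
    return totals


def run_pipeline(source: Iterable[str]) -> dict[str, int]:
    """Chain the stages so each one consumes the previous stage's output."""
    return publish(massage(collect(source)))


if __name__ == "__main__":
    print(run_pipeline(RAW_EVENTS))  # {'SEA': 15, 'SFO': 7}
```

Keeping each stage a separate function mirrors the pipeline idea: any single stage can later be swapped for a dedicated service (for example, a real ingest or storage layer) without rewriting the others.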

The analytics data pipeline forms a basis for understanding any analytics solution, and thus is very useful to our purposes in this book as we seek to understand how to accomplish analytics using Microsoft Azure. As shown ...


Publisher Resources

ISBN: 9781491956649

