Mastering Azure Analytics, 1st Edition

Name: Mastering Azure Analytics, 1st Edition
Author: Zoiner Tejada
ISBN: 9781491956656

April 2017

Beginner to intermediate

409 pages

10h 24m

English

O'Reilly Media, Inc.

Foreword
Preface
Conventions Used in This Book; Using Code Examples; O’Reilly Safari; How to Contact Us; Acknowledgments
1. Enterprise Analytics Fundamentals
The Analytics Data Pipeline; Data Lakes; Lambda Architecture; Kappa Architecture; Choosing Between Lambda and Kappa; The Azure Analytics Pipeline; Introducing the Analytics Scenarios; Example Code and Example Data Sets; What You Will Need; Broadband Internet Connectivity; Azure Subscription; Visual Studio 2015 with Update 1; Azure SDK 2.8 or Later; Summary
2. Getting Data into Azure
Ingest Loading Layer; Bulk Data Loading; Disk Shipping; End User Tools; Network-Oriented Approaches; Stream Loading; Stream Loading with Event Hubs; Summary
3. Storing Ingested Data in Azure
File-Oriented Storage; Blob Storage; Azure Data Lake Store; HDFS; Queue-Oriented Storage; Blue Yonder Scenario: Smart Buildings; Event Hubs; IoT Hub; Summary
4. Real-Time Processing in Azure
Stream Processing; Consuming Messages from Event Hubs; Tuple-at-a-Time Processing in Azure; Introducing HDInsight; Storm on HDInsight; EventProcessorHost; Azure Machine Learning; Summary
5. Real-Time Micro-Batch Processing in Azure
Micro-Batch Processing in Azure; Spark Streaming on HDInsight; Storm on HDInsight; Azure Stream Analytics; Summary
6. Batch Processing in Azure
Batch Processing with MapReduce on HDInsight; Apache Hadoop MapReduce; Batch Processing with Hive on HDInsight; Internal and External Tables; Partitioning Tables; Views; Indexes; Databases; Using Hive on HDInsight; Storage on HDInsight; Batch Processing Blue Yonder Airports Data; Creating an External Table; Creating an Internal Table; Batch Processing with Pig on HDInsight; Batch Processing with Spark on HDInsight; Batch Processing Blue Yonder Airports Data; Creating an External Table; Batch Processing with SQL Data Warehouse; Using SQL Data Warehouse; Batch Processing Blue Yonder Airports Data; Storing the Credentials to Azure Storage; Batch Processing with Data Lake Analytics; Using Data Lake Analytics; Batch Processing Blue Yonder Airports Data; Processing with U-SQL; Batch Processing with Azure Batch; Orchestrating Batch Processing Pipelines with Azure Data Factory; Summary
7. Interactive Querying in Azure
Interactive Querying with Azure SQL Data Warehouse; Partitions and Distributions; Indexes; Interactive Exploration of the Blue Yonder Airports Data; Interactive Querying with Hive and Tez; Indexes; Partitions; Interactive Exploration of the Blue Yonder Airports Data; Interactive Querying with Spark SQL; Indexes; Partitions; Interactive Exploration of the Blue Yonder Airports Data; Interactive Querying with USQL; Interactive Exploration of the Blue Yonder Airports Data; Summary
8. Hot and Cold Path Serving Layer in Azure
Azure Redis Cache; Redis in the Speed Serving Layer; Document DB; Document DB in the Speed Serving Layer; Document DB in the Batch Serving Layer; SQL Database; SQL Database in the Speed Serving Layer; SQL Database in the Batch Serving Layer; SQL Data Warehouse; HBase on HDInsight; Azure Search; Summary

9. Intelligence and Machine Learning
Azure Machine Learning; R Server on HDInsight; SQL R Services; Microsoft Cognitive Services; Summary
10. Managing Metadata in Azure
Managing Metadata with Azure Data Catalog; Data Catalog in the Blue Yonder Airports Scenario; Add an Azure Data Lake Store Asset; Add Azure Storage Blobs; Add a SQL Data Warehouse; Summary
11. Protecting Your Data in Azure
Identity and Access Management; Data Protection; Auditing; Summary
12. Performing Analytics
Analytics with Power BI; Real-Time Power BI in the Blue Yonder Scenario; Batch Analytics Reporting with Power BI in the Blue Yonder Scenario; A Look Ahead; Real Time; Lower Batch Latencies; IoT; Security; More Linux
Index

Content preview from Mastering Azure Analytics, 1st Edition

Chapter 5. Real-Time Micro-Batch Processing in Azure

In the previous chapter, we explored the tuple-at-a-time options in Azure for processing real-time, streaming data. In this chapter we focus on the options that take a micro-batch approach to data processing (see Figure 5-1).

Micro-Batch Processing in Azure

In Azure, three approaches process telemetry streams, such as those coming from an Event Hub or IoT Hub, in small batches. Two of them (Spark Streaming and Storm) run on managed HDInsight clusters; the third (Azure Stream Analytics) is a fully managed service with no infrastructure for you to manage.
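
To make the micro-batch idea concrete, here is a minimal, framework-free sketch in Python. It is not how Spark Streaming, Storm, or Azure Stream Analytics are implemented; it only illustrates the pattern they share: events are collected into small fixed-length time windows, and each window is processed as one unit. The event shape, the two-second interval, and the names `micro_batches`, `average_by_device`, and `EVENTS` are all assumptions made for illustration.

```python
from collections import defaultdict
from typing import Dict, Iterable, Iterator, List, Tuple

# Hypothetical telemetry event: (timestamp in seconds, device id, temperature).
Event = Tuple[float, str, float]


def micro_batches(events: Iterable[Event], interval: float) -> Iterator[List[Event]]:
    """Group a time-ordered event stream into batches of `interval` seconds.

    Windows that contain no events are skipped rather than emitted empty.
    """
    batch: List[Event] = []
    current_window = None
    for event in events:
        window = int(event[0] // interval)  # index of the window this event falls in
        if current_window is not None and window != current_window and batch:
            yield batch  # the previous window is complete, hand it downstream
            batch = []
        current_window = window
        batch.append(event)
    if batch:
        yield batch  # flush the final, possibly partial, window


def average_by_device(batch: List[Event]) -> Dict[str, float]:
    """Process one whole batch at once: average temperature per device."""
    totals = defaultdict(lambda: [0.0, 0])  # device -> [sum, count]
    for _, device, temperature in batch:
        totals[device][0] += temperature
        totals[device][1] += 1
    return {device: total / count for device, (total, count) in totals.items()}


# Made-up sample data, ordered by timestamp.
EVENTS: List[Event] = [
    (0.5, "a", 20.0),
    (1.2, "b", 30.0),
    (1.9, "a", 22.0),
    (2.1, "a", 24.0),
    (3.7, "b", 32.0),
    (4.0, "a", 26.0),
]

if __name__ == "__main__":
    for batch in micro_batches(EVENTS, interval=2.0):
        print(average_by_device(batch))
```

A tuple-at-a-time processor would instead invoke its logic once per event. Grouping events first is what makes per-window aggregates such as these averages natural to express, at the cost of latency of up to one batch interval.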

Spark Streaming on HDInsight

Apache Spark is a fast, general-purpose engine for in-memory, distributed computing, with APIs for the Scala, Java, Python, and R languages. Its distinguishing value is a set of higher-level frameworks built on top of the main functionality (referred to as Spark Core) for structured and SQL-based data processing (Spark SQL), machine learning (MLlib and SparkML), graph processing (GraphX), and stream processing (Spark Streaming). While many solutions in the wild perform each of these functions individually, Spark lets you combine the frameworks to achieve your goals. For example, you can write a single streaming application that uses Spark Streaming as the data processing framework that internally uses SQL queries ...

Publisher Resources

ISBN: 9781491956649