Chapter 6. Batch Processing in Azure

In this chapter we explore the options for performing batch processing in Azure (Figure 6-1). Just as we defined real-time processing in terms of latency (aiming for subsecond results), we will use a latency definition for batch processing. Think of batch processing as those queries or programs that take tens of minutes, hours, or even days to complete.

Batch processing is used in a variety of scenarios, from initial data munging efforts to a more complete ETL (extract-transform-load) pipeline, to preparing data for ultimate consumption, whether over very large data sets or simply where the computation takes significant time. In other words, batch processing is a step in your lambda architecture processing pipeline, one that either leads to further interactive exploration (downstream analytics), provides the modeling-ready data for machine learning, or lands the data in a data store optimized for analytics and visualization.

A concrete example of batch processing is transforming a large set of flat, unstructured CSV files into a schematized (and structured) format that is ready for further querying. As part of this step, the data is typically converted from the raw format used for ingest (such as CSV) into a binary format that is more performant for querying, because such formats store data in a columnar layout and often provide indexes and inline statistics about the data they contain.
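To make this concrete, the following is a minimal sketch of such a conversion using PySpark, the kind of batch job you might run on Azure HDInsight or Azure Databricks. The storage account, container names, and column names are hypothetical placeholders rather than values taken from this book's examples:

    from pyspark.sql import SparkSession
    from pyspark.sql.types import (StructType, StructField,
                                   StringType, DoubleType)

    spark = SparkSession.builder.appName("csv-to-parquet").getOrCreate()

    # Impose an explicit schema on the flat CSV input
    # (hypothetical columns).
    schema = StructType([
        StructField("device_id", StringType(), nullable=False),
        StructField("reading", DoubleType(), nullable=True),
        StructField("event_time", StringType(), nullable=True),
    ])

    # Read the raw CSV files as ingested into Blob Storage
    # (hypothetical wasbs:// path).
    raw = spark.read.csv(
        "wasbs://ingest@youraccount.blob.core.windows.net/telemetry/*.csv",
        schema=schema, header=True)

    # Write the data out as Parquet, a columnar binary format with
    # embedded statistics that is far more efficient to query.
    raw.write.mode("overwrite").parquet(
        "wasbs://curated@youraccount.blob.core.windows.net/telemetry/")

    spark.stop()

The essential pattern is the same regardless of engine: apply a schema on read, then persist in a columnar format so that downstream queries scan only the columns and row groups they need.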

An important concept you will see in action throughout the technologies highlighted in this chapter ...
