Mastering Azure Analytics, 1st Edition

Author: Zoiner Tejada
ISBN: 9781491956656

April 2017

Beginner to intermediate

409 pages

10h 24m

English

O'Reilly Media, Inc.

Foreword
Preface
  Conventions Used in This Book
  Using Code Examples
  O’Reilly Safari
  How to Contact Us
  Acknowledgments
1. Enterprise Analytics Fundamentals
  The Analytics Data Pipeline
  Data Lakes
  Lambda Architecture
  Kappa Architecture
  Choosing Between Lambda and Kappa
  The Azure Analytics Pipeline
  Introducing the Analytics Scenarios
  Example Code and Example Data Sets
  What You Will Need
  Broadband Internet Connectivity
  Azure Subscription
  Visual Studio 2015 with Update 1
  Azure SDK 2.8 or Later
  Summary
2. Getting Data into Azure
  Ingest Loading Layer
  Bulk Data Loading
  Disk Shipping
  End User Tools
  Network-Oriented Approaches
  Stream Loading
  Stream Loading with Event Hubs
  Summary
3. Storing Ingested Data in Azure
  File-Oriented Storage
  Blob Storage
  Azure Data Lake Store
  HDFS
  Queue-Oriented Storage
  Blue Yonder Scenario: Smart Buildings
  Event Hubs
  IoT Hub
  Summary
4. Real-Time Processing in Azure
  Stream Processing
  Consuming Messages from Event Hubs
  Tuple-at-a-Time Processing in Azure
  Introducing HDInsight
  Storm on HDInsight
  EventProcessorHost
  Azure Machine Learning
  Summary
5. Real-Time Micro-Batch Processing in Azure
  Micro-Batch Processing in Azure
  Spark Streaming on HDInsight
  Storm on HDInsight
  Azure Stream Analytics
  Summary
6. Batch Processing in Azure
  Batch Processing with MapReduce on HDInsight
  Apache Hadoop MapReduce
  Batch Processing with Hive on HDInsight
  Internal and External Tables
  Partitioning Tables
  Views
  Indexes
  Databases
  Using Hive on HDInsight
  Storage on HDInsight
  Batch Processing Blue Yonder Airports Data
  Creating an External Table
  Creating an Internal Table
  Batch Processing with Pig on HDInsight
  Batch Processing with Spark on HDInsight
  Batch Processing Blue Yonder Airports Data
  Creating an External Table
  Batch Processing with SQL Data Warehouse
  Using SQL Data Warehouse
  Batch Processing Blue Yonder Airports Data
  Storing the Credentials to Azure Storage
  Batch Processing with Data Lake Analytics
  Using Data Lake Analytics
  Batch Processing Blue Yonder Airports Data
  Processing with U-SQL
  Batch Processing with Azure Batch
  Orchestrating Batch Processing Pipelines with Azure Data Factory
  Summary
7. Interactive Querying in Azure
  Interactive Querying with Azure SQL Data Warehouse
  Partitions and Distributions
  Indexes
  Interactive Exploration of the Blue Yonder Airports Data
  Interactive Querying with Hive and Tez
  Indexes
  Partitions
  Interactive Exploration of the Blue Yonder Airports Data
  Interactive Querying with Spark SQL
  Indexes
  Partitions
  Interactive Exploration of the Blue Yonder Airports Data
  Interactive Querying with USQL
  Interactive Exploration of the Blue Yonder Airports Data
  Summary
8. Hot and Cold Path Serving Layer in Azure
  Azure Redis Cache
  Redis in the Speed Serving Layer
  Document DB
  Document DB in the Speed Serving Layer
  Document DB in the Batch Serving Layer
  SQL Database
  SQL Database in the Speed Serving Layer
  SQL Database in the Batch Serving Layer
  SQL Data Warehouse
  HBase on HDInsight
  Azure Search
  Summary

9. Intelligence and Machine Learning
  Azure Machine Learning
  R Server on HDInsight
  SQL R Services
  Microsoft Cognitive Services
  Summary
10. Managing Metadata in Azure
  Managing Metadata with Azure Data Catalog
  Data Catalog in the Blue Yonder Airports Scenario
  Add an Azure Data Lake Store Asset
  Add Azure Storage Blobs
  Add a SQL Data Warehouse
  Summary
11. Protecting Your Data in Azure
  Identity and Access Management
  Data Protection
  Auditing
  Summary
12. Performing Analytics
  Analytics with Power BI
  Real-Time Power BI in the Blue Yonder Scenario
  Batch Analytics Reporting with Power BI in the Blue Yonder Scenario
  A Look Ahead
  Real Time
  Lower Batch Latencies
  IoT
  Security
  More Linux
Index

Content preview from Mastering Azure Analytics, 1st Edition

Chapter 3. Storing Ingested Data in Azure

In this chapter, we explore where to land the transferred data and how to choose among the storage options. These options fall into two broad categories: file-oriented storage and queue-oriented storage. The particular category selected impacts the type (and latency) of processing performed at later stages in the pipeline. We intentionally omit other data stores (such as NoSQL or document stores) as the initial landing place for ingested data, as the file and queue options are the simplest and least likely to impose changes on the ingested data before processing can begin.

Within our analytics pipeline, we examine the storage components outlined in Figure 3-1.
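To make the two categories concrete, here is a minimal sketch in plain Python that uses only the standard library. It is not the book's code and does not call any Azure service: a temporary directory stands in for file-oriented storage, an in-memory `queue.Queue` stands in for queue-oriented storage, and the readings, field names, and function names are all hypothetical. It illustrates how the landing place shapes later processing: files accumulate and are processed together afterward, while queued messages are consumed one at a time in arrival order.

```python
import json
import queue
import tempfile
from pathlib import Path

# Hypothetical sensor readings; the field names are illustrative only.
READINGS = [
    {"device": "thermostat-1", "temp": 21.5},
    {"device": "thermostat-2", "temp": 19.0},
    {"device": "thermostat-1", "temp": 22.1},
]


def land_as_files(readings, root):
    """File-oriented landing: write each reading, unchanged, into a directory tree."""
    paths = []
    for i, reading in enumerate(readings):
        folder = Path(root) / reading["device"]
        folder.mkdir(parents=True, exist_ok=True)
        path = folder / f"reading-{i}.json"
        path.write_text(json.dumps(reading))
        paths.append(path)
    return paths


def batch_process(root):
    """Runs later, over every file that has accumulated under root."""
    temps = [json.loads(p.read_text())["temp"] for p in Path(root).rglob("*.json")]
    return sum(temps) / len(temps)


def land_in_queue(readings, q):
    """Queue-oriented landing: append each reading, unchanged, as a message."""
    for reading in readings:
        q.put(json.dumps(reading))


def stream_process(q):
    """Consumes messages one at a time, in arrival order."""
    devices = []
    while not q.empty():
        devices.append(json.loads(q.get())["device"])
    return devices


def demo():
    # File path: land everything first, then compute over the whole set.
    with tempfile.TemporaryDirectory() as root:
        land_as_files(READINGS, root)
        average = batch_process(root)
    # Queue path: land messages, then handle each one individually.
    q = queue.Queue()
    land_in_queue(READINGS, q)
    order = stream_process(q)
    return average, order
```

In both paths the readings are stored exactly as they arrived, which mirrors the point above that file and queue options are the least likely to impose changes on ingested data before processing begins.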

File-Oriented Storage

The more things change, the more they stay the same. This is also true of the innovations in approaches for storing big data used in analytics scenarios: the notion of a filesystem that contains a tree of directories, which in turn can contain files of different formats and encodings, has persisted in storing data at cloud scale. In this section, we examine three such “filesystems” prevalent in Azure: Blob Storage, Azure Data Lake Store, and the Hadoop Distributed File System (HDFS).

Blob Storage

Azure Blob Storage provides highly available, high-scale object storage and allows ...

Publisher Resources

ISBN: 9781491956649
