Microsoft Azure Data Lake Storage Service (Gen1 and Gen2)

Video description

Azure Data Lake Storage Gen2 (ADLS Gen2) is a cloud-based repository for both structured and unstructured data. You could use it to store everything from documents and images to social media streams. One of the most effective approaches to big data processing is to store your data in ADLS Gen2 and then process it on Azure Databricks with Apache Spark, a distributed processing engine that is typically much faster than Hadoop MapReduce.
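As a rough illustration of that pattern: Spark on Azure Databricks usually addresses ADLS Gen2 data through the ABFS driver, using a URI of the form abfss://<container>@<account>.dfs.core.windows.net/<path>. The sketch below only builds such a URI. The helper name `abfss_uri` and the account, container, and file names are hypothetical. It also assumes that authentication to the storage account has already been configured on the cluster.

```python
# Minimal sketch (assumption: illustrative only, not taken from the course) of
# how Spark jobs typically address files in ADLS Gen2 through the ABFS driver.


def abfss_uri(account: str, container: str, path: str = "") -> str:
    """Build an ABFS URI: abfss://<container>@<account>.dfs.core.windows.net/<path>.

    The helper name is hypothetical; only the URI format follows the ABFS driver.
    """
    if not account or not container:
        raise ValueError("account and container are required")
    # Strip a leading slash so the path joins cleanly after the host name.
    return f"abfss://{container}@{account}.dfs.core.windows.net/{path.lstrip('/')}"


if __name__ == "__main__":
    # Hypothetical storage account, container, and file, for illustration only.
    uri = abfss_uri("mydatalake", "raw", "/sales/2022/orders.csv")
    print(uri)  # prints abfss://raw@mydatalake.dfs.core.windows.net/sales/2022/orders.csv
    # On Azure Databricks, a path like this could then be handed to Spark, e.g.:
    # df = spark.read.option("header", "true").csv(uri)
```

Keeping the URI construction in one place makes it easy to switch between containers (for example, raw and curated zones) without hand-editing paths in every notebook.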

This is a comprehensive hands-on course for anyone who is interested in Azure’s big data analytics services. Through practical examples, you will learn to import data into ADLS and then securely access and analyze it using Azure Databricks and Azure HDInsight. You will also learn how to monitor and optimize your Data Lake storage. The course walks through an end-to-end demonstration so that you come away with a clear understanding of Data Lake.

By the end of this course, you will know how to ingest, process, and export data using Databricks and HDInsight. You will also have a solid understanding of Microsoft Azure Data Lake Storage Service (Gen1 and Gen2) and its features and properties, which will help you in your professional endeavors.

What You Will Learn

  • Explore Data Lake optimization strategy
  • Learn to monitor the performance of your Data Lake
  • Explore different tools and scenarios to ingest data into Data Lake
  • Discover the five layers of security that protect your Data Lake
  • Explore data security features and configure them using the Azure portal
  • Learn to monitor Azure Storage Service through Metrics

Audience

This course is for anyone interested in Azure’s big data analytics services. Microsoft Azure data engineers, database and BI developers, database administrators, data analysts, and professionals in similar roles will also benefit from it.

A basic understanding of data warehouses and databases in general will help you get the most out of this course.

About The Author

Eshant Garg has 16 years of professional experience, with expertise in database and business intelligence solutions, advanced analytics, design and solution architecture, reporting, and cloud computing technologies (Azure and AWS). He loves to explain complicated things simply and effectively. As a developer and architect, he has worked closely with customers, users, and colleagues to support business solutions across a variety of industries, including healthcare, insurance, finance, and government, for organizations ranging from small companies to Fortune 500 enterprises.

Outside of the technical world, he loves yoga and meditation. He is a student of the ancient yogic text, the Bhagavad Gita, and loves to discuss and practice philosophical teachings.

Table of contents

  1. Chapter 1: Course Introduction
    1. Course Introduction
  2. Chapter 2: Introduction to Azure Cloud Computing
    1. Create Azure Free Subscription
    2. Azure Portal Overview
    3. Azure Services Overview
    4. Resource Management Group and Subscription
    5. Resource Groups
    6. Tagging
    7. Delete Resources and Set Budget
  3. Chapter 3: Introduction to Azure Data Lake
    1. Problem Statement
    2. What is Data Lake?
    3. Data Lake Versus Hadoop
    4. How Data Lake Gen2 Evolved
    5. Azure Data Lake Versus Azure Blob storage
    6. Provision Azure Data Lake Gen2 Account
    7. Azure Data Lake Gen2 Account Overview
    8. Hierarchical Namespace
    9. Other Data Lake Gen2 Features
  4. Chapter 4: Data Ingestion
    1. Tools to Ingest Data in Data Lake
    2. Demo: Ingest Using Portal and Storage Explorer
    3. Demo: Ingest Data Using AzCopy
    4. Demo: Azure Blob Storage to Data Lake Gen2 Using Data Factory
    5. Demo: SQL Server to Data Lake Gen2 Using Data Factory
    6. Demo: Amazon S3 to Data Lake Gen2 Using Data Factory
  5. Chapter 5: Data Flow Around Data Lake
    1. Data Flow Around Data Lake
    2. Data Lake and Transient Clusters
  6. Chapter 6: Azure Data Lake Processing Through Databricks
    1. Demo Overview
    2. Demo: Provision Databricks, Clusters, and Notebook
    3. Demo: Mount Data Lake to Databricks DBFS
    4. Demo: Explore, Analyze, Clean, Transform, and Load Data
  7. Chapter 7: Azure Data Lake Processing Through HDInsight
    1. Demo Overview
    2. Create Azure Data Lake Storage Gen2 (Source) and SQL Server (Destination)
    3. What is Managed Identity?
    4. Add Managed Identity to Gen2 and Database Accounts
    5. Create HDInsight Interactive Query Cluster
    6. Ambari Overview and UI
    7. Ingest Dataset into Data Lake Storage
    8. Data Extraction with Hive
    9. Data Transformation with Hive
    10. Data Export Using Sqoop
    11. Summary
  8. Chapter 8: Security Layers in Data Lake
    1. Introduction
    2. Storage Access Keys
    3. SAS - Shared Access Signature
    4. Azure Active Directory
    5. Access Control List (ACL)
    6. Firewalls and Virtual Networks
    7. Encryption in Transit
    8. Encryption at Rest
    9. Advanced Threat Protection
  9. Chapter 9: Data Lake Monitoring and Optimization
    1. Activity Log
    2. Demo: Activity Logs
    3. Metrics
    4. Demo: Metrics
    5. Demo: Insights
    6. Demo: Alerts
    7. Diagnostic Settings
    8. Demo: Diagnostic Settings
    9. Optimization
  10. Chapter 10: Practice Tests and Bonus
    1. Delete Resources

Product information

  • Title: Microsoft Azure Data Lake Storage Service (Gen1 and Gen2)
  • Author(s): Eshant Garg
  • Release date: February 2022
  • Publisher(s): Packt Publishing
  • ISBN: 9781803236407