Amazon Redshift: The Definitive Guide

by Rajesh Francis, Rajiv Gupta, Milind Oke
Released December 2023
Publisher(s): O'Reilly Media, Inc.
ISBN: 9781098135287

Book description

Amazon Redshift powers analytic cloud data warehouses worldwide, from startups to some of the largest enterprise data warehouses available today. This practical guide thoroughly examines this managed service and demonstrates how you can use it to extract value from your data immediately, rather than go through the heavy lifting required to run a typical data warehouse.

Analytic specialists Rajesh Francis, Rajiv Gupta, and Milind Oke detail Amazon Redshift's underlying mechanisms and options to help you explore out-of-the-box automation. Whether you're a data engineer who wants to learn the art of the possible or a DBA looking to take advantage of machine learning-based auto-tuning, this book helps you get the most value from Amazon Redshift.

By understanding Amazon Redshift features, you'll achieve excellent analytic performance at the best price, with the least effort. This book helps you:

  • Build a cloud data strategy around Amazon Redshift as a foundational data warehouse
  • Get started with Amazon Redshift with simple-to-use data models and design best practices
  • Understand how and when to use Redshift Serverless and Redshift provisioned clusters
  • Take advantage of auto-tuning options inherent in Amazon Redshift and understand manual tuning options
  • Transform your data platform for predictive analytics using Redshift ML and break silos using data sharing
  • Learn best practices for security, monitoring, resilience, and disaster recovery
  • Leverage Amazon Redshift integration with other AWS services to unlock additional value

Table of contents

  1. 1. AWS for Data
    1. Data Driven Organizations
      1. Business Use Cases
    2. Modern Data strategy
      1. Comprehensive set of capabilities
      2. Integrated set of tools
      3. End-to-end data governance
    3. Modern data architecture
      1. Role of Amazon Redshift in a modern data architecture
      2. Real world benefits of adopting a modern data architecture
      3. Reference architecture for Modern data architecture
      4. Data sourcing
      5. Extract, Transform and Load (ETL)
      6. Storage
      7. Analysis
    4. Data Mesh and Data Fabric
      1. Data Mesh
      2. Data Fabric
    5. Summary
  2. 2. Getting started with Amazon Redshift
    1. Amazon Redshift Architecture Overview
    2. Get started with Amazon Redshift serverless
      1. Creating an Amazon Redshift serverless data warehouse
    3. Sample data
      1. Activate sample data models and Query using the query editor
    4. When to use a provisioned cluster?
      1. Creating an Amazon Redshift provisioned cluster
    5. Estimate your Amazon Redshift cost
      1. Amazon Redshift managed storage (RMS)
      2. Amazon Redshift serverless compute cost
      3. Amazon Redshift provisioned compute cost
    6. AWS Account management
    7. Connecting to your Amazon Redshift data warehouse
      1. Private / Public VPC and secure access
      2. Stored password
      3. Temporary Credentials
      4. Federated User
      5. SAML-Based Authentication from an Identity Provider (IdP)
      6. Native IdP Integration
      7. Amazon Redshift Data API
      8. Querying a database using the Query Editor V2
      9. Business Intelligence (BI) using Amazon QuickSight
      10. Connecting to Amazon Redshift using JDBC/ODBC
    8. Summary
  3. 3. Setting up your data models and ingesting data
    1. Data lake first vs Data warehouse first strategy
      1. Data Lake First Strategy
      2. Data warehouse first strategy
      3. Deciding on a strategy
    2. Defining your data model
      1. Database Schemas, Users and Groups
      2. Star Schema, De-normalized, Normalized
    3. Student Information System Learning Dataset
    4. Load batch data into Amazon Redshift
      1. Using a COPY command
      2. Continuous file ingestion from Amazon S3
      3. Using AWS Glue for transformations
      4. Manual loading using SQL Commands
      5. Using the Query Editor V2
      6. Building a star schema
    5. Load real-time and near real-time data
      1. Near real-time replication using AWS Database Migration Service
      2. Amazon Aurora Zero-ETL integration with Amazon Redshift
      3. Using Amazon AppFlow
      4. Streaming Ingestion
    6. Optimize your data structures
      1. Automatic table optimization and autonomics
      2. Distribution Style
      3. Sort key
      4. Compression encoding
    7. Summary
  4. 4. Data Transformation Strategies
    1. Comparing ELT and ETL strategies
    2. In-Database Transformation (ELT)
      1. Semi-structured Data
      2. User Defined Functions
      3. Stored Procedures
    3. Scheduling and Orchestration
    4. Access all your data
      1. External Amazon S3 Data
      2. External Operational Data
      3. External Amazon Redshift Data
    5. External Transformation (ETL)
      1. AWS Glue
  5. 5. Scaling and performance optimizations
    1. Scaling for predictable and unpredictable workload changes
      1. Evolving storage demand
      2. Evolving compute demand
      3. Predictable workload changes
      4. Unpredictable workload changes
    2. WLM, Queues & QMR
      1. Queue Assignment
      2. Short Query Acceleration (SQA)
      3. Query Monitoring Rules (QMR)
      4. Automatic WLM (AutoWLM)
      5. Manual WLM
      6. Parameter group
      7. WLM dynamic memory allocation
    3. Materialized Views
    4. Autonomics
      1. Auto Table Optimizer (ATO) and Smart Defaults
      2. Auto Vacuum
      3. Auto Analyze
      4. Auto Materialized Views (AutoMV)
      5. Amazon Redshift Advisor
    5. Workload isolation
    6. Optimizing for best price & performance
      1. Database Vs Data warehouse
      2. Amazon Redshift serverless
      3. Multi-warehouse environment
      4. AWS Data Exchange (ADX)
      5. Table Design
      6. Indexes Vs. Zone-maps
      7. Drivers
      8. Simplify ETL
      9. Query Editor v2
    7. Writing queries and performance tuning
      1. Query processing
      2. Analyzing queries
      3. Identifying queries for performance tuning
    8. Summary
  6. About the Authors

