Snowflake: The Definitive Guide

Book description

Snowflake's ability to eliminate data silos and run workloads from a single platform creates opportunities to democratize data analytics, allowing users at all levels within an organization to make data-driven decisions. Whether you're an IT professional working in data warehousing or data science, a business analyst or technical manager, or an aspiring data professional wanting to get more hands-on experience with the Snowflake platform, this book is for you.

You'll learn how Snowflake users can build modern integrated data applications and develop new revenue streams based on data. Using hands-on SQL examples, you'll also discover how the Snowflake Data Cloud helps you accelerate data science by avoiding replatforming or migrating data unnecessarily.

You'll be able to:

  • Efficiently capture, store, and process large amounts of data at an amazing speed
  • Ingest and transform real-time data feeds in both structured and semistructured formats and deliver meaningful data insights within minutes
  • Use Snowflake Time Travel and zero-copy cloning to produce a sensible data recovery strategy that balances system resilience with ongoing storage costs
  • Securely share data and reduce or eliminate data integration costs by accessing ready-to-query datasets available in the Snowflake Marketplace

Publisher resources

View/Submit Errata

Table of contents

  1. Preface
    1. Origin of the Book
    2. Who Is This Book For?
    3. Goals of the Book
    4. Navigating this Book
    5. Using Code Examples
    6. Conventions Used in This Book
    7. O’Reilly Online Learning
    8. How to Contact Us
    9. Acknowledgments
  2. 1. Getting Started
    1. Snowflake Web User Interfaces
    2. Prep Work
    3. Snowsight Orientation
    4. Snowsight Preferences
    5. Navigating Snowsight Worksheets
      1. Context Setting
      2. Improved Productivity
    6. Snowflake Community
    7. Snowflake Certifications
    8. Snowday and Snowflake Summit Events
    9. Important Caveats About Code Examples in the Book
    10. Code Cleanup
    11. Summary
    12. Knowledge Check
  3. 2. Creating and Managing the Snowflake Architecture
    1. Prep Work
    2. Traditional Data Platform Architectures
      1. Shared-Disk (Scalable) Architecture
      2. Shared-Nothing (Scalable) Architecture
      3. NoSQL Alternatives
    3. The Snowflake Architecture
    4. The Cloud Services Layer
      1. Managing the Cloud Services Layer
      2. Billing for the Cloud Services Layer
    5. The Query Processing (Virtual Warehouse) Compute Layer
      1. Virtual Warehouse Size
      2. Scaling Up a Virtual Warehouse to Process Large Data Volumes and Complex Queries
      3. Scaling Out with Multicluster Virtual Warehouses to Maximize Concurrency
      4. Creating and Using Virtual Warehouses
      5. Separation of Workloads and Workload Management
      6. Billing for the Virtual Warehouse Layer
    6. Centralized (Hybrid Columnar) Database Storage Layer
      1. Introduction to Zero-Copy Cloning
      2. Introduction to Time Travel
      3. Billing for the Storage Layer
    7. Snowflake Caching
      1. Query Result Cache
      2. Metadata Cache
      3. Virtual Warehouse Local Disk Cache
    8. Code Cleanup
    9. Summary
    10. Knowledge Check
  4. 3. Creating and Managing Snowflake Securable Database Objects
    1. Prep Work
    2. Creating and Managing Snowflake Databases
    3. Creating and Managing Snowflake Schemas
      1. INFORMATION_SCHEMA
      2. ACCOUNT_USAGE Schema
      3. Schema Object Hierarchy
    4. Introduction to Snowflake Tables
    5. Creating and Managing Views
    6. Introduction to Snowflake Stages: File Format Included
    7. Extending SQL with Stored Procedures and UDFs
      1. User-Defined Function (UDF): Task Included
      2. Secure SQL UDTF That Returns Tabular Value (Market Basket Analysis Example)
      3. Stored Procedures
    8. Introduction to Pipes, Streams, and Sequences
    9. Snowflake Streams (Deep Dive)
    10. Snowflake Tasks (Deep Dive)
    11. Code Cleanup
    12. Summary
    13. Knowledge Check
  5. 4. Exploring Snowflake SQL Commands, Data Types, and Functions
    1. Prep Work
    2. Working with SQL Commands in Snowflake
      1. DDL Commands
      2. DCL Commands
      3. DML Commands
      4. TCL Commands
      5. DQL Command
    3. SQL Query Development, Syntax, and Operators in Snowflake
      1. SQL Development and Management
      2. Query Syntax
      3. Query Operators
      4. Long-Running Queries, and Query Performance and Optimization
      5. Snowflake Query Limits
    4. Introduction to Data Types Supported by Snowflake
      1. Numeric Data Types
      2. String and Binary Data Types
      3. Date and Time Input/Output Data Types
      4. Semi-Structured Data Types
      5. Unstructured Data Types
      6. How Snowflake Supports Unstructured Data Use
    5. Snowflake SQL Functions and Session Variables
      1. Using System-Defined (Built-In) Functions
      2. Creating SQL and JavaScript UDFs and Using Session Variables
      3. External Functions
    6. Code Cleanup
    7. Summary
    8. Knowledge Check
  6. 5. Leveraging Snowflake Access Controls
    1. Prep Work
    2. Creating Snowflake Objects
    3. Snowflake System-Defined Roles
    4. Creating Custom Roles
      1. Functional-Level Business and IT Roles
      2. System-Level Service Account and Object Access Roles
    5. Role Hierarchy Assignments: Assigning Roles to Other Roles
    6. Granting Privileges to Roles
    7. Assigning Roles to Users
    8. Testing and Validating Our Work
    9. User Management
    10. Role Management
    11. Snowflake Multi-Account Strategy
    12. Managing Users and Groups with SCIM
    13. Code Cleanup
    14. Summary
    15. Knowledge Check
  7. 6. Data Loading and Unloading
    1. Prep Work
    2. Basics of Data Loading and Unloading
      1. Data Types
      2. File Formats
      3. Data File Compression
      4. Frequency of Data Processing
      5. Snowflake Stage References
      6. Data Sources
    3. Data Loading Tools
      1. Snowflake Worksheet SQL Using INSERT INTO and INSERT ALL Commands
      2. Web UI Load Data Wizard
      3. SnowSQL CLI SQL PUT and COPY INTO Commands
      4. Data Pipelines
      5. Third-Party ETL and ELT Tools
    4. Alternatives to Loading Data
    5. Tools to Unload Data
    6. Data Loading Best Practices for Snowflake Data Engineers
      1. Select the Right Data Loading Tool and Consider the Appropriate Data Type Options
      2. Avoid Row-by-Row Data Processing
      3. Choose the Right Snowflake Virtual Warehouse Size and Split Files as Needed
      4. Transform Data in Steps and Use Transient Tables for Intermediate Results
    7. Code Cleanup
    8. Summary
    9. Knowledge Check
  8. 7. Implementing Data Governance, Account Security, and Data Protection and Recovery
    1. Prep Work
    2. Snowflake Security
      1. Controlling Account Access
      2. Monitoring Activity with the Snowflake ACCESS_HISTORY Account Usage View
      3. Data Protection and Recovery
      4. Replication and Failover
    3. Democratizing Data with Data Governance Controls
      1. INFORMATION_SCHEMA Data Dictionary
      2. Object Tagging
      3. Classification
      4. Data Masking
      5. Row Access Policies and Row-Level Security
      6. External Tokenization
      7. Secure Views and UDFs
      8. Object Dependencies
    4. Code Cleanup
    5. Summary
    6. Knowledge Check
  9. 8. Managing Snowflake Account Costs
    1. Prep Work
    2. Snowflake Monthly Bill
      1. Storage Fees
      2. Data Transfer Costs
      3. Compute Credits Consumed
    3. Creating Resource Monitors to Manage Virtual Warehouse Usage and Reduce Costs
      1. Resource Monitor Credit Quota
      2. Resource Monitor Credit Usage
      3. Resource Monitor Notifications and Other Actions
      4. Resource Monitor Rules for Assignments
      5. DDL Commands for Creating and Managing Resource Monitors
    4. Using Object Tagging for Cost Centers
    5. Querying the ACCOUNT_USAGE View
    6. Using BI Partner Dashboards to Monitor Snowflake Usage and Costs
    7. Snowflake Agile Software Delivery
      1. Why Do We Need DevOps?
      2. Continuous Data Integration, Continuous Delivery, and Continuous Deployment
      3. What Is Database Change Management?
      4. How Zero-Copy Cloning Can Be Used to Support Dev/Test Environments
    8. Code Cleanup
    9. Summary
    10. Knowledge Check
  10. 9. Analyzing and Improving Snowflake Query Performance
    1. Prep Work
    2. Analyzing Query Performance
      1. QUERY_HISTORY Profiling
      2. HASH() Function
      3. Web UI History
    3. Understanding Snowflake Micro-Partitions and Data Clustering
      1. Partitions Explained
      2. Snowflake Micro-Partitions Explained
    4. Snowflake Data Clustering Explained
      1. Clustering Width and Depth
      2. Choosing a Clustering Key
      3. Creating a Clustering Key
      4. Reclustering
    5. Performance Benefits of Materialized Views
    6. Exploring Other Query Optimization Techniques
      1. Search Optimization Service
      2. Query Optimization Techniques Compared
    7. Summary
    8. Code Cleanup
    9. Knowledge Check
  11. 10. Configuring and Managing Secure Data Sharing
    1. Snowflake Architecture Data Sharing Support
    2. The Power of Snowgrid
    3. Data Sharing Use Cases
    4. Snowflake Support for Unified ID 2.0
    5. Snowflake Secure Data Sharing Approaches
    6. Prep Work
    7. Snowflake’s Direct Secure Data Sharing Approach
      1. Creating Outbound Shares
      2. How Inbound Shares Are Used by Snowflake Data Consumers
    8. How to List and Shop on the Public Snowflake Marketplace
      1. Snowflake Marketplace for Providers
      2. Standard Versus Personalized Data Listings
    9. Harnessing the Power of a Snowflake Private Data Exchange
    10. Snowflake Data Clean Rooms
    11. Important Design, Security, and Performance Considerations
      1. Share Design Considerations
      2. Share Security Considerations
      3. Share Performance Considerations
      4. Difference Between Database Sharing and Database Cloning
      5. Data Shares and Time Travel Considerations
      6. Sharing of Data Shares
    12. Summary
    13. Code Cleanup
    14. Knowledge Check
  12. 11. Visualizing Data in Snowsight
    1. Prep Work
    2. Data Sampling in Snowsight
      1. Fixed-Size Sampling Based on a Specific Number of Rows
      2. Fraction-Based Sampling Based on Probability
      3. Previewing Fields and Data
      4. Sampling Examples
    3. Using Automatic Statistics and Interactive Results
    4. Snowsight Dashboard Visualization
      1. Creating a Dashboard and Tiles
      2. Working with Chart Visualizations
      3. Aggregating and Bucketing Data
      4. Editing and Deleting Tiles
    5. Collaboration
      1. Sharing Your Query Results
      2. Using a Private Link to Collaborate on Dashboards
    6. Summary
    7. Code Cleanup
    8. Knowledge Check
  13. 12. Workloads for the Snowflake Data Cloud
    1. Prep Work
    2. Data Engineering
    3. Data Warehousing
      1. Data Vault 2.0 Modeling
      2. Transforming Data within Snowflake
    4. Data Lake
    5. Data Collaboration
      1. Data Monetization
      2. Regulatory and Compliance Requirements for Data Sharing
    6. Data Analytics
      1. Advanced Analytics for the Finance Industry
      2. Advanced Analytics for the Healthcare Industry
      3. Advanced Analytics for the Manufacturing Industry and Logistics Services
      4. Marketing Analytics for Retail Verticals and the Communications and Media Industry
    7. Data Applications
    8. Data Science
      1. Snowpark
      2. Streamlit
    9. Cybersecurity Using Snowflake as a Security Data Lake
      1. Overcoming the Challenges of a SIEM-Only Architecture
      2. Search Optimization Service Versus Clustering
    10. Unistore
      1. Transactional Workload Versus Analytical Workload
      2. Hybrid Tables
    11. Summary
    12. Code Cleanup
    13. Knowledge Check
  14. A. Answers to the Knowledge Check Questions
    1. Chapter 1
    2. Chapter 2
    3. Chapter 3
    4. Chapter 4
    5. Chapter 5
    6. Chapter 6
    7. Chapter 7
    8. Chapter 8
    9. Chapter 9
    10. Chapter 10
    11. Chapter 11
    12. Chapter 12
  15. B. Snowflake Object Naming Best Practices
    1. General (Character Related)
    2. General (Not Character Related)
    3. Standard Label Abbreviations
  16. C. Setting Up a Snowflake Trial Account
  17. Index
  18. About the Author

Product information

  • Title: Snowflake: The Definitive Guide
  • Author(s): Joyce Kay Avila
  • Release date: August 2022
  • Publisher(s): O'Reilly Media, Inc.
  • ISBN: 9781098103828