Live Online Training

Building a Modern Data Platform with Snowflake

A guide to getting off the ground

Topic: Data
Jacob Thomas

Data is so critical to modern business that it is often called ‘the new oil’. However, as those in the oil industry know, oil has value only after the pipelines, refineries, and delivery mechanisms are built. Data is no different: without that infrastructure, it can become a company liability rather than an asset. This course will teach you how to get a developer-friendly, highly scalable data platform off the ground and help you turn your data back into an asset.

Snowflake is a modern data warehouse built for cloud-scale workloads. For the enterprise, it’s robust, highly scalable, and secure. For the startup, it’s cost-efficient, easily managed, and scales as your needs dictate. As with any system, there are many aspects to consider when setting it up for long-term success. This course will teach you to do just that.

What you'll learn and how you can apply it

  • Why choose Snowflake?
  • How to provision a Snowflake cluster.
  • How to isolate compute resources and establish a clean separation of concerns.
  • How to create databases, schemas, and tables.
  • How to stage, load, and unload data.
  • How to keep your data secure with network policies, role-based access control, data encryption, and MFA.
  • How to explore and query data via Snowflake’s query UI.
  • Industry best practices for structuring data warehouses.

This training course is for you because...

  • You’ve been tasked to build an analytics platform and want to set it up for success.
  • You lead an analytics team and are evaluating Snowflake as your data warehousing solution.
  • You’re a software engineer, data engineer, or data scientist looking to leverage a massively parallel cloud data warehouse.
  • Your existing cloud data warehousing solution isn’t scaling as promised and you’re evaluating Snowflake as an alternative.

Prerequisites

  • A working knowledge of SQL.
  • A working knowledge of analytics architecture.
  • A need for a performant, highly scalable, cost-effective analytics solution.

Course Set-up

If you don’t have a Snowflake account already, set up a trial account at https://trial.snowflake.com/. Snowflake offers $400 of free credits, so following along with this course will cost you nothing.

Recommended Preparation

Recommended Follow-up

About your instructor

  • Jacob is the lead data engineer at Cargurus, an industry-leading automotive marketplace. He currently leads high-volume data pipelining and warehousing efforts and has also played a lead role in building out the current data platform with Snowflake. Prior to Cargurus, Jacob built out the analytics and data pipelining stack at Wanderu and did similar work at Safari Books Online/O’Reilly before that. He’s helped numerous startups and businesses modernize their data operations along the way.

Schedule

The timeframes are only estimates and may vary according to how the class is progressing.

Why Snowflake? (30 minutes)

  • Developer productivity and happiness
  • Security
  • Scalability
  • Community
  • Efficiency

Exercise/Activity I - Audience Discussion, Setting the Stage (15 minutes)

  • What would you do if your engineers could spend their time building, instead of doing database administration?
  • What do you expect in an analytics platform?
  • (If applicable) What are some ways your analytics stack has struggled to scale?

Break (15 minutes)

Snowflake Database Fundamentals (45 minutes)

  • Users and roles
  • Warehouses
  • Databases, schemas, tables, oh my!
  • Internal and external stages
  • Loading data (COPY INTO ... FROM, etc.)
  • Unloading data
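
As a rough preview of the topics above, here is a minimal Python sketch that only assembles the text of Snowflake SQL statements in the order you might run them: warehouse, database, schema, table, internal stage, load, and unload. It does not connect to Snowflake. Every object name (LOADING_WH, ANALYTICS, RAW, USERS, USERS_STAGE), the table columns, and the CSV file format are assumptions made for illustration, not the names used in the course.

```python
# Hypothetical object names for illustration only; none come from the course.
WAREHOUSE = "LOADING_WH"
DATABASE = "ANALYTICS"
SCHEMA = "RAW"
TABLE = "USERS"
STAGE = "USERS_STAGE"


def fundamentals_statements() -> list[str]:
    """Return Snowflake SQL statements, in execution order, as plain strings."""
    fq_schema = f"{DATABASE}.{SCHEMA}"
    fq_table = f"{fq_schema}.{TABLE}"
    fq_stage = f"{fq_schema}.{STAGE}"
    return [
        # A small warehouse that suspends after 60 idle seconds and resumes on demand.
        f"CREATE WAREHOUSE IF NOT EXISTS {WAREHOUSE} "
        "WAREHOUSE_SIZE = 'XSMALL' AUTO_SUSPEND = 60 AUTO_RESUME = TRUE",
        f"CREATE DATABASE IF NOT EXISTS {DATABASE}",
        f"CREATE SCHEMA IF NOT EXISTS {fq_schema}",
        # Assumed column layout for the example table.
        f"CREATE TABLE IF NOT EXISTS {fq_table} "
        "(id INTEGER, name VARCHAR, created_at TIMESTAMP_NTZ)",
        # An internal named stage that holds files before loading.
        f"CREATE STAGE IF NOT EXISTS {fq_stage}",
        # Load: copy staged CSV files into the table, skipping the header row.
        f"COPY INTO {fq_table} FROM @{fq_stage} "
        "FILE_FORMAT = (TYPE = 'CSV' SKIP_HEADER = 1)",
        # Unload: copy table rows back out to a path under the stage.
        f"COPY INTO @{fq_stage}/unload/ FROM {fq_table} "
        "FILE_FORMAT = (TYPE = 'CSV')",
    ]


if __name__ == "__main__":
    for statement in fundamentals_statements():
        print(statement + ";")
```

Running the script prints the statements so they can be pasted into a worksheet. Files would still need to be uploaded to the stage (for example with PUT from a Snowflake client) before the load step has anything to copy.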

Exercise/Activity II - Hands-On Snowflake (15 minutes)

Break (15 minutes)

Snowflake Security (15 minutes)

  • Discuss the compliance validations (SOC 1 and SOC 2, HIPAA, PCI DSS)
  • Access control (RBAC, DAC)
  • Network security (set up a network policy; discuss private link connectivity)
  • Data at rest (discuss encryption and rekeying)
  • MFA (and SCIM)
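
To make the access control and network policy items more concrete, the sketch below builds the text of a handful of Snowflake SQL statements: a read-only role, its grants, and a network policy restricted to an allowed IP range. It is an illustration under assumptions, not the course's setup. The function name, role, user, warehouse, schema, and policy names are all hypothetical, and the IP range in the usage line comes from a block reserved for documentation.

```python
import ipaddress


def access_statements(role: str, user: str, warehouse: str, schema: str,
                      policy: str, allowed_cidrs: list[str]) -> list[str]:
    """Build RBAC grants and a network policy as Snowflake SQL strings.

    `schema` is expected in DATABASE.SCHEMA form. All names are hypothetical.
    """
    # Fail early on malformed ranges; ip_network raises ValueError for bad input.
    for cidr in allowed_cidrs:
        ipaddress.ip_network(cidr)

    database = schema.split(".")[0]
    ip_list = ", ".join(f"'{cidr}'" for cidr in allowed_cidrs)
    return [
        f"CREATE ROLE IF NOT EXISTS {role}",
        # The role needs USAGE on the warehouse and on each container level.
        f"GRANT USAGE ON WAREHOUSE {warehouse} TO ROLE {role}",
        f"GRANT USAGE ON DATABASE {database} TO ROLE {role}",
        f"GRANT USAGE ON SCHEMA {schema} TO ROLE {role}",
        # Read-only access to the tables that currently exist in the schema.
        f"GRANT SELECT ON ALL TABLES IN SCHEMA {schema} TO ROLE {role}",
        f"GRANT ROLE {role} TO USER {user}",
        # Restrict logins to the listed IP ranges, then apply account-wide.
        f"CREATE NETWORK POLICY {policy} ALLOWED_IP_LIST = ({ip_list})",
        f"ALTER ACCOUNT SET NETWORK_POLICY = {policy}",
    ]


if __name__ == "__main__":
    for statement in access_statements(
        "ANALYST", "JDOE", "REPORTING_WH", "ANALYTICS.MODELED",
        "OFFICE_ONLY", ["203.0.113.0/24"],
    ):
        print(statement + ";")
```

Applying a network policy at the account level can lock out anyone connecting from outside the allowed ranges, so it is worth testing with a range that includes your own address first.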

Snowflake UI (15 minutes)

  • Walk through new Snowflake UI
  • Administration
  • Data discovery
  • Sharable filters
  • Sharable worksheets
  • Dashboards

Industry Best Practices / ‘Where to Go from Here’ (15 minutes)

  • ELT vs ETL
  • ‘Raw’ vs ‘modeled’ separation
  • Production-ready system architecture
  • Roles, policies, access control for sensitive data
  • Data modeling and visualization
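
One common reading of the ELT and ‘raw’ vs ‘modeled’ items above is that data lands untouched in one schema and is transformed inside the warehouse into another. The sketch below assembles the SQL text for that pattern; it is an interpretation for illustration, not the course's architecture. The function name, the schema names RAW and MODELED, the USERS table, its created_at column, and the DAILY_SIGNUPS model are all assumptions.

```python
def elt_statements(database: str = "ANALYTICS") -> list[str]:
    """Return SQL strings for a raw schema, a modeled schema, and one model.

    All object names are hypothetical placeholders.
    """
    raw = f"{database}.RAW"
    modeled = f"{database}.MODELED"
    return [
        # Loaded data lands here unchanged (the "E" and "L" of ELT).
        f"CREATE SCHEMA IF NOT EXISTS {raw}",
        # Cleaned, analysis-ready tables live here (the "T" of ELT).
        f"CREATE SCHEMA IF NOT EXISTS {modeled}",
        # The transform runs inside the warehouse, reading raw and writing modeled.
        f"CREATE OR REPLACE TABLE {modeled}.DAILY_SIGNUPS AS "
        "SELECT DATE_TRUNC('DAY', created_at) AS signup_day, COUNT(*) AS signups "
        f"FROM {raw}.USERS GROUP BY 1",
    ]


if __name__ == "__main__":
    for statement in elt_statements():
        print(statement + ";")
```

Keeping the two schemas apart means loading jobs only ever write to the raw schema, while analysts can be granted access to the modeled schema alone.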