Chapter 1. Getting Started

Cloud computing is characterized by on-demand availability of data storage and computing power. A primary benefit of cloud computing is that it doesn’t require users to be directly or actively involved in the management of those computer system resources. Other benefits include access to virtually unlimited storage capacity, automatic software updates, instant scalability, high speed, and cost reductions. The recent explosion of cloud data warehousing, led by Amazon Redshift, Google BigQuery, and Microsoft Azure SQL Data Warehouse (now Azure Synapse Analytics), has contributed to the decline of on-premises data centers.

Many major data warehouse providers, such as Oracle and IBM, originally built their products as traditionally hosted solutions and later adapted them to the cloud environment. Unlike those traditional solutions, Snowflake was built natively for the cloud from the ground up. While Snowflake originated as a disruptive cloud data warehouse, it has evolved over time, and today it is far more than an innovative modern data warehouse.

Along the way, Snowflake earned some impressive recognition. Snowflake won first place at the 2015 Strata + Hadoop World startup competition and was named a “Cool Vendor” in Gartner’s Magic Quadrant 2015 DBMS report. In 2019, Snowflake was listed as number 2 on Forbes magazine’s Cloud 100 list and was ranked number 1 on LinkedIn’s U.S. list of Top Startups. On September 16, 2020, Snowflake became the largest software initial public offering (IPO) in history.

Today the Snowflake Data Cloud Platform breaks down silos and enables many different workloads. In addition to the traditional data engineering and data warehouse workloads, Snowflake supports data lake, data collaboration, data analytics, data applications, data science, cybersecurity, and Unistore workloads. Snowflake’s “Many Data Workloads, One Platform” approach gives organizations a way to quickly derive value from rapidly growing data sets in secure and governed ways that enable companies to meet compliance requirements. Since its founding in 2012, Snowflake has continued its rapid pace of innovation across the Data Cloud.

Snowflake’s founders first gathered in 2012 with a vision of building a data warehouse for the cloud from the ground up, one that would unlock insights from enormous amounts of data of many different types. Their goal was to build a solution that was secure and powerful but also cost-effective and simple to maintain. Just three years later, in 2015, Snowflake’s cloud-built data warehouse became commercially available. Snowflake immediately disrupted the data warehousing market with its unique architecture and cloud-agnostic approach. The platform also made data engineering more business oriented, less technical, and less time-consuming, which created more opportunities to democratize data analytics by allowing users at all levels within an organization to make data-driven decisions.

To gain an appreciation for Snowflake’s unique qualities and approach, it’s important to understand the underlying Snowflake architecture. Beginning in Chapter 2 and carrying on throughout the book, you’ll discover the many ways that Snowflake offers near-zero management capability to eliminate much of the administrative and management overhead associated with traditional data warehouses. You’ll get a firsthand look at how Snowflake works, because each chapter includes SQL examples you can try out yourself. There are also knowledge checks at the end of each chapter.

Some of the most innovative Snowflake workloads rely on Snowflake’s Secure Data Sharing capabilities, which were introduced in 2018. Secure Data Sharing makes it possible to share governed data securely, and almost instantly, across your business ecosystem. It also opens up many possibilities for monetizing data assets. Chapter 10 is devoted entirely to this innovative Snowflake workload.

In 2021, Snowflake introduced the ability to manage multiple accounts with ease using Snowflake Organizations. Managing multiple accounts makes it possible to separately maintain different environments, such as development and production environments, and to adopt a multicloud strategy. It also means you can better manage costs since you can select which features you need for each separate account. We’ll explore Snowflake Organization management in Chapter 5.

Snowflake also expanded the scope of what’s possible in the Data Cloud with the introduction of Snowpark in June 2021. Across the industry, Snowpark is recognized as a game changer in the data engineering and machine learning spaces. Snowpark is a developer framework that brings new data programmability to the cloud and makes it possible for developers, data scientists, and data engineers to use Java, Scala, or Python to deploy code in a serverless manner.

The Security Data Lake, introduced by Snowflake in 2022, is an innovative workload that gives cybersecurity and compliance teams full visibility into security logs at massive scale while reducing the costs of security information and event management (SIEM) systems. This cybersecurity workload can also be enhanced by cybersecurity partners on the Snowflake Data Exchange, who can deliver threat detection, threat hunting, anomaly detection, and more. In Chapter 12, we’ll take a deep dive into the Snowpark and Security Data Lake workloads, and we’ll also discuss the newest Snowflake workload, Unistore (a workload for transactional and analytical data).

This first chapter will introduce you to Snowflake and get you comfortable navigating in Snowsight, the new Snowflake web UI. In addition, the chapter includes information about the Snowflake community, certifications, and Snowflake events. There is also a section that describes caveats about the code examples in the book. Taking time to get oriented in this chapter will set you up for success as you navigate the successive chapters.

Snowflake Web User Interfaces

Two different Snowflake web user interfaces are available: the Classic Console and Snowsight, the new Snowflake web UI. Snowsight was first introduced in 2021 and is now the default user interface in newly created Snowflake accounts. Snowsight is expected to become the default user interface for all accounts in early 2023, with the Classic Console being deprecated shortly thereafter. Unless otherwise stated, all our hands-on examples will be completed in Snowsight.

As with all other Snowflake features and functionality, Snowsight is continually being improved. As a result, the screenshots in the chapters may at times differ slightly from what you see in your own Snowsight web UI.

Note

To support rapid innovation, Snowflake deploys two scheduled releases each week and one behavior change release each month. You can find more information about Snowflake releases in the Snowflake documentation.

Prep Work

In the prep section of each chapter in the book, we’ll create any folders, worksheets, and Snowflake objects that will be needed for that chapter’s hands-on examples.

You’ll need access to a Snowflake instance in order to follow along and complete the hands-on examples while going through the chapters. You can set up a free trial Snowflake account if you do not already have access to a Snowflake instance. If you need information on how to create a free trial Snowflake account, refer to Appendix C.

If you have access to a Snowflake org that defaults to the Classic Console, you can access Snowsight in one of two ways. In the Classic Console web interface, you can click the Snowsight button in the upper-right corner of the screen (as shown in Figure 1-1). Alternatively, you can log in to Snowsight directly at https://app.snowflake.com.

Figure 1-1. Classic Console web interface showing the Snowsight button

Once you are inside Snowsight, Worksheets is the default tab (as shown in Figure 1-2). You can also click some of the different tabs, including the Data tab and the Compute tab, to see some of the available menu options. As we will see later, the Databases subtab will display the databases available to you within your access rights.

Figure 1-2. Snowsight UI tabs with the Worksheets tab as the default
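The Databases subtab has a SQL counterpart that you can run from a worksheet. The following is a minimal sketch: SHOW DATABASES lists the databases your current role can access, so the results depend on your account and role, and the 'snowflake%' pattern is only an illustrative filter.

```sql
-- List the databases visible to the current role
SHOW DATABASES;

-- Filter the list by name; LIKE patterns in SHOW commands are case-insensitive
SHOW DATABASES LIKE 'snowflake%';
```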

If you have been working in the Classic Console web interface before now, or if this is the first time you’re logging in, you’ll be presented with the option to import your worksheets when you first enter Snowsight (as shown in Figure 1-3).

Figure 1-3. An option to import worksheets is presented to you the first time you use Snowsight

If you import a worksheet from the Classic Console UI, a new timestamped folder will be created (as shown in Figure 1-4).

Figure 1-4. The folder name defaults to the day and time when you import your worksheets

You can access Snowsight using one of the latest versions of Google Chrome, Mozilla Firefox, or Apple Safari for macOS. After you log in, a client session is maintained indefinitely with continued user activity. After four hours of inactivity, the current session is terminated and you must log in again. The default session timeout policy of four hours can be changed; the minimum configurable idle timeout value for a session policy is five minutes.
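As a rough sketch of how the idle timeout can be changed, the statements below create a session policy and attach it to the account. This assumes you are using the ACCOUNTADMIN role (or another role with the privileges to create and apply session policies). The database, schema, and policy names are hypothetical, and because an account-level policy affects every user, a trial account is the safer place to experiment. The cleanup statements at the end remove everything the sketch creates.

```sql
USE ROLE ACCOUNTADMIN;

-- Hypothetical container for the policy (session policies are schema-level objects)
CREATE DATABASE IF NOT EXISTS DEMO_POLICY_DB;
CREATE SCHEMA IF NOT EXISTS DEMO_POLICY_DB.POLICIES;

-- Idle timeouts are set in minutes: 5 is the minimum, 240 (four hours) is the default
CREATE SESSION POLICY IF NOT EXISTS DEMO_POLICY_DB.POLICIES.SESSION_POLICY_DEMO
  SESSION_IDLE_TIMEOUT_MINS = 60
  SESSION_UI_IDLE_TIMEOUT_MINS = 60
  COMMENT = 'Demo: one-hour idle timeout';

-- Attach the policy to the account (this fails if the account already has one)
ALTER ACCOUNT SET SESSION POLICY DEMO_POLICY_DB.POLICIES.SESSION_POLICY_DEMO;

-- Cleanup: detach the policy before dropping it
ALTER ACCOUNT UNSET SESSION POLICY;
DROP SESSION POLICY DEMO_POLICY_DB.POLICIES.SESSION_POLICY_DEMO;
DROP DATABASE DEMO_POLICY_DB;
```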

Snowsight Orientation

When logging in to Snowsight, you’ll have access to your User menu and your Main menu. The User menu is where you can see your username and your current role. By default, you’ll be assigned the SYSADMIN role when you first sign up for a trial account.

You’ll also notice that the Main menu defaults to the Worksheets selection. The area to the right of the Main menu is where you have more options for your Main menu selection. In Figure 1-5, you can see the Worksheets menu to the right of the Main menu. Underneath the Worksheets menu are submenus specific to the Main menu selection.

Figure 1-5. Snowsight menu options

Next, we will set our Snowsight preferences and spend a few minutes navigating the Snowsight Worksheets area.

Snowsight Preferences

To set up your preferences, click directly on your name, or use the drop-down arrow beside your name to access the submenus (as shown in Figure 1-6). Notice that you can switch roles here. Try switching to a different role and then back to SYSADMIN. This will be important throughout the chapter exercises, as we’ll periodically need to switch between roles.

You’ll also notice that Snowflake Support is available from within the User menu. Whenever you need to submit a case, just click Support. It’s important to note that you won’t be able to create a support case when using a free trial account.

Figure 1-6. User menu selections
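The role switch described above can also be made in SQL from a worksheet. This is a minimal sketch, assuming your user has been granted SYSADMIN (as in a trial account). Note that in Snowsight, USE ROLE changes the role for the worksheet session, which may differ from the role shown in the User menu.

```sql
-- Check which role the worksheet is currently using
SELECT CURRENT_ROLE();

-- Switch to PUBLIC, a role granted to every user
USE ROLE PUBLIC;
SELECT CURRENT_ROLE();

-- Switch back to SYSADMIN for the chapter exercises
USE ROLE SYSADMIN;
SELECT CURRENT_ROLE();
```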

Click the Profile option and the Profile submenu will be available to you (as shown in Figure 1-7). Among other things, you’ll be able to change your preferred language and enroll in multifactor authentication (MFA) here. This is also where you would go if you wanted to change your password.

If you make changes to your profile, click the Save button. Otherwise, click the Close button.

Figure 1-7. Profile submenu

Snowflake Community

This book will teach you about the Snowflake architecture and how to use Snowflake. The information and lessons in the book are comprehensive; nevertheless, you may find that you have questions about a particular topic. For this reason, I encourage you to join and participate in the Snowflake community.

If you would like to connect with others who have a similar passion for Snowflake, there are user groups you may be interested in joining. Some user groups hold in-person events in North America, Europe, the Middle East, and the Asia Pacific region. You can find more information about Snowflake user groups on their website. There are also virtual special interest groups (see Figure 1-27 for the list of them) in addition to regional user groups. I help manage the Data Collaboration special interest group.

Figure 1-27. Snowflake special interest groups (virtual)

Snowflake user groups are just one of the ways you can get involved in the Snowflake community. There are also community groups, a link to resources, and access to discussions available from the Snowflake community login page. To access these groups and resources, click Log In at the upper-right corner of the page and then click the “Not a member?” link to create your free Snowflake community member account.

Snowflake has a special Data Superhero program for Snowflake experts who are highly active in the community. Each person recognized as a Data Superhero receives a custom Data Superhero character created by Francis Mao, Snowflake’s director of corporate events. Figure 1-28 shows a gathering of some of the Snowflake Data Superheroes, including me!

Figure 1-28. Snowflake Data Superheroes

Snowflake Certifications

As you progress through the chapters in the book, you’ll find your Snowflake knowledge will grow quickly. At some point, you may want to consider earning a Snowflake certification to demonstrate that knowledge to the community.

All Snowflake certification paths begin with the SnowPro Core certification exam. Passing the SnowPro Core exam enables you to sit for any or all of the five role-based advanced certifications: Administrator, Architect, Data Analyst, Data Engineer, and Data Scientist. A Snowflake certification is valid for two years, after which time a recertification exam must be passed. Passing any advanced Snowflake certification resets the clock on the SnowPro Core certification, so you have two more years before you’ll need to recertify. More information about Snowflake certifications can be found on the Snowflake website.

Snowday and Snowflake Summit Events

Snowflake hosts many events throughout the year, including its Data for Breakfast events. The two main Snowflake events for new-product release information are Snowday and Snowflake Summit. Snowday is a virtual event that happens each November. Snowflake Summit is an in-person event held each June. Snowflake Summit features keynotes, breakout sessions, hands-on labs, an exposition floor, and more.

Important Caveats About Code Examples in the Book

The code examples created for each chapter in the book were designed so that they can be completed independently of other chapters. This enables you to revisit the examples in any particular chapter as well as perform code cleanup at the end of each chapter.

Occasionally, I will point out how things look or function in a free trial account versus how they look or function in a paid account. We’ll see very few differences, but it’s important to realize that there are some ways in which the two differ.

The functionality demonstrated in the chapters assumes you have an Enterprise Edition Snowflake instance or higher. If you are currently working in a Snowflake Standard Edition org, you’ll want to set up a free trial account and select Enterprise Edition. Otherwise, there will be some examples you won’t be able to complete if you use a Standard Edition Snowflake org. More information about different Snowflake editions can be found in Appendix C.

Earlier in the chapter, you saw how to use Snowsight to format your SQL statements. The statements provided for you in this book will follow that formatting when possible, but the format may vary slightly at times due to space limitations. Also, in some chapters we’ll be using the Snowflake sample database that is provided with your Snowflake account. Because the underlying data in that Snowflake sample data set sometimes changes, the result sets included in the chapter examples may be slightly different from your results. Alternative exercises will be made available at https://github.com/SnowflakeDefinitiveGuide for tables or schemas in the Snowflake sample database that later become unavailable.

In this chapter, we executed USE commands to set the context for our worksheet, that is, the role, virtual warehouse, database, and schema that statements run against. Throughout the upcoming chapter examples, we’ll include the USE commands to set context. You have the option of executing the USE commands or selecting the context from the drop-down menus.
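A typical set of USE commands looks something like the following sketch. It assumes a trial account in which your role can use the default COMPUTE_WH virtual warehouse and the shared SNOWFLAKE_SAMPLE_DATA database; substitute your own warehouse name if it differs.

```sql
-- Set the worksheet context explicitly
USE ROLE SYSADMIN;
USE WAREHOUSE COMPUTE_WH;              -- assumption: default warehouse in a trial account
USE DATABASE SNOWFLAKE_SAMPLE_DATA;    -- shared sample database
USE SCHEMA TPCH_SF1;

-- Confirm the current context
SELECT CURRENT_ROLE(), CURRENT_WAREHOUSE(), CURRENT_DATABASE(), CURRENT_SCHEMA();

-- Unqualified table names now resolve against SNOWFLAKE_SAMPLE_DATA.TPCH_SF1
SELECT C_NAME, C_ACCTBAL
FROM CUSTOMER
LIMIT 10;
```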

For coding examples throughout the book, we’ll be following the naming standards best practices in Appendix B whenever we create our own Snowflake objects. Following best practices helps make code simpler, more readable, and easier to maintain. It also enables the creation of reusable code and makes it easier to detect errors. In this section, I’ll call out a few of the most important best practices.

For object names, we’ll be using all uppercase letters. We could achieve the same results by using all lowercase letters or mixed case, because Snowflake converts unquoted object names to uppercase. If you want an object name that keeps mixed-case or all lowercase letters, you’ll need to enclose the name in double quotes, and you’ll then need to use the quotes every time you reference that object. It is also important to keep this behavior in mind when you work with third-party tools that connect to Snowflake. Some third-party tools can’t accept whitespace or special characters, so it’s best to avoid both of those when naming Snowflake objects.
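The following sketch illustrates the difference between unquoted and double-quoted identifiers. The database names are hypothetical, and the example assumes the QUOTED_IDENTIFIERS_IGNORE_CASE parameter is left at its default of FALSE; if it has been set to TRUE, quoted names are also uppercased and the second CREATE would replace the first database. The final statements drop both databases.

```sql
USE ROLE SYSADMIN;

-- Unquoted identifier: stored in uppercase as DEMO_CASE_DB
CREATE OR REPLACE DATABASE demo_case_db;

-- Double-quoted identifier: stored exactly as Demo_Case_Db, a separate database
CREATE OR REPLACE DATABASE "Demo_Case_Db";

-- Both are listed; LIKE patterns in SHOW commands are case-insensitive
SHOW DATABASES LIKE 'demo_case%';

-- Any unquoted spelling resolves to DEMO_CASE_DB...
USE DATABASE Demo_Case_DB;
-- ...while the mixed-case database must always be referenced with quotes
USE DATABASE "Demo_Case_Db";

-- Cleanup
DROP DATABASE demo_case_db;
DROP DATABASE "Demo_Case_Db";
```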

Another important best practice is to avoid giving an object the same name as a different object. In later chapters you’ll learn about the different table types, including temporary tables and permanent tables. You’ll discover that temporary tables exist only for the duration of the session in which they were created; thus, a temporary table can be created with the same name as an existing permanent table in the same schema, and within that session it hides the permanent table. Even though this is allowed, it is not a good practice. This will be discussed in more detail in Chapter 3.
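To see why reusing a name causes confusion, consider this sketch, which relies on Snowflake’s documented rule that a temporary table takes precedence over a same-named table in the same schema for the rest of the session. The database and table names are hypothetical, COMPUTE_WH is assumed to be the default trial warehouse, and the last statement removes everything the sketch creates.

```sql
USE ROLE SYSADMIN;
USE WAREHOUSE COMPUTE_WH;            -- assumption: default trial warehouse
CREATE OR REPLACE DATABASE DEMO_TEMP_DB;
USE SCHEMA DEMO_TEMP_DB.PUBLIC;

-- A permanent table with two rows
CREATE OR REPLACE TABLE DEMO_TBL (ID INT);
INSERT INTO DEMO_TBL VALUES (1), (2);

-- A temporary table with the same name is allowed and hides the permanent one
CREATE TEMPORARY TABLE DEMO_TBL (ID INT);
SELECT COUNT(*) FROM DEMO_TBL;       -- reads the empty temporary table

-- DROP also targets the temporary table, which reveals the permanent table again
DROP TABLE DEMO_TBL;
SELECT COUNT(*) FROM DEMO_TBL;       -- reads the permanent table

-- Cleanup
DROP DATABASE DEMO_TEMP_DB;
```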

Snowflake controls access through roles, and specific roles are responsible for creating certain types of objects. For example, the SYSADMIN role is used to create databases and virtual warehouses. We’ll learn more about roles in Chapter 5, and whenever possible, we’ll follow best practices in our examples. Sometimes we’ll need to diverge from best practices to keep the example simple. When that happens, I’ll mention that we’re using a different role than the recommended role.
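One way to see which objects a role is meant to create is to list its privileges. This is a sketch only, assuming you run it as SYSADMIN and that COMPUTE_WH (the default trial warehouse) is available for the follow-up query. In a typical account you should see account-level privileges such as CREATE DATABASE and CREATE WAREHOUSE, but the exact list depends on how your account is configured.

```sql
USE ROLE SYSADMIN;
USE WAREHOUSE COMPUTE_WH;   -- assumption: default trial warehouse

-- Privileges and roles granted to SYSADMIN
SHOW GRANTS TO ROLE SYSADMIN;

-- Keep only the account-level privileges from the previous SHOW output
-- (SHOW output column names are lowercase, so they must be quoted)
SELECT "privilege", "granted_on"
FROM TABLE(RESULT_SCAN(LAST_QUERY_ID()))
WHERE "granted_on" = 'ACCOUNT';
```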

When creating a new object, you can use the CREATE statement alone or, optionally, you can add either the IF NOT EXISTS or the OR REPLACE syntax to the CREATE statement. For our examples, we’ll use the CREATE OR REPLACE syntax so that you can always go back to the beginning of the chapter exercises to start over without having to drop objects first. In practice, though, be sure to use the IF NOT EXISTS form of the CREATE statement, especially in a production environment, because OR REPLACE drops and re-creates any existing object with that name. If you do mistakenly use the OR REPLACE syntax in production, you have the option to use Snowflake Time Travel capabilities to return the object to its original state. Time Travel is demonstrated in Chapter 7.
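The sketch below contrasts the two forms and shows one Time Travel recovery path for a table that was replaced by mistake. The database and table names are hypothetical, COMPUTE_WH is assumed to be the default trial warehouse, and the recovery only works while the replaced table is still within its Time Travel retention period (one day by default). The final statement drops the demo database.

```sql
USE ROLE SYSADMIN;
USE WAREHOUSE COMPUTE_WH;            -- assumption: default trial warehouse
CREATE OR REPLACE DATABASE DEMO_REPLACE_DB;
USE SCHEMA DEMO_REPLACE_DB.PUBLIC;

CREATE TABLE IF NOT EXISTS CUSTOMERS (ID INT, NAME STRING);
INSERT INTO CUSTOMERS VALUES (1, 'Ann');

-- IF NOT EXISTS leaves the existing table and its row untouched
CREATE TABLE IF NOT EXISTS CUSTOMERS (ID INT, NAME STRING);
SELECT COUNT(*) FROM CUSTOMERS;      -- still one row

-- OR REPLACE drops the existing table and creates an empty one
CREATE OR REPLACE TABLE CUSTOMERS (ID INT, NAME STRING);
SELECT COUNT(*) FROM CUSTOMERS;      -- now empty

-- Time Travel recovery: move the new table aside, then undrop the replaced version
ALTER TABLE CUSTOMERS RENAME TO CUSTOMERS_REPLACEMENT;
UNDROP TABLE CUSTOMERS;
SELECT * FROM CUSTOMERS;             -- the original row is back

-- Cleanup
DROP DATABASE DEMO_REPLACE_DB;
```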

The code examples in the book have been thoroughly tested; nevertheless, it is always possible to have errors. If you find an error in the code or in the text, please submit your errata to O’Reilly at https://oreil.ly/snowflake-the-definitive-guide.

Code Cleanup

We didn’t create any new Snowflake objects in this chapter, so no cleanup is needed. It’s OK to leave the folder and worksheets we created, as they don’t take up any storage or result in any compute charges.

Summary

There is a reason why Snowflake became the largest software IPO in history, and why Snowflake continues to innovate and enable industry disruption. The Snowflake Data Cloud is secure, scalable, and simple to manage; its ability to eliminate data silos and run workloads from a single platform creates opportunities to democratize data analytics. As you make your way through each chapter in the book, you’ll see firsthand why Snowflake is a leader in the new data economy.

In this introductory chapter, you’ve familiarized yourself with Snowsight, the Snowflake web user interface. You’ve also had an opportunity to learn about Snowflake community events as well as Snowflake certifications. Importantly, you learned about caveats for the code examples in this book. Most notably, with only a few exceptions, we’ll be following the Snowflake best practices outlined in Appendix B.

The next chapter will compare traditional data platform architectures to Snowflake. We’ll then take a deep dive into the three distinct Snowflake layers. It’s an important foundational chapter to help you better understand some of the more technical aspects of Snowflake. You’ll also get hands-on experience creating new clusters of compute resources called virtual warehouses.

Knowledge Check

The following questions are based on the information provided in this chapter:

  1. The Snowflake free trial account allows you to use almost all the functionality of a paid account. What are a few of the differences, though, when using a Snowflake trial account?

  2. What are your options if you are unable to create a support case within Snowflake?

  3. Explain what it means to set the context for your Snowflake worksheet.

  4. What does “Format query” do? Specifically, does “Format query” correct your spelling for commands or table names?

  5. For how long are Snowflake certifications valid? What are two ways you can extend the date for Snowflake certifications?

  6. What are two different ways you can execute a SQL query in a Snowflake worksheet?

  7. From within a Snowflake worksheet, how do you return to the Main menu?

  8. What is the reason we’ll be using all uppercase letters when naming our Snowflake objects? Could we instead use mixed case? If so, explain how we would accomplish that.

  9. We’ll be using the CREATE OR REPLACE statement in future chapters to create new Snowflake objects. Why is it better not to do this in a production environment?

  10. When you create a new Snowflake worksheet, what is the default name of the worksheet?

Answers to these questions are available in Appendix A.
