Chapter 18. Semistructured Data

Relational databases are adept at handling complex concepts and relationships, but not all data is easily expressed as rows and columns. Semistructured data formats such as XML, Parquet, Avro, and JSON allow for flexible data storage without the need to conform to a predefined schema. Unlike some database servers, where support for semistructured data was added as an afterthought, Snowflake’s architecture was designed from the very beginning to support both structured and semistructured data. This chapter will explore the creation, storage, and retrieval of JSON documents in Snowflake.

Generating JSON from Relational Data

JSON (JavaScript Object Notation) has evolved into the standard format for data interchange. It is easy for both humans and machines to read and write, and while similar to XML it is less verbose and thus more compact. JSON stores data as key-value pairs, so the data is self-describing and there is no need for an external schema. JSON documents can be generated from relational tables, stored in tables using the variant data type and queried using built-in functions such as flatten().

Let’s start by creating a JSON document from relational tables in Snowflake. There are many built-in functions available for generating XML or JSON documents, but Table 18-1 defines the functions that are used in this chapter.

Table 18-1. Commonly used JSON functions
object_construct() Returns a set of key-value pairs as type object 
array_agg() ...

Get Learning Snowflake SQL and Scripting now with the O’Reilly learning platform.

O’Reilly members experience books, live events, courses curated by job role, and more from O’Reilly and nearly 200 top publishers.