Chapter 5. Analyzing Complex and Nested Data
A Word About Parquet Format
Parquet is a self-describing, compressed columnar format that supports nested data. Many big data systems, including Hadoop, Hive, and Spark, can read and write Parquet files. Drill performs best when reading Parquet files, so if you plan to query large, complex datasets, we recommend converting the data to Parquet format first.
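One way to do that conversion from within Drill itself is a CREATE TABLE AS (CTAS) query. The following is a minimal sketch, not a definitive recipe: it assumes a JSON source file at the hypothetical path /data/events.json and the writable dfs.tmp workspace found in Drill's default configuration. The table name events_parquet is also made up for illustration.

```sql
-- Tell Drill to write CTAS output as Parquet.
-- (Parquet is normally the default; setting it makes the intent explicit.)
ALTER SESSION SET `store.format` = 'parquet';

-- Read the hypothetical JSON file and write its contents as Parquet files
-- into a new table directory in the writable dfs.tmp workspace.
CREATE TABLE dfs.tmp.`events_parquet` AS
SELECT *
FROM dfs.`/data/events.json`;

-- Later queries can read the Parquet copy instead of the original JSON.
SELECT *
FROM dfs.tmp.`events_parquet`
LIMIT 10;
```

In practice you would usually replace SELECT * with an explicit column list and casts, so that the Parquet files are written with the data types you intend.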
Arrays and Maps
In Chapter 4 you learned about all the different data types that exist in Drill, such as VARCHAR. These data types are common in most databases and programming languages, but unlike most databases, Drill also features two complex data types, array and map, that you’ll need to understand in order to analyze complex datasets.1 Both of these ...