Parquet

Apache Parquet is a columnar storage format in which the structure of the data (its schema) is stored in the file itself. It is available to any project in the Hadoop ecosystem and is a key format for analytics. It was designed for interoperability, space efficiency, and query efficiency. Parquet files can be stored in HDFS as well as in non-HDFS filesystems.

Columnar storage works well for analytics because the data is stored and arranged by table column instead of by row. Analytics use cases typically select multiple columns and perform aggregation functions on the values, such as sum, average, or standard deviation. ...
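
To make the row-versus-column distinction concrete, here is a minimal sketch in plain Python. It is not Parquet's on-disk format or any Parquet library API; it only illustrates the idea that once values are grouped by column, an aggregation needs to touch just the column it cares about. The sensor records, the field names, and the helper functions `to_columns` and `aggregate` are all hypothetical, invented for this illustration.

```python
from statistics import mean, pstdev

# Hypothetical sensor readings in a row-oriented layout (one dict per record).
rows = [
    {"device_id": "a1", "temperature": 20.0, "humidity": 40.0},
    {"device_id": "b2", "temperature": 22.0, "humidity": 42.0},
    {"device_id": "c3", "temperature": 24.0, "humidity": 44.0},
]


def to_columns(records):
    """Pivot a list of row dicts into one list of values per column."""
    columns = {}
    for record in records:
        for name, value in record.items():
            columns.setdefault(name, []).append(value)
    return columns


def aggregate(columns, name):
    """Aggregate a single column without reading any other column."""
    values = columns[name]
    return {
        "sum": sum(values),
        "average": mean(values),
        "stddev": pstdev(values),  # population standard deviation
    }


# Column-oriented layout: all temperatures sit together, all humidities together.
columns = to_columns(rows)
temperature_stats = aggregate(columns, "temperature")
print(temperature_stats)
```

In the row layout, computing the average temperature means walking every record and skipping over `device_id` and `humidity` each time. In the column layout the same query reads one contiguous list, which is the access pattern that the aggregations described above benefit from.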
