O'Reilly logo

Analytics for the Internet of Things (IoT) by Andrew Minteer

Stay ahead with the world's most comprehensive technology and business learning platform.

With Safari, you learn the way you learn best. Get unlimited access to videos, live online training, learning paths, books, tutorials, and more.

Start Free Trial

No credit card required

Parquet

Apache Parquet is a columnar storage format for data where the structure of the data is incorporated into the file. It is available to any project in the Hadoop ecosystem and is a key format for analytics. It was designed to meet the goals of interoperability, space efficiency, and query efficiency. Parquet files can be stored in HDFS as well as non-HDFS filesystems.

Parquet logo

Columnar storage works well for analytics as the data is stored and arranged by table columns instead of rows. Analytics use cases typically select multiple columns and perform aggregation functions on the values, such as sum, average, or standard deviation. ...

With Safari, you learn the way you learn best. Get unlimited access to videos, live online training, learning paths, books, interactive tutorials, and more.

Start Free Trial

No credit card required