© Deepak Vohra 2016

Deepak Vohra, Practical Hadoop Ecosystem, 10.1007/978-1-4842-2199-0_8

8. Apache Parquet

Deepak Vohra

(1)Apt 105, White Rock, British Columbia, Canada

Apache Parquet is an efficient, structured, column-oriented (also called columnar storage), compressed, binary file format. Parquet supports several compression codecs, including Snappy, GZIP, deflate, and BZIP2. Snappy is the default. Structured file formats such as RCFile, Avro, SequenceFile, and Parquet offer better performance with compression support, which reduces the size of the data on the disk and consequently the I/O and CPU resources required to deserialize data.

In columnar storage , all the column data for a particular column is stored adjacently instead of by row. Columnar ...

Get Practical Hadoop Ecosystem: A Definitive Guide to Hadoop-Related Frameworks and Tools now with O’Reilly online learning.

O’Reilly members experience live online training, plus books, videos, and digital content from 200+ publishers.