6. Big Data File Formats

Overview

This chapter introduces three popular big data file formats, Avro, ORC, and Parquet, and outlines the advantages and disadvantages of each. It walks through the code snippets required to convert data into the desired file format. It also covers attributes such as compression and read-write strategy, and runs queries that highlight each format's operational performance.

By the end of the chapter, you will be able to select the most suitable file format for a given use case. You will reinforce these concepts by applying them to a real-world situation and gain first-hand experience of running the necessary queries. ...

