O'Reilly logo

Apache Hive Essentials by Dayong Du

Stay ahead with the world's most comprehensive technology and business learning platform.

With Safari, you learn the way you learn best. Get unlimited access to videos, live online training, learning paths, books, tutorials, and more.

Start Free Trial

No credit card required

Data file optimization

Data file optimization covers the performance improvement on the data files in terms of file format, compression, and storage.

File format

Hive supports TEXTFILE, SEQUENCEFILE, RCFILE, ORC, and PARQUET file formats. The three ways to specify the file format are as follows:

  • CREATE TABLE ... STORE AS <File_Format>
  • ALTER TABLE ... [PARTITION partition_spec] SET FILEFORMAT <File_Format>
  • SET hive.default.fileformat=<File_Format> --default fileformat for table

Here, <File_Type> is TEXTFILE, SEQUENCEFILE, RCFILE, ORC, and PARQUET.

We can load a text file directly to a table with the TEXTFILE format. To load data to the table with other file formats, we need to load the data to a TEXTFILE format table first. Then, use INSERT OVERWRITE ...

With Safari, you learn the way you learn best. Get unlimited access to videos, live online training, learning paths, books, interactive tutorials, and more.

Start Free Trial

No credit card required