Data file optimization
Data file optimization covers performance improvements to the data files themselves, in terms of file format, compression, and storage.
File format
Hive supports the TEXTFILE, SEQUENCEFILE, RCFILE, ORC, and PARQUET file formats. The three ways to specify the file format are as follows:
CREATE TABLE ... STORED AS <File_Format>
ALTER TABLE ... [PARTITION partition_spec] SET FILEFORMAT <File_Format>
SET hive.default.fileformat=<File_Format> --default fileformat for table
Here, <File_Format> is TEXTFILE, SEQUENCEFILE, RCFILE, ORC, or PARQUET.
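The three options can be sketched in HiveQL as follows. This is a minimal illustration rather than a recommended setup: the table names (sales_orc, sales_default) and their columns are hypothetical and do not come from the text.

```sql
-- Hypothetical tables and columns, for illustration only.

-- 1. Choose the format when the table is created.
CREATE TABLE sales_orc (
  id     INT,
  amount DOUBLE
)
STORED AS ORC;

-- 2. Change the format recorded for an existing table.
--    The table is still empty here, so there is no existing data to mismatch.
ALTER TABLE sales_orc SET FILEFORMAT PARQUET;

-- 3. Change the session default, then create a table without STORED AS.
SET hive.default.fileformat=ORC;
CREATE TABLE sales_default (
  id     INT,
  amount DOUBLE
);
```

Note that ALTER TABLE ... SET FILEFORMAT only updates the table metadata. Files already stored in the table are not rewritten, so it is safest on empty tables or on partitions that will be reloaded.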
We can load a text file directly into a table with the TEXTFILE format. To load data into a table with another file format, we first need to load the data into a TEXTFILE format table. Then, use INSERT OVERWRITE ...
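The two-step load described above might look like the following sketch. The table names (employee_staging, employee_orc), the columns, the comma delimiter, and the local path /tmp/employee.csv are all assumptions made for illustration.

```sql
-- Step 1: a TEXTFILE staging table that matches the layout of the raw file.
CREATE TABLE employee_staging (
  name   STRING,
  salary DOUBLE
)
ROW FORMAT DELIMITED FIELDS TERMINATED BY ','
STORED AS TEXTFILE;

-- Load the text file as-is; LOAD DATA moves or copies files without converting them.
LOAD DATA LOCAL INPATH '/tmp/employee.csv'
OVERWRITE INTO TABLE employee_staging;

-- Step 2: the target table in the desired format (ORC in this sketch).
CREATE TABLE employee_orc (
  name   STRING,
  salary DOUBLE
)
STORED AS ORC;

-- Rewrite the rows through a query so Hive writes them out in the target format.
INSERT OVERWRITE TABLE employee_orc
SELECT name, salary
FROM employee_staging;
```

The staging table is needed because LOAD DATA does not transform file contents. The conversion to the target format only happens when the rows pass through the INSERT ... SELECT query.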