April 2016
Beginner
268 pages
5h 32m
English
Indexes are useful for improving the performance of frequent queries that filter on certain columns. However, Hive has only a limited capability to index data, because indexing large datasets requires substantial additional storage space and processing overhead. Hive can index columns to speed up some operations, and it stores the indexed data in a separate table.
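To make "stores the indexed data in a separate table" concrete, here is a minimal sketch in Python of what such an index table could hold. It is a simplified model, not Hive's implementation: it assumes one row per (indexed value, data file), holding the file name and a list of byte offsets of the matching lines. This is roughly the shape of a compact index, but the function name `build_compact_index` and the sample file name `part-00000` are hypothetical.

```python
from collections import defaultdict

def build_compact_index(files, column):
    """Simplified model of an index table (hypothetical, not Hive's code).

    `files` maps a data file name to its tab-delimited lines (no newlines);
    `column` is the 0-based position of the indexed column.
    Returns rows of (value, file_name, [byte offsets]), sorted by value and file.
    """
    entries = defaultdict(list)  # (value, file_name) -> byte offsets of matching lines
    for name, lines in files.items():
        offset = 0
        for line in lines:
            value = line.split("\t")[column]
            entries[(value, name)].append(offset)
            offset += len(line.encode("utf-8")) + 1  # +1 for the newline that ends each line
    return [(value, name, offsets) for (value, name), offsets in sorted(entries.items())]

# Three sample rows in the sales layout: id, fname, state, zip, ip, pid.
files = {
    "part-00000": [
        "1\tAnn\tCA\t94016\t10.0.0.1\tp1",
        "2\tBob\tNY\t10001\t10.0.0.2\tp2",
        "3\tCid\tCA\t94017\t10.0.0.1\tp3",
    ]
}
index_table = build_compact_index(files, 4)  # index the ip column
```

Here `index_table` is much smaller than the data: it keeps only the distinct ip values and where their rows live. Each additional indexed column or table adds another such table, which is the storage overhead mentioned above.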
Indexes can be created on Hive tables. Let us create a sales table in Hive on which we will create indexes:
CREATE TABLE sales (id INT, fname STRING, state STRING, zip STRING, ip STRING, pid STRING) ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t';
Let us create an index on the ip column of this table:
CREATE INDEX index_ip ON TABLE sales(ip) AS 'org.apache.hadoop.hive.ql.index.compact.CompactIndexHandler' ...
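The benefit of an index on ip comes at query time: a filter such as ip = '10.0.0.1' can jump to the recorded locations instead of reading every row. The Python sketch below models that difference on tab-delimited data in the sales layout. It is an illustration under simplifying assumptions (one in-memory file, byte offsets of whole lines), not how Hive executes queries; `full_scan`, `indexed_lookup`, `ROWS` and `IP` are hypothetical names.

```python
import io

# Hypothetical sample data in the sales layout: id, fname, state, zip, ip, pid.
ROWS = (
    b"1\tAnn\tCA\t94016\t10.0.0.1\tp1\n"
    b"2\tBob\tNY\t10001\t10.0.0.2\tp2\n"
    b"3\tCid\tCA\t94017\t10.0.0.1\tp3\n"
)
IP = 4  # 0-based position of the ip column

def full_scan(data: bytes, value: str):
    """Without an index: read and test every line. Returns (rows, lines_read)."""
    rows, lines_read = [], 0
    for line in io.BytesIO(data):  # iterating a BytesIO yields one line at a time
        lines_read += 1
        fields = line.rstrip(b"\n").decode("utf-8").split("\t")
        if fields[IP] == value:
            rows.append(fields)
    return rows, lines_read

def indexed_lookup(data: bytes, offsets: list):
    """With an index: seek straight to the stored byte offsets. Returns (rows, lines_read)."""
    stream = io.BytesIO(data)
    rows = []
    for off in offsets:
        stream.seek(off)
        rows.append(stream.readline().rstrip(b"\n").decode("utf-8").split("\t"))
    return rows, len(offsets)

scanned, scanned_count = full_scan(ROWS, "10.0.0.1")
indexed, indexed_count = indexed_lookup(ROWS, [0, 54])  # offsets where lines 1 and 3 start
```

Both calls return the same two rows, but the indexed lookup reads two lines instead of three. On a table with millions of rows and a selective filter, that gap is what an index is meant to exploit, at the cost of building and rebuilding the index table.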