Virtual column functions

Virtual columns are special functions in HQL. Right now, there are two virtual columns: INPUT__FILE__NAME and BLOCK__OFFSET__INSIDE__FILE. The INPUT__FILE__NAME function shows the name of the input file for a mapper task. The BLOCK__OFFSET__INSIDE__FILE function shows the current global file position or, if the file is compressed, the file offset of the current block. The following are examples of using virtual columns to find out where data is physically located in HDFS, which is especially useful for bucketed and partitioned tables:

> SELECT
> INPUT__FILE__NAME,BLOCK__OFFSET__INSIDE__FILE as OFFSIDE
> FROM employee;
+-----------------------------------------------------------------------+
| input__file__name                                           | offside |
+-----------------------------------------------------------------------+
...
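As a further illustration, a virtual column can also be used in a GROUP BY clause to see how rows are spread across the underlying files. The query below is a minimal sketch, assuming the same employee table as above; the aliases file_name and row_count are made up for this example and do not come from the original text.

```sql
-- Sketch: count how many rows come from each underlying HDFS file.
-- Assumes the employee table from the example above;
-- file_name and row_count are illustrative aliases.
SELECT
  INPUT__FILE__NAME AS file_name,
  COUNT(*) AS row_count
FROM employee
GROUP BY INPUT__FILE__NAME;
```

On a bucketed or partitioned table, a query of this shape would be expected to return one row per data file, which can help check whether the data is evenly distributed.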
