The following sections cover aspects of Impala that deserve a closer look. Brief examples illustrate interesting features for new users. More complex topics are covered by tutorials or deep dives into the inner workings.
Here is what your first Unix command-line session might look like when you’re using Impala. This example from a Bash shell session creates a couple of text files (which could be named anything), copies those files into the HDFS filesystem, and points an Impala table at the data so that it can be queried through SQL. The exact HDFS paths might differ based on your HDFS configuration and Linux users.
cat >csv.txt 1,red,apple,4 2,orange,orange,4 3,yellow,banana,3 4,green,apple,4^D $
cat >more_csv.txt 5,blue,bubblegum,0.5 6,indigo,blackberry,0.2 7,violet,edible flower,0.01 8,white,scoop of vanilla ice cream,3 9,black,licorice stick,0.2^D $
hadoop fs -mkdir /user/hive/staging$
hadoop fs -put csv.txt /user/hive/staging$
hadoop fs -put more_csv.txt /user/hive/staging
Sometimes the user you are logged in as does not have permission to manipulate HDFS files. In that case, issue the commands with the permissions of the
hdfs user, using the form:
sudo -u hdfs hadoop fs
Now that the data files are in the HDFS filesystem, let’s go into the Impala shell and start working with them. (Some of the prompts and output are abbreviated here for easier reading by first-time users.) This ...