Chapter 2. Getting Up and Running with Impala

Depending on your level of expertise with Apache Hadoop, and how much Hadoop infrastructure you already have, you can follow different paths to try out Impala.

Note

Some examples in this book use syntax, functions, and other features that were introduced in Impala 1.4, which is available on both Cloudera’s CDH 5.1 and CDH 4 Hadoop distributions.

Installation

Cloudera Live Demo

The easiest way to try Impala, with no installation required, is to use the Cloudera Live demo (sign-up is optional). Using the Impala Query Editor through the Hue web interface, you can explore a few sample tables from the TPC-DS benchmark suite, enter SQL code to run queries, and even create your own tables and load data into them.

Cloudera QuickStart VM

If you come from a database background and are new to Hadoop, the Cloudera QuickStart VM lets you try out the basic Impala features straight out of the box. This single-node VM configuration is suitable for becoming familiar with the main Impala features. (For performance or scalability testing, you would graduate from this single-user, single-machine mode, and typically install the full CDH distribution using Cloudera Manager on a cluster of real machines or high-capacity VMs.) You run the QuickStart VM in VMware, KVM, or VirtualBox, start the Impala service through the Cloudera Manager web interface, and then interact with Impala through the impala-shell interpreter or the ODBC and JDBC interfaces.
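As a rough illustration of the impala-shell route, the following Python sketch assembles a command line that runs a single query non-interactively. The helper name impala_shell_command is hypothetical, and the host localhost and port 21000 are assumptions that fit a typical single-node setup; adjust them for your own VM. The sketch only builds and prints the command, so it runs even on a machine without Impala installed.

```python
import shlex


def impala_shell_command(query, host="localhost", port=21000):
    """Build the argument list for a one-off impala-shell query.

    The host and port defaults are assumptions for a single-node VM,
    not values taken from the book.
    """
    # -i names the impalad daemon to connect to; -q passes one query to run.
    return ["impala-shell", "-i", f"{host}:{port}", "-q", query]


cmd = impala_shell_command("SHOW TABLES")

# Print the command in a form that can be pasted into a terminal.
print(shlex.join(cmd))
```

To actually execute the query, you could pass cmd to subprocess.run on a machine where impala-shell is installed and the Impala service is running.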

Cloudera Manager ...
