
Cloudera Impala by John Russell


Coming to Impala from a Unix or Linux Background

If you are a Unix-oriented tools hacker, Impala fits in nicely at the tail end of your workflow. You create data files in whichever format suits you, choosing from a wide range of formats for convenience, compactness, or interoperability with other Apache Hadoop components. You tell Impala where those data files are and what fields to expect inside them. That’s it! Then let the SQL queries commence. You can see query results in a terminal window through the impala-shell command, save them to a file to process with other scripts or applications, or pull them straight into a visualization or reporting application through the standard ODBC or JDBC interfaces. Behind the scenes, the data is spread across multiple storage devices and processed by multiple servers, but that distribution is transparent to you.
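As a minimal sketch of the first two steps, assuming plain comma-delimited text files stored in an HDFS directory, the Python below writes a data file with the standard `csv` module and builds the `CREATE EXTERNAL TABLE` statement that tells Impala where the files live (the `LOCATION` clause) and what fields to expect (the column list). The table name `fruit`, its columns, and the path `/user/hacker/fruit` are hypothetical, and the helpers `write_delimited` and `external_table_ddl` are illustrative names, not part of any Impala API.

```python
import csv
import os
import tempfile

# Hypothetical sample rows; any tool that writes delimited text would do.
ROWS = [
    (1, "apple", 0.5),
    (2, "banana", 0.25),
    (3, "cherry", 3.0),
]


def write_delimited(path, rows, delimiter=","):
    """Write rows as plain delimited text: one record per line, no header."""
    with open(path, "w", newline="") as f:
        writer = csv.writer(f, delimiter=delimiter, lineterminator="\n")
        writer.writerows(rows)


def external_table_ddl(table, columns, hdfs_dir, delimiter=","):
    """Build a CREATE EXTERNAL TABLE statement: the column list says what
    fields to expect, LOCATION says which directory holds the data files."""
    cols = ",\n  ".join(f"{name} {sql_type}" for name, sql_type in columns)
    return (
        f"CREATE EXTERNAL TABLE {table} (\n  {cols}\n)\n"
        f"ROW FORMAT DELIMITED FIELDS TERMINATED BY '{delimiter}'\n"
        f"LOCATION '{hdfs_dir}';"
    )


if __name__ == "__main__":
    # Write the sample file to a scratch directory on the local machine.
    path = os.path.join(tempfile.mkdtemp(), "fruit.csv")
    write_delimited(path, ROWS)
    print("wrote", path)
    print(external_table_ddl(
        "fruit",
        [("id", "INT"), ("name", "STRING"), ("price", "DOUBLE")],
        "/user/hacker/fruit",  # hypothetical HDFS directory
    ))
```

From there, one plausible sequence is to copy the file into that directory with `hdfs dfs -put`, run the generated statement in impala-shell, and then query with something like `impala-shell -B -q 'SELECT * FROM fruit' -o results.txt`, where `-B` produces delimited output and `-o` writes it to a file for downstream scripts. Check `impala-shell --help` for the options your version supports.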

Administration

When you administer Impala, you are dealing with a small set of daemons that communicate with each other through a predefined set of ports. An impalad daemon runs on each DataNode in the cluster and does most of the work; a statestored daemon runs on one node and performs periodic health checks on the impalad daemons. The roadmap includes one more planned service. Log files on each node record the Impala activity occurring there.
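The statestored daemon performs its health checks internally, but an administrator who wants a quick outside-in view can still probe whether each daemon's port accepts connections. The sketch below does that with Python's standard `socket` module. The port numbers are the defaults listed in Impala documentation of this period and should be treated as assumptions to verify against your release and configuration; `port_open` and `check_node` are hypothetical helper names, and this probe is not how statestored itself checks the impalad daemons.

```python
import socket

# Assumed default ports; confirm against the port list for your Impala
# release, since any of these can be overridden in the configuration.
DEFAULT_PORTS = {
    "impalad (impala-shell)": 21000,
    "impalad (JDBC / HiveServer2)": 21050,
    "impalad (debug web UI)": 25000,
    "statestored (service)": 24000,
    "statestored (debug web UI)": 25010,
}


def port_open(host, port, timeout=1.0):
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Refused, unreachable, or timed out: treat the port as not listening.
        return False


def check_node(host, ports=DEFAULT_PORTS, timeout=1.0):
    """Map each service label to whether its port accepts connections."""
    return {label: port_open(host, port, timeout) for label, port in ports.items()}


if __name__ == "__main__":
    for label, ok in check_node("localhost").items():
        print(f"{label:32} {'up' if ok else 'DOWN'}")
```

Looping over the host list for the cluster gives a rough reachability report. A closed port only tells you that nothing is listening; the log files on that node are where you find out why.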

Administration for Impala is typically folded into administration for the overall cluster through the Cloudera Manager product. You monitor all nodes for out-of-space problems, CPU spikes, network failures, and so on, rather ...
