Introduction
Cloudera Impala is an open source project that is opening up the Apache Hadoop software stack to a wide audience of database analysts, users, and developers. The Impala massively parallel processing (MPP) engine makes SQL queries of Hadoop data simple enough to be accessible to analysts familiar with SQL and to users of business intelligence tools, and it’s fast enough to be used for interactive exploration and experimentation.
The Impala software is written from the ground up for high performance for SQL queries distributed across clusters of connected machines.
This Document
This article is intended for a broad audience of users from a variety of database, data warehousing, or Big Data backgrounds. SQL and Linux experience is a plus. Experience with the Apache Hadoop software stack is useful but not required.
This article points out wherever some aspect of Impala architecture or usage might be new to people who are experienced with databases but not the Apache Hadoop software stack, or vice versa.
The SQL examples in this article are geared toward new users trying out Impala for the first time, showing the simplest way to do things rather than the best practices for performance and scalability.
Become an O’Reilly member and get unlimited access to this title plus top books and audiobooks from O’Reilly and nearly 200 top publishers, thousands of courses curated by job role, 150+ live events each month,
and much more.
Read now
Unlock full access