Introduction
Cloudera Impala is an open source project that opens up the Apache Hadoop software stack to a wide audience of database analysts, users, and developers. The Impala massively parallel processing (MPP) engine makes SQL queries of Hadoop data simple enough to be accessible to analysts familiar with SQL and to users of business intelligence tools, and it’s fast enough to be used for interactive exploration and experimentation.
From the ground up, the Impala software is written for high performance of SQL queries distributed across clusters of connected machines.
Who Is This Book For?
This book is intended for a broad audience of users from a variety of database, data warehousing, or Big Data backgrounds.
It assumes that you’re experienced enough with SQL not to need explanations for familiar statements such as CREATE TABLE, SELECT, INSERT, and their major clauses.
Linux experience is a plus.
Experience with the Apache Hadoop software stack is useful but not required.
This book points out instances where some aspect of Impala architecture or usage might be new to people who are experienced with databases but not the Apache Hadoop software stack.
The SQL examples in this book start from a simple base for easy comprehension, then build toward best practices that demonstrate high performance and scalability.
Conventions Used in This Book
The following typographical conventions are used in this book:
- Italic
-
Indicates new terms, URLs, email addresses, filenames, and file ...