Getting Started with Impala

by John Russell
September 2014
Content level: Intermediate to advanced
152 pages
4h 3m
English
O'Reilly Media, Inc.

Chapter 4. Common Developer Tasks for Impala

Here are the special Impala aspects of some standard operations familiar to database developers.

Getting Data into an Impala Table

Because Impala’s feature set is oriented toward high-performance queries, much of the data you work with in Impala will originate from some other source, and Impala takes over near the end of the extract-transform-load (ETL) pipeline.

To get data into an Impala table, you can point Impala at data files in an arbitrary HDFS location; move data files from somewhere in HDFS into an Impala-managed directory; or copy data from one Impala table to another. Impala can query the original raw data files, without requiring any conversion or reorganization. Impala can also assist with converting and reorganizing data when those changes are helpful for query performance.
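The three approaches above roughly correspond to three kinds of Impala SQL statements: CREATE EXTERNAL TABLE with a LOCATION clause, LOAD DATA INPATH, and INSERT ... SELECT. The following is a minimal sketch that only builds the statement text so the shapes can be compared side by side. The table names, column list, comma delimiter, and HDFS paths are hypothetical and do not come from the book; you would run the resulting statements through impala-shell or another client.

```python
# Sketch: compose the Impala SQL for three ways of getting data into a table.
# All table names, columns, and HDFS paths below are illustrative assumptions.

def external_table_sql(table, columns, hdfs_dir):
    """Point Impala at data files that stay in an arbitrary HDFS directory."""
    cols = ", ".join(f"{name} {ctype}" for name, ctype in columns)
    return (
        f"CREATE EXTERNAL TABLE {table} ({cols}) "
        f"ROW FORMAT DELIMITED FIELDS TERMINATED BY ',' "
        f"LOCATION '{hdfs_dir}'"
    )

def load_data_sql(table, hdfs_path):
    """Move files from elsewhere in HDFS into the table's own directory."""
    return f"LOAD DATA INPATH '{hdfs_path}' INTO TABLE {table}"

def copy_table_sql(source, dest):
    """Copy rows from one Impala table into another."""
    return f"INSERT INTO {dest} SELECT * FROM {source}"

if __name__ == "__main__":
    columns = [("id", "BIGINT"), ("name", "STRING")]  # hypothetical schema
    statements = [
        external_table_sql("raw_events", columns, "/user/etl/raw_events"),
        load_data_sql("staged_events", "/user/etl/incoming/events.csv"),
        copy_table_sql("raw_events", "events_copy"),
    ]
    for stmt in statements:
        print(stmt + ";")
```

With the external table, the files stay where they are and Impala only records their location. LOAD DATA moves the files rather than copying them, so they no longer appear at the original path afterward. The INSERT ... SELECT form is the one to reach for when the destination table uses a different file format or layout than the source.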

As a developer, you might be setting up all parts of a data pipeline, or you might work with files that already exist. Either way, the last few steps in the pipeline are the most important ones from the Impala perspective. You want the data files to go into a well-understood and predictable location in HDFS, and then Impala can work with them.

Note

See Chapter 5 for some demonstrations of ways to construct and load data for your own testing. You can do basic functional testing with trivial amounts of data. For performance and scalability testing, you’ll need many gigabytes’ worth.

The following sections are roughly in order from the easiest techniques ...


Publisher Resources

ISBN: 9781491905760