Loading data into HBase
HBase is another component in the Hadoop ecosystem. It is a column-oriented database: it stores data by column (grouped into column families) rather than by the rows that make up a dataset. This layout allows for higher compression and faster scans over individual columns, which makes column-oriented databases well suited to the kinds of analytical queries that can cause significant performance issues in traditional relational databases.
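To make the row-versus-column distinction concrete, here is a minimal Python sketch. It illustrates the general column-oriented idea only and is not HBase's actual storage format. The sample records, the helper names `to_columns` and `run_length_encode`, and the choice of run-length encoding as the compression scheme are all assumptions made for illustration.

```python
from itertools import groupby

# Hypothetical sample records (NOT taken from the Baseball Dataset).
rows = [
    {"player": "p1", "state": "CA", "year": 2001},
    {"player": "p2", "state": "CA", "year": 2001},
    {"player": "p3", "state": "CA", "year": 2002},
    {"player": "p4", "state": "NY", "year": 2002},
]


def to_columns(records):
    """Pivot a list of row dicts into a dict of column lists."""
    return {key: [r[key] for r in records] for key in records[0]}


def run_length_encode(values):
    """Collapse consecutive repeated values into (value, count) pairs."""
    return [(value, len(list(group))) for value, group in groupby(values)]


columns = to_columns(rows)

# Values within one column tend to repeat, so they compress well.
encoded_state = run_length_encode(columns["state"])
print(encoded_state)

# An analytical query over one column reads only that column's values.
players_in_2002 = sum(1 for year in columns["year"] if year == 2002)
print(players_in_2002)
```

Because each column is stored contiguously, the count over `year` never reads the `player` or `state` values, which is the property that favors analytical queries.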
Note
For this recipe, we will be using the Baseball Dataset loaded into Hadoop in the recipe Loading data into Hadoop (also in this chapter). We recommend completing that recipe before continuing.
Getting ready
In this recipe, we will be loading the Schools.csv, Master.csv, and SchoolsPlayers.csv files. The data relates (via the ...