
Fast Data Processing with Spark 2 - Third Edition by Krishna Sankar


HBase

HBase is the NoSQL datastore in the Hadoop ecosystem. Integration with a database is essential for Spark, and Spark can both read data from an HBase table and write to one. In fact, Spark supports HBase very well via its Hadoop dataset calls.
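To make the read path concrete, here is a minimal sketch in Scala rather than a definitive implementation. It assumes that the HBase client and MapReduce classes are on the Spark classpath, that an HBase instance is reachable through the default configuration, and that a table hypothetically named `test` already exists. The object name and table name are illustrative only.

```scala
import org.apache.hadoop.hbase.HBaseConfiguration
import org.apache.hadoop.hbase.client.Result
import org.apache.hadoop.hbase.io.ImmutableBytesWritable
import org.apache.hadoop.hbase.mapreduce.TableInputFormat
import org.apache.hadoop.hbase.util.Bytes
import org.apache.spark.sql.SparkSession

// Hypothetical object name; a sketch of reading an HBase table as an RDD.
object HBaseReadSketch {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("HBaseReadSketch")
      .master("local[*]")
      .getOrCreate()

    // Picks up hbase-site.xml from the classpath if present.
    val conf = HBaseConfiguration.create()
    // "test" is an assumed table name, not one confirmed by the text.
    conf.set(TableInputFormat.INPUT_TABLE, "test")

    // Each element is a (row key, row contents) pair.
    val rows = spark.sparkContext.newAPIHadoopRDD(
      conf,
      classOf[TableInputFormat],
      classOf[ImmutableBytesWritable],
      classOf[Result])

    // Convert keys to strings on the executors before collecting,
    // because the HBase key and Result types are not Java-serializable.
    val rowKeys = rows.map { case (key, _) => Bytes.toString(key.copyBytes()) }
    rowKeys.collect().foreach(println)

    spark.stop()
  }
}
```

Writing goes the other way through `saveAsNewAPIHadoopDataset` on a pair RDD, with an HBase output format configured on a Hadoop job; that side is not shown here.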

Tip

If you want to experiment with HBase, you can install a standalone local version of HBase, as described in http://hbase.apache.org/book.html#quickstart.

Before working through the examples, let's create a table with three records in HBase. For testing, a local standalone installation is enough: it runs against the local filesystem, so it needs neither Hadoop nor HDFS. It is not suitable for production, however.

I created a test table with three records via the HBase shell, as shown in the following ...
