As a system grows and becomes more distributed, the need for data replication grows with it. HBase replication works on the core principle of moving transactional data from one cluster to another. Usually, the master cluster initiates the push to the slave cluster. The transfer is typically asynchronous, which minimizes the overhead on the master system. Edits are usually shipped in batches, and the size of each batch can be controlled through configuration.
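To make the push model concrete, here is a minimal sketch in Python of asynchronous, batched shipping from one cluster to another. It is a toy simulation, not HBase code: the class names `SourceCluster` and `SinkCluster`, the parameters `max_batch_entries` and `max_batch_bytes`, and the use of string length as a stand-in for byte size are all assumptions made for illustration.

```python
import queue
import threading


class SinkCluster:
    """Hypothetical stand-in for the slave cluster: applies received batches."""

    def __init__(self):
        self.rows = {}
        self.batches_received = []  # size of each batch, for inspection

    def apply_batch(self, batch):
        self.batches_received.append(len(batch))
        for key, value in batch:
            self.rows[key] = value


class SourceCluster:
    """Hypothetical stand-in for the master cluster: writes locally, ships later."""

    def __init__(self, sink, max_batch_entries=3, max_batch_bytes=64):
        self.rows = {}
        self.sink = sink
        self.max_batch_entries = max_batch_entries
        self.max_batch_bytes = max_batch_bytes  # soft limit, checked per edit
        self._log = queue.Queue()  # plays the role of a log of pending edits
        shipper = threading.Thread(target=self._ship_loop, daemon=True)
        shipper.start()

    def put(self, key, value):
        # The write completes locally; shipping happens in the background,
        # so the caller never waits on the remote cluster.
        self.rows[key] = value
        self._log.put((key, value))

    def _ship_loop(self):
        while True:
            first = self._log.get()  # block until at least one edit exists
            batch = [first]
            size = len(first[0]) + len(first[1])
            # Keep adding edits until either configured limit is reached.
            while (len(batch) < self.max_batch_entries
                   and size < self.max_batch_bytes):
                try:
                    edit = self._log.get_nowait()
                except queue.Empty:
                    break
                batch.append(edit)
                size += len(edit[0]) + len(edit[1])
            self.sink.apply_batch(batch)
            for _ in batch:
                self._log.task_done()

    def wait_until_replicated(self):
        # Block until every queued edit has been applied on the sink.
        self._log.join()


sink = SinkCluster()
source = SourceCluster(sink, max_batch_entries=3)
for i in range(7):
    source.put(f"row{i}", f"value{i}")
source.wait_until_replicated()
print(sink.rows == source.rows)  # both clusters now hold the same rows
```

The point of the design is that `put` returns as soon as the local write is recorded, while a separate shipper thread drains the pending edits in bounded batches. The exact batch sizes depend on thread timing, so the sketch only guarantees that no batch exceeds `max_batch_entries`.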

The benefits of HBase replication are as follows:

  • Data aggregation
  • Online data ingestion combined with offline data analysis
  • Geographic data distribution across multiple data centres
  • Backup and disaster recovery

How to do it…

  1. Let's edit hbase-site.xml:
