Architecting Modern Data Platforms by Lars George, Paul Wilkinson, Ian Buss, Jan Kunigk

Chapter 13. Backup and Disaster Recovery

This chapter outlines the concerns around building a sound strategy for keeping data within a Hadoop-based system safe and available, so that after data loss caused by user error (erroneously deleted data) or by a disaster (such as loss of the entire cluster), a restore can be initiated and completed. The restore should leave cluster users with a reliable state from which they can proceed with their business tasks.

Note that this is necessary even with high availability (see Chapter 12) enabled, because restoring data addresses problems that have nothing to do with keeping a service responsive. Even with redundant components at every level of the stack, losing metadata or data may cause a disruption that can only be mitigated if a proper backup or disaster recovery strategy was put in place beforehand.

Context

Before we can look into the particular approaches, we first need to establish a context.

Many Distributed Systems

Hadoop is a complex system, comprising many open source projects that work in conjunction to build a unique data processing platform. It is the foundation for many large-scale installations across numerous industry verticals, storing and processing from hundreds of terabytes to multiple petabytes of data. At any scale, there is the need to keep the data safe, and the customary approach is to invest in some kind of backup technology. The difficulties with this approach are manifold.

First, data ...
