Chapter 17. Dynamic Clustering in the Cloud

This chapter continues where Chapter 16 left off and explains how clusters don't have to consist of a fixed number of slaves but can instead be dynamic in nature. After covering dynamic clustering, we introduce cloud computing, which offers a dynamic set of resources. We then move on to a practical implementation of one cloud computing service: the Amazon Elastic Compute Cloud (EC2). We finish the chapter by explaining how you can configure your own set of servers on Amazon EC2 for use as a cluster.

Dynamic Clustering

While it is still standard practice at most organizations for ETL developers to have only one or two servers to work with, it's becoming more common to have a whole set of machines available as general compute resources. This section describes how Kettle clustering enables you to take advantage of a dynamic pool of computing resources.

Even before terms such as cloud computing and virtual machines became popular, initiatives like SETI@Home were already utilizing computer resources dynamically. SETI@Home was one of the very first popular distributed dynamic clusters; people all over the world contributed processing power to help the Search for Extraterrestrial Intelligence. The SETI@Home cluster is dynamic in configuration because the number of participating nodes is constantly changing. In fact, SETI@Home is implemented as a screensaver, so it's impossible to say up front how many machines participate ...
