Chapter 10. Clusters and Job Queues
A cluster is commonly understood to be a collection of computers working together to solve a common task; viewed from the outside, it can appear as a single, larger system.
In the 1990s, the notion of using a cluster of commodity PCs on a local area network for clustered processing—known as a Beowulf cluster—became popular. Google later gave the practice a boost by using clusters of commodity PCs in its own data centers, particularly for running MapReduce tasks. At the other end of the scale, the TOP500 project ranks the most powerful computer systems each year; these typically have a clustered design, and the fastest machines all use Linux.
Amazon Web Services (AWS) is commonly used both for engineering production clusters in the cloud and for building on-demand clusters for short-lived tasks like machine learning. With AWS you can rent machines ranging from tiny to huge, with tens of CPUs and up to 768 GB of RAM, for $1 to $15 an hour. Machines with multiple GPUs can be rented at extra cost. Look at “Using IPython Parallel to Support Research” and the ElastiCluster package if you’d like to explore AWS or other providers for ad hoc clusters ...