Chapter 14. Network Topologies

Cloud providers let you design almost any network architecture you could imagine to support your instances. You have options for where in the world the instances are running, their IP addresses and DNS names, and all of the rules for how they can talk to each other and the outside world. All of that freedom can be overwhelming.

Cloud providers start you off with a default network that gets you up and running quickly. However, even establishing a single Hadoop cluster leads you to outgrow that initial state and compels you to confront many questions about how your instances should be arranged and the rules that they should play by. Your organization may also have its own requirements for where data can live and how it must be protected, both at rest and in transit, including access rules and redundancy requirements.

The collective layout for a network of computing resources can be called its topology. This chapter defines some common concepts behind cloud network topologies and shows how Hadoop clusters can work within them.

Public and Private Subnets

When it comes to networking and security, perhaps the most fundamental question to ask about a single instance, or an entire cluster’s worth of them for that matter, is: Who can see it?

It’s essential that all of the instances within a single Hadoop cluster be able to see each other. In the typical, basic case, all of the instances run in the same subnet in the same availability zone, so that they ...
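As a rough illustration of the "same subnet" case, the following sketch uses Python's standard ipaddress module to check whether a set of instance addresses all fall inside one subnet range. The addresses, the CIDR block, and the helper name same_subnet are hypothetical and chosen only for illustration. Sharing a subnet is also only part of the picture, since a provider's firewall rules may still decide which instances can actually reach each other.

```python
import ipaddress


def same_subnet(addresses, cidr):
    """Return True if every address falls inside the subnet given by cidr."""
    network = ipaddress.ip_network(cidr)
    return all(ipaddress.ip_address(addr) in network for addr in addresses)


# Hypothetical private addresses for a three-instance cluster.
cluster_ips = ["10.0.1.10", "10.0.1.11", "10.0.1.12"]
# Assumed subnet range; a real value would come from your provider's network settings.
subnet_cidr = "10.0.1.0/24"

all_in_subnet = same_subnet(cluster_ips, subnet_cidr)
# Adding an address from a different range makes the check fail.
stray_in_subnet = same_subnet(cluster_ips + ["10.0.2.5"], subnet_cidr)
print(all_in_subnet, stray_in_subnet)
```

A check like this can serve as a quick sanity test on a planned address layout before any instances are launched.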
