Data distribution

One simple way of partitioning rows over a set of nodes is to use hashing. You pick a hash function and compute something such as hash(key_x) % n_nodes to get the node that stores the data for key_x. The problem with this scheme is that adding or removing a node changes the value of hash(key_x) % n_nodes for almost every key, so scaling the cluster means moving a large amount of data around.
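The remapping problem is easy to see in code. The following Go sketch (hashKey and nodeFor are hypothetical helpers, not taken from Cassandra) partitions a handful of keys with the modulo scheme and counts how many of them land on a different node when the cluster grows from four to five nodes:

```go
package main

import (
	"fmt"
	"hash/fnv"
)

// hashKey maps a key to a uint32 using FNV-1a.
func hashKey(key string) uint32 {
	h := fnv.New32a()
	h.Write([]byte(key))
	return h.Sum32()
}

// nodeFor picks a node with the simple modulo scheme described above.
func nodeFor(key string, nNodes uint32) uint32 {
	return hashKey(key) % nNodes
}

func main() {
	keys := []string{"user:1", "user:2", "user:3", "order:42", "order:43"}

	// Count how many keys map to a different node when the cluster
	// grows from 4 to 5 nodes.
	moved := 0
	for _, k := range keys {
		if nodeFor(k, 4) != nodeFor(k, 5) {
			moved++
		}
	}
	fmt.Printf("%d of %d keys moved to a different node\n", moved, len(keys))
}
```

With a larger key set, the fraction of keys that move approaches (n_nodes - 1) / n_nodes, which is why naive modulo partitioning makes cluster scaling so expensive.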

To get around this, Cassandra uses a concept called consistent hashing, which we looked at in Chapter 5, Going Distributed. Here is a quick recap:

Consider a circle with values on it ranging over [0, 1]; that is, any point on the circle has a value between 0 and 1. Next, we pick a favorite hashing ...
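Although the recap is cut short here, the ring idea can be illustrated with a minimal Go sketch. This is an assumption-laden illustration rather than Cassandra's implementation: it uses a 32-bit hash space instead of the [0, 1] interval, the node names are hypothetical, and real systems add virtual nodes and replication on top of this. Each node is hashed to a point on the ring, and a key is owned by the first node encountered clockwise from the key's own position:

```go
package main

import (
	"fmt"
	"hash/fnv"
	"sort"
)

// Ring is a minimal consistent-hash ring: each node is hashed to a point
// on the ring, and a key is owned by the first node clockwise from it.
type Ring struct {
	points []uint32          // sorted node positions on the ring
	owner  map[uint32]string // ring position -> node name
}

func hashPoint(s string) uint32 {
	h := fnv.New32a()
	h.Write([]byte(s))
	return h.Sum32()
}

func NewRing(nodes []string) *Ring {
	r := &Ring{owner: make(map[uint32]string)}
	for _, n := range nodes {
		p := hashPoint(n)
		r.points = append(r.points, p)
		r.owner[p] = n
	}
	sort.Slice(r.points, func(i, j int) bool { return r.points[i] < r.points[j] })
	return r
}

// NodeFor walks clockwise from the key's position to the next node point,
// wrapping around to the first point if the key hashes past the last node.
func (r *Ring) NodeFor(key string) string {
	p := hashPoint(key)
	i := sort.Search(len(r.points), func(i int) bool { return r.points[i] >= p })
	if i == len(r.points) {
		i = 0
	}
	return r.owner[r.points[i]]
}

func main() {
	ring := NewRing([]string{"node-a", "node-b", "node-c"})
	for _, k := range []string{"user:1", "user:2", "order:42"} {
		fmt.Printf("%s -> %s\n", k, ring.NodeFor(k))
	}
}
```

The payoff is that adding a node only takes over the keys between its ring position and the preceding node's position, so only a small fraction of the data has to move when the cluster is scaled.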
