One simple way of partitioning rows over a set of nodes is to use hashing: pick a hash function and use something such as hash(key_x) % n_nodes to get the node that stores the data for key_x. The problem with this scheme is that adding or removing a node changes n_nodes, and with it the value of hash(key_x) % n_nodes for nearly every key; scaling the cluster would therefore mean moving most of the data around.
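To see how severe the reshuffle is, here is a minimal sketch in Go. The FNV-1a hash, the key names, and the 4-to-5 node change are illustrative assumptions for this example, not Cassandra's actual partitioner:

```go
package main

import (
	"fmt"
	"hash/fnv"
)

// hashKey maps a key to a node index using plain modulo hashing.
func hashKey(key string, nNodes uint32) uint32 {
	h := fnv.New32a()
	h.Write([]byte(key))
	return h.Sum32() % nNodes
}

func main() {
	moved, total := 0, 1000
	for i := 0; i < total; i++ {
		key := fmt.Sprintf("key_%d", i)
		// Node assignment before and after adding a fifth node.
		if hashKey(key, 4) != hashKey(key, 5) {
			moved++
		}
	}
	fmt.Printf("%d of %d keys moved after adding one node\n", moved, total)
}
```

Running this shows that roughly 80% of the keys change nodes when the fifth node is added, which is why a plain modulo scheme scales so poorly.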
To get around this, Cassandra uses a concept called consistent hashing. We looked at consistent hashing in Chapter 5, Going Distributed. Here is a quick recap:
Consider a circle with values on it ranging over [0, 1], that is, any point on the circle has a value between 0 and 1. Next, we pick a favorite hashing ...
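To make the ring idea concrete, here is a minimal consistent-hash ring sketch in Go. It stands in the 32-bit FNV hash space for the [0, 1] circle described above, and the node and key names are hypothetical; a production ring would also place multiple virtual nodes per physical node:

```go
package main

import (
	"fmt"
	"hash/fnv"
	"sort"
)

// Ring is a minimal consistent-hash ring. Node positions live in the
// 32-bit hash space, standing in for points on the [0, 1] circle.
type Ring struct {
	points []uint32          // sorted node positions on the ring
	nodes  map[uint32]string // position -> node name
}

func hash32(s string) uint32 {
	h := fnv.New32a()
	h.Write([]byte(s))
	return h.Sum32()
}

func NewRing(nodeNames []string) *Ring {
	r := &Ring{nodes: make(map[uint32]string)}
	for _, n := range nodeNames {
		p := hash32(n)
		r.points = append(r.points, p)
		r.nodes[p] = n
	}
	sort.Slice(r.points, func(i, j int) bool { return r.points[i] < r.points[j] })
	return r
}

// Locate returns the node owning key: the first node position at or
// after hash(key), wrapping around to the start of the ring.
func (r *Ring) Locate(key string) string {
	h := hash32(key)
	i := sort.Search(len(r.points), func(i int) bool { return r.points[i] >= h })
	if i == len(r.points) {
		i = 0 // wrapped past the last node; go back to the first
	}
	return r.nodes[r.points[i]]
}

func main() {
	ring := NewRing([]string{"node-a", "node-b", "node-c"})
	for _, k := range []string{"key_1", "key_2", "key_3"} {
		fmt.Printf("%s -> %s\n", k, ring.Locate(k))
	}
}
```

With this layout, a new node claims keys only from its successor on the ring, so adding or removing a node moves just a small fraction of the keys rather than nearly all of them.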