Chapter 3. Setting Up a Cluster

Choosing a Shard Key

Choosing a good shard key is absolutely critical. If you choose a bad shard key, it can break your application immediately or as soon as you have heavy traffic, or it can lie in wait and break your application at a seemingly random time.

On the other hand, if you choose a good shard key, MongoDB will just do the right thing as you get more traffic and add more servers, for as long as your application is up.

As you learned in the last chapter, a shard key determines how your data will be distributed across your cluster. Thus, you want a shard key that distributes reads and writes, but that also keeps the data you’re using together. These can seem like contradictory goals, but both can often be achieved.

First we’ll go over a couple of bad shard key choices and find out why they’re bad, then we’ll come up with a couple of better ones. There is also a good page on the MongoDB wiki on choosing a shard key.

Low-Cardinality Shard Key

Some people don’t really trust or understand how MongoDB automatically distributes data, so they think something along the lines of, “I have four shards, so I will use a field with four possible values for my shard key.” This is a really, really bad idea.

Let’s look at what happens.

Suppose we have an application that stores user information. Each document has a continent field that records where the user is located. Its value can be “Africa”, “Antarctica”, “Asia”, “Australia”, “Europe”, “North America”, or “South America”. We ...
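
The continent example can be sketched in a few lines of Python. This is a simplified model, not MongoDB code: it assumes only that a chunk is a contiguous range of shard key values, so documents that share the same shard key value can never be split into separate chunks. The user counts and the helper names (max_chunks, chunk_sizes) are hypothetical, chosen purely for illustration.

```python
from collections import Counter

# Hypothetical number of users per continent (illustrative figures only).
HYPOTHETICAL_USERS = {
    "Africa": 300,
    "Antarctica": 1,
    "Asia": 2500,
    "Australia": 100,
    "Europe": 900,
    "North America": 800,
    "South America": 400,
}


def max_chunks(shard_key_values):
    """Upper bound on the number of chunks in this simplified model.

    A chunk covers a range of shard key values, so a split can only fall
    between two distinct values. The bound is the count of distinct values.
    """
    return len(set(shard_key_values))


def chunk_sizes(shard_key_values):
    """Documents per chunk once every possible split has been made."""
    return Counter(shard_key_values)


# One shard key value per user document.
values = [
    continent
    for continent, count in HYPOTHETICAL_USERS.items()
    for _ in range(count)
]

chunk_limit = max_chunks(values)
largest_chunk = chunk_sizes(values).most_common(1)[0]

print("documents:", len(values))
print("maximum number of chunks:", chunk_limit)
print("largest unsplittable chunk:", largest_chunk)
```

Under these assumptions, the collection can never be divided into more than seven chunks no matter how many documents or servers are added, and the chunk for the most populous continent keeps growing because it cannot be split any further.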
