Chapter 16. Choosing a Shard Key

The most important task when using sharding is choosing how your data will be distributed. To make intelligent choices about this, you have to understand how MongoDB distributes data. This chapter helps you make a good choice of shard key by covering:

  • How to decide among multiple possible shard keys

  • Shard keys for several use cases

  • What you can’t use as a shard key

  • Some alternative strategies if you want to customize how data is distributed

  • How to manually shard your data

It assumes that you understand the basic components of sharding as covered in the previous two chapters.

Taking Stock of Your Usage

When you shard a collection, you choose a field or two to use to split up the data. This field (or combination of fields) is called the shard key. Once you shard a collection, you cannot change your shard key, so it is important to choose correctly.
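As a rough illustration of what "choosing a field" looks like in practice, the sketch below builds the two admin commands (`enableSharding` and `shardCollection`) as plain Python dictionaries. The database name `app`, the collection name `users`, the `username` field, and the helper `shard_collection_commands` are all hypothetical and chosen only for this example. Sending the commands requires a running cluster, so that part is left as a comment.

```python
def shard_collection_commands(database, collection, key_fields):
    """Build the admin commands that enable sharding and shard a collection.

    key_fields is a list of (field, direction) pairs, e.g. [("username", 1)].
    The helper itself is hypothetical; only the command names are MongoDB's.
    """
    namespace = f"{database}.{collection}"
    enable_cmd = {"enableSharding": database}
    # The command name must be the first key; Python dicts keep insertion order.
    shard_cmd = {"shardCollection": namespace, "key": dict(key_fields)}
    return enable_cmd, shard_cmd


# Hypothetical names: database "app", collection "users", shard key "username".
enable_cmd, shard_cmd = shard_collection_commands("app", "users", [("username", 1)])
print(enable_cmd)
print(shard_cmd)

# Against a real cluster (assumes pymongo is installed and a mongos is reachable
# at this hypothetical address):
#   from pymongo import MongoClient
#   client = MongoClient("mongodb://localhost:27017")
#   client.admin.command(enable_cmd)
#   client.admin.command(shard_cmd)
```

Building the command documents separately from sending them makes it easy to review the key you are about to commit to before it reaches the cluster.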

To choose a good shard key, you need to understand your workload and how your shard key is going to distribute your application’s requests. This can be difficult to picture, so try to work out some examples—or, even better, try it out on a backup dataset with sample traffic. This section has lots of diagrams and explanations, but there is no substitute for trying it on your own data.
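One way to work out an example on paper is a small simulation. The sketch below is not how MongoDB splits or balances chunks. It assumes three hypothetical shards that each own a fixed range of an integer key, and it counts how many of 1,000 new writes land on each shard for two key patterns: one that always increases and one whose values are scattered across the key space. The split points, the key values, and the scattering formula are all made up for illustration.

```python
import bisect
from collections import Counter


def shard_for(value, split_points):
    """Return the index of the key range (shard) that contains value."""
    return bisect.bisect_right(split_points, value)


def simulate(keys, split_points):
    """Count how many keys fall into each range defined by split_points."""
    counts = Counter(shard_for(k, split_points) for k in keys)
    return [counts.get(i, 0) for i in range(len(split_points) + 1)]


# Hypothetical layout: shard 0 owns keys below 1000, shard 1 owns 1000-1999,
# and shard 2 owns 2000 and above.
split_points = [1000, 2000]

# Pattern 1: every new document gets the next-highest key value.
ascending_keys = range(2000, 3000)

# Pattern 2: the same 1,000 writes, with key values scattered over 0-2999
# by an arbitrary multiplier (an illustrative stand-in, not MongoDB's hashing).
scattered_keys = [(k * 7919) % 3000 for k in range(2000, 3000)]

ascending = simulate(ascending_keys, split_points)
scattered = simulate(scattered_keys, split_points)

print("ascending key, writes per shard:", ascending)
print("scattered key, writes per shard:", scattered)
```

With the ascending pattern, every write falls into the last range, so a single shard absorbs all of the new traffic. The scattered pattern spreads the same writes over all three ranges. Replacing the made-up key lists with values sampled from your own data gives a first impression of how a candidate key would behave.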

For each collection that you’re planning to shard, start by answering the following questions:

  • How many shards are you planning to grow to? A three-shard cluster has a great deal more flexibility than a thousand-shard cluster. As a cluster gets larger, you should ...

Get MongoDB: The Definitive Guide, 3rd Edition now with the O’Reilly learning platform.
