Scaling MongoDB

Chapter 2. Understanding Sharding

To set up, administrate, or debug a cluster, you have to understand the basic scheme of how sharding works. This chapter covers the basics so that you can reason about what’s going on.

Splitting Up Data

A shard is one or more servers in a cluster that are responsible for some subset of the data. For instance, if we had a cluster that contained 1,000,000 documents representing a website’s users, one shard might contain information about 200,000 of the users.

A shard can consist of many servers. If there is more than one server in a shard, each server has an identical copy of the subset of data (Figure 2-1). In production, a shard will usually be a replica set.

Figure 2-1. A shard contains some subset of the data. If a shard contains more than one server, each server has a complete copy of the data.

To evenly distribute data across shards, MongoDB moves subsets of the data from shard to shard. It figures out which subsets to move based on a key that you choose. For example, we might choose to split up a collection of users based on the username field. MongoDB uses range-based splitting; that is, data is split into chunks of given ranges—e.g., ["a”, “f”).

Throughout this text, I’ll use standard range notation to describe ranges. “[” and “]” denote inclusive bounds and “(” and “)” denote exclusive bounds. Thus, the four possible ranges are:

x is in (a, b): If ...

Get Scaling MongoDB now with the O’Reilly learning platform.

O’Reilly members experience books, live events, courses curated by job role, and more from O’Reilly and nearly 200 top publishers.

Start your free trial

Scaling MongoDB by Kristina Chodorow

Chapter 2. Understanding Sharding

Splitting Up Data

Don’t leave empty-handed

It’s yours, free.

Check it out now on O’Reilly