Chapter 14. Introduction to Sharding

This chapter covers how to scale with MongoDB. We’ll look at:

  • What sharding is and the components of a cluster

  • How to configure sharding

  • The basics of how sharding interacts with your application

What Is Sharding?

Sharding refers to the process of splitting data up across machines; the term partitioning is also sometimes used to describe this concept. By putting a subset of data on each machine, it becomes possible to store more data and handle more load without requiring larger or more powerful machines—just a larger quantity of less-powerful machines. Sharding may be used for other purposes as well, including placing more frequently accessed data on more performant hardware or splitting a dataset based on geography to locate a subset of documents in a collection (e.g., for users based in a particular locale) close to the application servers from which they are most commonly accessed.

Manual sharding can be done with almost any database software. With this approach, an application maintains connections to several different database servers, each of which are completely independent. The application manages storing different data on different servers and querying against the appropriate server to get data back. This setup can work well but becomes difficult to maintain when adding or removing nodes from the cluster or in the face of changing data distributions or load patterns.

MongoDB supports autosharding, which tries to both abstract the architecture ...

Get MongoDB: The Definitive Guide, 3rd Edition now with O’Reilly online learning.

O’Reilly members experience live online training, plus books, videos, and digital content from 200+ publishers.