MongoDB: The Definitive Guide, 3rd Edition

by Shannon Bradshaw, Eoin Brazil, Kristina Chodorow

December 2019

Intermediate to advanced

511 pages

12h 50m

English

O'Reilly Media, Inc.

Content preview from MongoDB: The Definitive Guide, 3rd Edition

Chapter 14. Introduction to Sharding

This chapter covers how to scale with MongoDB. We’ll look at:

What sharding is and the components of a cluster
How to configure sharding
The basics of how sharding interacts with your application

What Is Sharding?

Sharding refers to the process of splitting data up across machines; the term partitioning is also sometimes used to describe this concept. By putting a subset of data on each machine, it becomes possible to store more data and handle more load without requiring larger or more powerful machines—just a larger quantity of less-powerful machines. Sharding may be used for other purposes as well, including placing more frequently accessed data on more performant hardware or splitting a dataset based on geography to locate a subset of documents in a collection (e.g., for users based in a particular locale) close to the application servers from which they are most commonly accessed.

Manual sharding can be done with almost any database software. With this approach, an application maintains connections to several different database servers, each of which is completely independent. The application manages storing different data on different servers and querying against the appropriate server to get data back. This setup can work well but becomes difficult to maintain when adding or removing nodes from the cluster or in the face of changing data distributions or load patterns.
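To make the maintenance problem concrete, here is a minimal sketch of application-side routing for manual sharding. It is not taken from the book: the server names, the `pick_server` and `fraction_moved` helpers, and the hash-modulo routing rule are all assumptions chosen for illustration, and plain strings stand in for real database connections.

```python
import hashlib

# Hypothetical server names; a real application would hold one
# database connection per entry instead of a plain string.
SERVERS = ["db-a.example.net", "db-b.example.net", "db-c.example.net"]


def pick_server(key, servers):
    """Map a key to one server using a stable hash modulo the server count."""
    digest = hashlib.md5(key.encode("utf-8")).hexdigest()
    return servers[int(digest, 16) % len(servers)]


def fraction_moved(keys, old_servers, new_servers):
    """Return the fraction of keys whose home server differs between two layouts."""
    moved = sum(
        1 for k in keys
        if pick_server(k, old_servers) != pick_server(k, new_servers)
    )
    return moved / len(keys)


if __name__ == "__main__":
    keys = [f"user{i}" for i in range(1000)]
    print("user42 lives on", pick_server("user42", SERVERS))

    # Adding one server changes the modulus, so many keys get a new home.
    grown = SERVERS + ["db-d.example.net"]
    share = fraction_moved(keys, SERVERS, grown)
    print(f"{share:.0%} of keys change servers after adding one node")
```

With this naive scheme, growing the cluster changes where a large share of keys belong, and the application itself would have to migrate that data. This is the kind of bookkeeping that makes manual sharding hard to maintain as nodes are added or removed.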

MongoDB supports autosharding, which tries to both abstract the architecture ...

Publisher Resources

ISBN: 9781491954454
