MongoDB: The Definitive Guide, 2nd Edition

Name: MongoDB: The Definitive Guide, 2nd Edition
Author: Kristina Chodorow
ISBN: 9781449344689

May 2013

Intermediate to advanced

430 pages

11h 2m

English

O'Reilly Media, Inc.

Foreword
Preface
How This Book Is Organized; Getting Started with MongoDB; Developing with MongoDB; Replication; Sharding; Application Administration; Server Administration; Appendixes; Conventions Used in This Book; Using Code Examples; Safari® Books Online; How to Contact Us; Acknowledgments
I. Introduction to MongoDB
1. Introduction
Ease of Use; Easy Scaling; Tons of Features…; …Without Sacrificing Speed; Let’s Get Started
2. Getting Started
Documents; Collections; Dynamic Schemas; Naming; Subcollections; Databases; Getting and Starting MongoDB; Introduction to the MongoDB Shell; Running the Shell; A MongoDB Client; Basic Operations with the Shell; Create; Read; Update; Delete; Data Types; Basic Data Types; Dates; Arrays; Embedded Documents; _id and ObjectIds; ObjectIds; Autogeneration of _id; Using the MongoDB Shell; Tips for Using the Shell; Running Scripts with the Shell; Creating a .mongorc.js; Customizing Your Prompt; Editing Complex Variables; Inconvenient Collection Names
3. Creating, Updating, and Deleting Documents
Inserting and Saving Documents; Bulk Insert; Insert Validation; Removing Documents; Remove Speed; Updating Documents; Document Replacement; Using Modifiers; Getting started with the “$set” modifier; Incrementing and decrementing; Array modifiers; Adding elements; Using arrays as sets; Removing elements; Positional array modifications; Modifier speed; Upserts; The save shell helper; Updating Multiple Documents; Returning Updated Documents; Setting a Write Concern
4. Querying
Introduction to find; Specifying Which Keys to Return; Limitations; Query Criteria; Query Conditionals; OR Queries; $not; Conditional Semantics; Type-Specific Queries; null; Regular Expressions; Querying Arrays; $all; $size; The $slice operator; Returning a matching array element; Array and range query interactions; Querying on Embedded Documents; $where Queries; Server-Side Scripting; Cursors; Limits, Skips, and Sorts; Comparison order; Avoiding Large Skips; Paginating results without skip; Finding a random document; Advanced Query Options; Getting Consistent Results; Immortal Cursors; Database Commands; How Commands Work
II. Designing Your Application
5. Indexing
Introduction to Indexing; Introduction to Compound Indexes; Using Compound Indexes; Choosing key directions; Using covered indexes; Implicit indexes; How $-Operators Use Indexes; Inefficient operators; Ranges; OR queries; Indexing Objects and Arrays; Indexing embedded docs; Indexing arrays; Multikey index implications; Index Cardinality; Using explain() and hint(); The Query Optimizer; When Not to Index; Types of Indexes; Unique Indexes; Compound unique indexes; Dropping duplicates; Sparse Indexes; Index Administration; Identifying Indexes; Changing Indexes
6. Special Index and Collection Types
Capped Collections; Creating Capped Collections; Sorting Au Naturel; Tailable Cursors; No-_id Collections; Time-To-Live Indexes; Full-Text Indexes; Search Syntax; Full-Text Search Optimization; Searching in Other Languages; Geospatial Indexing; Types of Geospatial Queries; Compound Geospatial Indexes; 2D Indexes; Storing Files with GridFS; Getting Started with GridFS: mongofiles; Working with GridFS from the MongoDB Drivers; Under the Hood
7. Aggregation
The Aggregation Framework; Pipeline Operations; $match; $project; Pipeline expressions; Mathematical expressions; Date expressions; String expressions; Logical expressions; A projection example; $group; Grouping operators; Arithmetic operators; Extreme operators; Array operators; Grouping behavior; $unwind; $sort; $limit; $skip; Using Pipelines; MapReduce; Example 1: Finding All Keys in a Collection; Example 2: Categorizing Web Pages; MongoDB and MapReduce; The finalize function; Keeping output collections; MapReduce on a subset of documents; Using a scope; Getting more output; Aggregation Commands; count; distinct; group; Using a finalizer; Using a function as a key
8. Application Design
Normalization versus Denormalization; Examples of Data Representations; Cardinality; Friends, Followers, and Other Inconveniences; Dealing with the Wil Wheaton effect; Optimizations for Data Manipulation; Optimizing for Document Growth; Removing Old Data; Planning Out Databases and Collections; Managing Consistency; Migrating Schemas; When Not to Use MongoDB
III. Replication
9. Setting Up a Replica Set
Introduction to Replication; A One-Minute Test Setup; Configuring a Replica Set; rs Helper Functions; Networking Considerations; Changing Your Replica Set Configuration; How to Design a Set; How Elections Work; Member Configuration Options; Creating Election Arbiters; Use at most one arbiter; The downside to using an arbiter; Priority; Hidden; Slave Delay; Building Indexes
10. Components of a Replica Set
Syncing; Initial Sync; Handling Staleness; Heartbeats; Member States; Elections; Rollbacks; When Rollbacks Fail
11. Connecting to a Replica Set from Your Application
Client-to-Replica-Set Connection Behavior; Waiting for Replication on Writes; What Can Go Wrong?; Other Options for “w”; Custom Replication Guarantees; Guaranteeing One Server per Data Center; Guaranteeing a Majority of Nonhidden Members; Creating Other Guarantees; Sending Reads to Secondaries; Consistency Considerations; Load Considerations; Reasons to Read from Secondaries
12. Administration
Starting Members in Standalone Mode; Replica Set Configuration; Creating a Replica Set; Changing Set Members; Creating Larger Sets; Forcing Reconfiguration; Manipulating Member State; Turning Primaries into Secondaries; Preventing Elections; Using Maintenance Mode; Monitoring Replication; Getting the Status; Visualizing the Replication Graph; Replication Loops; Disabling Chaining; Calculating Lag; Resizing the Oplog; Restoring from a Delayed Secondary; Building Indexes; Replication on a Budget; How the Primary Tracks Lag; Master-Slave; Converting Master-Slave to a Replica Set; Mimicking Master-Slave Behavior with Replica Sets
IV. Sharding
13. Introduction to Sharding
Introduction to Sharding; Understanding the Components of a Cluster; A One-Minute Test Setup
14. Configuring Sharding
When to Shard; Starting the Servers; Config Servers; The mongos Processes; Adding a Shard from a Replica Set; Adding Capacity; Sharding Data; How MongoDB Tracks Cluster Data; Chunk Ranges; Splitting Chunks; The Balancer
15. Choosing a Shard Key
Taking Stock of Your Usage; Picturing Distributions; Ascending Shard Keys; Randomly Distributed Shard Keys; Location-Based Shard Keys; Shard Key Strategies; Hashed Shard Key; Hashed Shard Keys for GridFS; The Firehose Strategy; Multi-Hotspot; Shard Key Rules and Guidelines; Shard Key Limitations; Shard Key Cardinality; Controlling Data Distribution; Using a Cluster for Multiple Databases and Collections; Manual Sharding
16. Sharding Administration
Seeing the Current State; Getting a Summary with sh.status; Seeing Configuration Information; config.shards; config.databases; config.collections; config.chunks; config.changelog; config.tags; config.settings; Tracking Network Connections; Getting Connection Statistics; Limiting the Number of Connections; Server Administration; Adding Servers; Changing Servers in a Shard; Changing a shard from a standalone server to replica set; Removing a Shard; Changing Config Servers; Balancing Data; The Balancer; Changing Chunk Size; Moving Chunks; Jumbo Chunks; Distributing jumbo chunks; Preventing jumbo chunks; Refreshing Configurations
V. Application Administration
17. Seeing What Your Application Is Doing
Seeing the Current Operations; Finding Problematic Operations; Killing Operations; False Positives; Preventing Phantom Operations; Using the System Profiler; Calculating Sizes; Documents; Collections; Databases; Using mongotop and mongostat
18. Data Administration
Setting Up Authentication; Authentication Basics; Setting Up Authentication; How Authentication Works; Creating and Deleting Indexes; Creating an Index on a Standalone Server; Creating an Index on a Replica Set; Creating an Index on a Sharded Cluster; Removing Indexes; Beware of the OOM Killer; Preheating Data; Moving Databases into RAM; Moving Collections into RAM; Custom-Preheating; Compacting Data; Moving Collections; Preallocating Data Files
19. Durability
What Journaling Does; Planning Commit Batches; Setting Commit Intervals; Turning Off Journaling; Replacing Data Files; Repairing Data Files; The mongod.lock File; Sneaky Unclean Shutdowns; What MongoDB Does Not Guarantee; Checking for Corruption; Durability with Replication
VI. Server Administration
20. Starting and Stopping MongoDB
Starting from the Command Line; File-Based Configuration; Stopping MongoDB; Security; Data Encryption; SSL Connections; Logging
21. Monitoring MongoDB
Monitoring Memory Usage; Introduction to Computer Memory; Tracking Memory Usage; Tracking Page Faults; Minimizing Btree Misses; IO Wait; Tracking Background Flush Averages; Calculating the Working Set; Some Working Set Examples; Tracking Performance; Tracking Free Space; Monitoring Replication
22. Making Backups
Backing Up a Server; Filesystem Snapshot; Copying Data Files; Using mongodump; Moving collections and databases with mongodump and mongorestore; Administrative complications with unique indexes; Backing Up a Replica Set; Backing Up a Sharded Cluster; Backing Up and Restoring an Entire Cluster; Backing Up and Restoring a Single Shard; Creating Incremental Backups with mongooplog
23. Deploying MongoDB
Designing the System; Choosing a Storage Medium; An example from the wild; Recommended RAID Configurations; CPU; Choosing an Operating System; Swap Space; Filesystem; Virtualization; Turn Off Memory Overcommitting; Mystery Memory; Handling Network Disk IO Issues; Using Non-Networked Disks; Configuring System Settings; Turning Off NUMA; Setting a Sane Readahead; Disabling Hugepages; Choosing a Disk Scheduling Algorithm; Don’t Track Access Time; Modifying Limits; Configuring Your Network; System Housekeeping; Synchronizing Clocks; The OOM Killer; Turn Off Periodic Tasks
A. Installing MongoDB
Choosing a Version; Windows Install; Installing as a Service; POSIX (Linux, Mac OS X, and Solaris) Install; Installing from a Package Manager
B. MongoDB Internals
BSON; Wire Protocol; Data Files; Namespaces and Extents; Memory-Mapped Storage Engine
Index
Colophon
Copyright

Content preview from MongoDB: The Definitive Guide, 2nd Edition

Foreword

Jeremy Zawodny

Craigslist Software Engineer

In the last 10 years, the Internet has challenged relational databases in ways nobody could have foreseen. Having used MySQL at large and growing Internet companies during this time, I’ve seen this happen firsthand. First you have a single server with a small data set. Then you find yourself setting up replication so you can scale out reads and deal with potential failures. And, before too long, you’ve added a caching layer, tuned all the queries, and thrown even more hardware at the problem.

Eventually you arrive at the point when you need to shard the data across multiple clusters and rebuild a ton of application logic to deal with it. And soon after that you realize that you’re locked into the schema you modeled so many months before.

Why? Because there’s so much data in your clusters now that altering the schema will take a long time and involve a lot of precious DBA time. It’s easier just to work around it in code. This can keep a small team of developers busy for many months. In the end, you’ll always find yourself wondering if there’s a better way—or why more of these features are not built into the core database server.

Keeping with tradition, the Open Source community has created a plethora of “better ways” in response to the ballooning data needs of modern web applications. They span the spectrum from simple in-memory key/value stores to complicated SQL-speaking MySQL/InnoDB derivatives. But the sheer number of choices has ...

Publisher Resources

ISBN: 9781449344795
