MongoDB: The Definitive Guide, 2nd Edition

Name: MongoDB: The Definitive Guide, 2nd Edition
Author: Kristina Chodorow
ISBN: 9781449344689

by Kristina Chodorow

May 2013

Intermediate to advanced

430 pages

11h 2m

English

O'Reilly Media, Inc.

Foreword
Preface
How This Book Is Organized; Getting Started with MongoDB; Developing with MongoDB; Replication; Sharding; Application Administration; Server Administration; Appendixes; Conventions Used in This Book; Using Code Examples; Safari® Books Online; How to Contact Us; Acknowledgments
I. Introduction to MongoDB
1. Introduction
Ease of Use; Easy Scaling; Tons of Features…; …Without Sacrificing Speed; Let’s Get Started
2. Getting Started
Documents; Collections; Dynamic Schemas; Naming; Subcollections; Databases; Getting and Starting MongoDB; Introduction to the MongoDB Shell; Running the Shell; A MongoDB Client; Basic Operations with the Shell; Create; Read; Update; Delete; Data Types; Basic Data Types; Dates; Arrays; Embedded Documents; _id and ObjectIds; ObjectIds; Autogeneration of _id; Using the MongoDB Shell; Tips for Using the Shell; Running Scripts with the Shell; Creating a .mongorc.js; Customizing Your Prompt; Editing Complex Variables; Inconvenient Collection Names
3. Creating, Updating, and Deleting Documents
Inserting and Saving Documents; Bulk Insert; Insert Validation; Removing Documents; Remove Speed; Updating Documents; Document Replacement; Using Modifiers; Getting started with the “$set” modifier; Incrementing and decrementing; Array modifiers; Adding elements; Using arrays as sets; Removing elements; Positional array modifications; Modifier speed; Upserts; The save shell helper; Updating Multiple Documents; Returning Updated Documents; Setting a Write Concern
4. Querying
Introduction to find; Specifying Which Keys to Return; Limitations; Query Criteria; Query Conditionals; OR Queries; $not; Conditional Semantics; Type-Specific Queries; null; Regular Expressions; Querying Arrays; $all; $size; The $slice operator; Returning a matching array element; Array and range query interactions; Querying on Embedded Documents; $where Queries; Server-Side Scripting; Cursors; Limits, Skips, and Sorts; Comparison order; Avoiding Large Skips; Paginating results without skip; Finding a random document; Advanced Query Options; Getting Consistent Results; Immortal Cursors; Database Commands; How Commands Work
II. Designing Your Application
5. Indexing
Introduction to Indexing; Introduction to Compound Indexes; Using Compound Indexes; Choosing key directions; Using covered indexes; Implicit indexes; How $-Operators Use Indexes; Inefficient operators; Ranges; OR queries; Indexing Objects and Arrays; Indexing embedded docs; Indexing arrays; Multikey index implications; Index Cardinality; Using explain() and hint(); The Query Optimizer; When Not to Index; Types of Indexes; Unique Indexes; Compound unique indexes; Dropping duplicates; Sparse Indexes; Index Administration; Identifying Indexes; Changing Indexes
6. Special Index and Collection Types
Capped Collections; Creating Capped Collections; Sorting Au Naturel; Tailable Cursors; No-_id Collections; Time-To-Live Indexes; Full-Text Indexes; Search Syntax; Full-Text Search Optimization; Searching in Other Languages; Geospatial Indexing; Types of Geospatial Queries; Compound Geospatial Indexes; 2D Indexes; Storing Files with GridFS; Getting Started with GridFS: mongofiles; Working with GridFS from the MongoDB Drivers; Under the Hood

7. Aggregation
The Aggregation Framework; Pipeline Operations; $match; $project; Pipeline expressions; Mathematical expressions; Date expressions; String expressions; Logical expressions; A projection example; $group; Grouping operators; Arithmetic operators; Extreme operators; Array operators; Grouping behavior; $unwind; $sort; $limit; $skip; Using Pipelines; MapReduce; Example 1: Finding All Keys in a Collection; Example 2: Categorizing Web Pages; MongoDB and MapReduce; The finalize function; Keeping output collections; MapReduce on a subset of documents; Using a scope; Getting more output; Aggregation Commands; count; distinct; group; Using a finalizer; Using a function as a key
8. Application Design
Normalization versus Denormalization; Examples of Data Representations; Cardinality; Friends, Followers, and Other Inconveniences; Dealing with the Wil Wheaton effect; Optimizations for Data Manipulation; Optimizing for Document Growth; Removing Old Data; Planning Out Databases and Collections; Managing Consistency; Migrating Schemas; When Not to Use MongoDB
III. Replication
9. Setting Up a Replica Set
Introduction to Replication; A One-Minute Test Setup; Configuring a Replica Set; rs Helper Functions; Networking Considerations; Changing Your Replica Set Configuration; How to Design a Set; How Elections Work; Member Configuration Options; Creating Election Arbiters; Use at most one arbiter; The downside to using an arbiter; Priority; Hidden; Slave Delay; Building Indexes
10. Components of a Replica Set
Syncing; Initial Sync; Handling Staleness; Heartbeats; Member States; Elections; Rollbacks; When Rollbacks Fail
11. Connecting to a Replica Set from Your Application
Client-to-Replica-Set Connection Behavior; Waiting for Replication on Writes; What Can Go Wrong?; Other Options for “w”; Custom Replication Guarantees; Guaranteeing One Server per Data Center; Guaranteeing a Majority of Nonhidden Members; Creating Other Guarantees; Sending Reads to Secondaries; Consistency Considerations; Load Considerations; Reasons to Read from Secondaries
12. Administration
Starting Members in Standalone Mode; Replica Set Configuration; Creating a Replica Set; Changing Set Members; Creating Larger Sets; Forcing Reconfiguration; Manipulating Member State; Turning Primaries into Secondaries; Preventing Elections; Using Maintenance Mode; Monitoring Replication; Getting the Status; Visualizing the Replication Graph; Replication Loops; Disabling Chaining; Calculating Lag; Resizing the Oplog; Restoring from a Delayed Secondary; Building Indexes; Replication on a Budget; How the Primary Tracks Lag; Master-Slave; Converting Master-Slave to a Replica Set; Mimicking Master-Slave Behavior with Replica Sets
IV. Sharding
13. Introduction to Sharding
Introduction to Sharding; Understanding the Components of a Cluster; A One-Minute Test Setup
14. Configuring Sharding
When to Shard; Starting the Servers; Config Servers; The mongos Processes; Adding a Shard from a Replica Set; Adding Capacity; Sharding Data; How MongoDB Tracks Cluster Data; Chunk Ranges; Splitting Chunks; The Balancer
15. Choosing a Shard Key
Taking Stock of Your Usage; Picturing Distributions; Ascending Shard Keys; Randomly Distributed Shard Keys; Location-Based Shard Keys; Shard Key Strategies; Hashed Shard Key; Hashed Shard Keys for GridFS; The Firehose Strategy; Multi-Hotspot; Shard Key Rules and Guidelines; Shard Key Limitations; Shard Key Cardinality; Controlling Data Distribution; Using a Cluster for Multiple Databases and Collections; Manual Sharding
16. Sharding Administration
Seeing the Current State; Getting a Summary with sh.status; Seeing Configuration Information; config.shards; config.databases; config.collections; config.chunks; config.changelog; config.tags; config.settings; Tracking Network Connections; Getting Connection Statistics; Limiting the Number of Connections; Server Administration; Adding Servers; Changing Servers in a Shard; Changing a shard from a standalone server to replica set; Removing a Shard; Changing Config Servers; Balancing Data; The Balancer; Changing Chunk Size; Moving Chunks; Jumbo Chunks; Distributing jumbo chunks; Preventing jumbo chunks; Refreshing Configurations
V. Application Administration
17. Seeing What Your Application Is Doing
Seeing the Current Operations; Finding Problematic Operations; Killing Operations; False Positives; Preventing Phantom Operations; Using the System Profiler; Calculating Sizes; Documents; Collections; Databases; Using mongotop and mongostat
18. Data Administration
Setting Up Authentication; Authentication Basics; Setting Up Authentication; How Authentication Works; Creating and Deleting Indexes; Creating an Index on a Standalone Server; Creating an Index on a Replica Set; Creating an Index on a Sharded Cluster; Removing Indexes; Beware of the OOM Killer; Preheating Data; Moving Databases into RAM; Moving Collections into RAM; Custom-Preheating; Compacting Data; Moving Collections; Preallocating Data Files
19. Durability
What Journaling Does; Planning Commit Batches; Setting Commit Intervals; Turning Off Journaling; Replacing Data Files; Repairing Data Files; The mongod.lock File; Sneaky Unclean Shutdowns; What MongoDB Does Not Guarantee; Checking for Corruption; Durability with Replication
VI. Server Administration
20. Starting and Stopping MongoDB
Starting from the Command Line; File-Based Configuration; Stopping MongoDB; Security; Data Encryption; SSL Connections; Logging
21. Monitoring MongoDB
Monitoring Memory Usage; Introduction to Computer Memory; Tracking Memory Usage; Tracking Page Faults; Minimizing Btree Misses; IO Wait; Tracking Background Flush Averages; Calculating the Working Set; Some Working Set Examples; Tracking Performance; Tracking Free Space; Monitoring Replication
22. Making Backups
Backing Up a Server; Filesystem Snapshot; Copying Data Files; Using mongodump; Moving collections and databases with mongodump and mongorestore; Administrative complications with unique indexes; Backing Up a Replica Set; Backing Up a Sharded Cluster; Backing Up and Restoring an Entire Cluster; Backing Up and Restoring a Single Shard; Creating Incremental Backups with mongooplog
23. Deploying MongoDB
Designing the System; Choosing a Storage Medium; An example from the wild; Recommended RAID Configurations; CPU; Choosing an Operating System; Swap Space; Filesystem; Virtualization; Turn Off Memory Overcommitting; Mystery Memory; Handling Network Disk IO Issues; Using Non-Networked Disks; Configuring System Settings; Turning Off NUMA; Setting a Sane Readahead; Disabling Hugepages; Choosing a Disk Scheduling Algorithm; Don’t Track Access Time; Modifying Limits; Configuring Your Network; System Housekeeping; Synchronizing Clocks; The OOM Killer; Turn Off Periodic Tasks
A. Installing MongoDB
Choosing a Version; Windows Install; Installing as a Service; POSIX (Linux, Mac OS X, and Solaris) Install; Installing from a Package Manager
B. MongoDB Internals
BSON; Wire Protocol; Data Files; Namespaces and Extents; Memory-Mapped Storage Engine
Index
Colophon
Copyright

Content preview from MongoDB: The Definitive Guide, 2nd Edition

Chapter 7. Aggregation

Once you have data stored in MongoDB, you may want to do more than just retrieve it; you may want to analyze and crunch it in interesting ways. This chapter introduces the aggregation tools MongoDB provides:

The aggregation framework
MapReduce support
Several simple aggregation commands: count, distinct, and group

The Aggregation Framework

The aggregation framework lets you transform and combine documents in a collection. You build a pipeline that passes a stream of documents through a series of building blocks: filtering, projecting, grouping, sorting, limiting, and skipping.

For example, if you had a collection of magazine articles, you might want to find out who your most prolific authors were. Assuming that each article is stored as a document in MongoDB, you could create a pipeline with several steps:

Project the authors out of each article document.
Group the authors by name, counting the number of occurrences.
Sort the authors by the occurrence count, descending.
Limit results to the first five.
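
As a rough sketch, the four steps above could be written as the pipeline below. The collection name `articles`, the output field name `count`, and the variable name `pipeline` are assumptions made for illustration; only the `author` field comes from the example itself.

```javascript
// A sketch of the four-step pipeline, one stage per step.
// "articles" and "count" are assumed names, not taken from the text.
var pipeline = [
    // 1. Project the author out of each article document.
    {"$project" : {"author" : 1}},
    // 2. Group by author name, adding 1 to "count" for each occurrence.
    {"$group" : {"_id" : "$author", "count" : {"$sum" : 1}}},
    // 3. Sort by the occurrence count, descending.
    {"$sort" : {"count" : -1}},
    // 4. Limit results to the first five.
    {"$limit" : 5}
];

// In the mongo shell, connected to a running server, this would be run as:
// db.articles.aggregate(pipeline)
```

The stages run in array order, so the sort sees one document per author rather than one per article.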

Each of these steps maps to an aggregation framework operator:

{"$project" : {"author" : 1}}
This projects the author field in each document.
The syntax is similar to the field selector used in querying: you can select fields to project by specifying "fieldname" : 1 or exclude fields with "fieldname" : 0. After this operation, each document in the results looks like: {"_id" : id, "author" : "authorName"}. These resulting documents exist only in memory and are not ...
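
To make the resulting document shape concrete, here is a small in-memory sketch of an inclusion-style projection on a plain JavaScript object. The helper `projectInclude` and the sample article are hypothetical; this is not MongoDB code, only an illustration of which keys survive.

```javascript
// Hypothetical helper mimicking an inclusion-style "$project" on a plain object.
// It is not part of MongoDB; it only illustrates the output shape.
function projectInclude(doc, spec) {
    var out = {};
    // "_id" is carried along unless the spec explicitly sets it to 0.
    if (spec._id !== 0 && "_id" in doc) {
        out._id = doc._id;
    }
    // Copy every other field that the spec marks with 1.
    Object.keys(spec).forEach(function (field) {
        if (field !== "_id" && spec[field] === 1 && field in doc) {
            out[field] = doc[field];
        }
    });
    return out;
}

// Sample article with assumed extra fields "title" and "pages".
var article = {"_id" : 1, "author" : "authorName", "title" : "Some Title", "pages" : 12};
var projected = projectInclude(article, {"author" : 1});
// projected now holds only "_id" and "author".
```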

Publisher Resources

ISBN: 9781449344795