Chapter 7. Aggregation

Once you have data stored in MongoDB, you may want to do more than just retrieve it; you may want to analyze and crunch it in interesting ways. This chapter introduces the aggregation tools MongoDB provides:

  • The aggregation framework

  • MapReduce support

  • Several simple aggregation commands: count, distinct, and group

The Aggregation Framework

The aggregation framework lets you transform and combine documents in a collection. Basically, you build a pipeline that processes a stream of documents through several building blocks: filtering, projecting, grouping, sorting, limiting, and skipping.

For example, if you had a collection of magazine articles, you might want to find out who your most prolific authors were. Assuming that each article is stored as a document in MongoDB, you could create a pipeline with several steps:

  1. Project the authors out of each article document.

  2. Group the authors by name, counting the number of occurrences.

  3. Sort the authors by the occurrence count, descending.

  4. Limit results to the first five.
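To make the four steps concrete before looking at the real operators, here is a minimal sketch that emulates them in plain JavaScript on an in-memory array. It does not call MongoDB at all: the sample articles, the author names, and the topAuthors helper are hypothetical and exist only for illustration.

```javascript
// Hypothetical sample data: each object stands in for one article document.
const articles = [
  { _id: 1, author: "alice", title: "Indexing basics" },
  { _id: 2, author: "bob", title: "Sharding notes" },
  { _id: 3, author: "alice", title: "Replica sets" },
  { _id: 4, author: "carol", title: "Schema design" },
  { _id: 5, author: "alice", title: "Backups" },
  { _id: 6, author: "bob", title: "Monitoring" },
];

// Plain-JavaScript emulation of the four steps (not MongoDB's implementation).
function topAuthors(docs, limit) {
  // 1. Project: keep only _id and author from each document.
  const projected = docs.map(({ _id, author }) => ({ _id, author }));

  // 2. Group: count how many documents each author name appears in.
  const counts = new Map();
  for (const { author } of projected) {
    counts.set(author, (counts.get(author) || 0) + 1);
  }
  const grouped = Array.from(counts, ([author, count]) => ({ author, count }));

  // 3. Sort: order the groups by count, descending.
  grouped.sort((a, b) => b.count - a.count);

  // 4. Limit: keep only the first `limit` results.
  return grouped.slice(0, limit);
}

const result = topAuthors(articles, 5);
console.log(result);
```

With this sample data there are only three distinct authors, so the limit of five returns all of them, most prolific first. The point of the sketch is the shape of the computation: each step consumes the output of the previous one, which is exactly how the stages of a pipeline are chained.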

Each of these steps maps to an aggregation framework operator:

  1. {"$project" : {"author" : 1}}

    This projects the author field in each document.

    The syntax is similar to the field selector used in querying: you can select fields to project by specifying "fieldname" : 1 or exclude fields with "fieldname" : 0. After this operation, each document in the results looks like: {"_id" : id, "author" : "authorName"}. These resulting documents exist only in memory and are not ...
