MongoDB Aggregation Framework Stages and Pipelining

MongoDB supports rich queries through its powerful aggregation framework, which lets developers manipulate data in a way similar to SQL and perform advanced analysis on MongoDB data. This whitepaper lays a foundation of essential aggregation concepts: how multiple documents can be efficiently queried, grouped, and sorted, and how the results can be presented in a form suitable for reports and dashboards.

Table of contents

  • Introduction
  • What is the Aggregation Framework?
  • Aggregation Pipeline
    • Basic Stages of Aggregation Pipeline
      • $match
      • $group
      • $unwind
      • $project
        • Points to note
      • $sort
      • $sample
      • $limit
      • $lookup
  • Aggregation Process
  • Accumulator Operators
    • $sum
    • $avg
    • $max and $min
    • $push
  • Similarity of the Aggregation Process in MongoDB with SQL
  • Aggregation Pipeline Optimization
    • Projection Optimization
    • Pipeline Sequence Optimization
  • MapReduce in MongoDB
    • MapReduce JavaScript Functions
    • Incremental MapReduce
  • Comparison Between MapReduce and Aggregation Pipeline in MongoDB
  • Summary

Introduction

Fetching data with the CRUD find operation can become tedious. Suppose you want only some of the embedded documents held in a given field. The find operation returns the main document, and it is then up to you to select the field that holds the embedded documents and scan through it for the ones that match your criteria. Since there is no simple way to do this with find, you end up using something like a loop to go through all of the subdocuments until you have the matching results. What if there are a million embedded documents? The scan will be frustratingly slow. It will also consume a lot of your server's memory (RAM), and the process may be terminated before you get all the documents you wanted, because the server's document size limit may be exceeded.
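To make the contrast concrete, here is a minimal sketch in Python. The `orders` data, the `items` field, the `qty` threshold, and the helper name `filter_items_client_side` are all hypothetical and chosen only for illustration. The first part mimics the find-then-loop approach described above on in-memory data. The second part shows one way the same filter might be expressed as an aggregation pipeline that runs on the server instead.

```python
# Hypothetical document shape: each order embeds a list of "items" subdocuments.
orders = [
    {"_id": 1, "customer": "ann", "items": [
        {"sku": "a1", "qty": 2},
        {"sku": "b7", "qty": 12},
    ]},
    {"_id": 2, "customer": "bob", "items": [
        {"sku": "c3", "qty": 30},
    ]},
]


def filter_items_client_side(docs, min_qty):
    """Find-then-loop: whole documents are fetched, then every
    embedded item is scanned in application code."""
    matches = []
    for doc in docs:
        for item in doc["items"]:
            if item["qty"] >= min_qty:
                matches.append({"_id": doc["_id"], "item": item})
    return matches


# The same intent as an aggregation pipeline (an assumed equivalent,
# not taken from the whitepaper). Each stage feeds the next one.
pipeline = [
    {"$unwind": "$items"},                    # one output document per embedded item
    {"$match": {"items.qty": {"$gte": 10}}},  # keep only the items that qualify
    {"$project": {"item": "$items"}},         # return _id plus the matching item
]

# With a driver such as PyMongo, this list would be passed to
# collection.aggregate(pipeline); no server is contacted in this sketch.
client_side_result = filter_items_client_side(orders, 10)
```

In the pipeline version the filtering happens inside the database, so only the matching subdocuments would travel back to the application.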

In this paper, we will take a deep dive into MongoDB's Aggregation Framework and look at the different stages of the Aggregation Pipeline. We'll see how these stages are used in an aggregation process. We'll then look at the operators that assist in analyzing input documents. Finally, we'll compare the aggregation process in MongoDB with SQL, and examine the differences between the aggregation process and MapReduce in MongoDB.
