The Aggregation Framework in MongoDB is a tool for processing and transforming data inside the database. It lets you filter, group, sort, and reshape documents, and compute summaries over them, in a single query instead of issuing multiple queries or post-processing the results in application code.
It plays a role similar to SQL’s `GROUP BY` and `JOIN` operations, but it is more flexible: many kinds of transformations can be chained in a single request, and the work runs on the server, which matters most when working with large datasets.
Aggregation operations process data records and return computed results. MongoDB provides a variety of aggregation operators and stages that can be combined in pipelines to perform complex data transformations.
The core concept of MongoDB aggregation is the aggregation pipeline: an ordered sequence of stages. Each stage receives a stream of documents, transforms it (for example by filtering, grouping, or reshaping), and passes its output to the next stage; the output of the last stage is the result of the query.
```javascript
// Placeholder syntax: each { stageN } stands for a stage document such as { $match: { ... } }
db.collection.aggregate([
  { stage1 },
  { stage2 },
  { stage3 },
  ...
]);
```
Each stage is an object whose single key names the operation (for example `$match` or `$group`), and the stages are applied in the order they are listed.
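To make "stages applied in sequence" concrete, here is a minimal sketch in plain JavaScript. It runs in Node.js and does not touch MongoDB. It assumes a simplified model in which each stage is a function from an array of documents to a new array; `runPipeline`, `matchStage`, `sortStage`, and `limitStage` are hypothetical helpers, not part of any MongoDB driver.

```javascript
// Plain-JavaScript model of a pipeline (illustration only, no MongoDB involved).
// Each hypothetical "stage" is a function from an array of documents to a new array.
function runPipeline(docs, stages) {
  // Feed the output of each stage into the next one, in order.
  return stages.reduce((current, stage) => stage(current), docs);
}

// Hypothetical helpers loosely mirroring $match, $sort and $limit.
const matchStage = (predicate) => (docs) => docs.filter(predicate);
const sortStage = (compare) => (docs) => [...docs].sort(compare);
const limitStage = (n) => (docs) => docs.slice(0, n);

const people = [
  { name: "Ana", age: 31 },
  { name: "Ben", age: 22 },
  { name: "Cai", age: 45 },
  { name: "Dee", age: 28 },
];

const oldestTwo = runPipeline(people, [
  matchStage((d) => d.age > 25),      // keep documents with age > 25
  sortStage((a, b) => b.age - a.age), // sort by age, descending
  limitStage(2),                      // pass on only the first two
]);

console.log(oldestTwo.map((d) => d.name)); // [ 'Cai', 'Ana' ]
```

The real server streams documents and may reorder or merge stages during optimization while preserving the result, so treat this only as a model of the data flow.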
MongoDB provides many stages for transforming data in the pipeline. The most commonly used ones are described below.
The `$match` stage filters documents, passing on only those that match the specified condition. It uses the same query syntax as the `find()` method but runs inside the aggregation pipeline.
Example (filter documents where `age` is greater than 25):
```javascript
db.users.aggregate([ { $match: { age: { $gt: 25 } } } ]);
```
The `$group` stage groups documents by a specified key and computes values for each group with accumulator operators, such as a sum, average, minimum, maximum, or count.
Example (group documents by `department` and calculate the average `age`):
```javascript
db.users.aggregate([
  { $group: { _id: "$department", avgAge: { $avg: "$age" } } }
]);
```
Here, `_id` specifies the grouping key (in this case, the `department` field; the `$` prefix in `"$department"` marks a field path), and `avgAge` is the computed average age for each group.
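The following plain-JavaScript sketch shows what this `$group` stage computes: documents are bucketed by their `department` value and `age` is averaged within each bucket. It is an illustration under simplified assumptions (top-level fields, made-up sample data); `groupAvgAgeByDepartment` is a hypothetical name. MongoDB does not guarantee the order of `$group` output documents.

```javascript
// Plain-JavaScript sketch of
//   { $group: { _id: "$department", avgAge: { $avg: "$age" } } }
// Illustration only; the sample documents are made up.
function groupAvgAgeByDepartment(docs) {
  const buckets = new Map(); // department -> { sum, count }
  for (const doc of docs) {
    const bucket = buckets.get(doc.department) ?? { sum: 0, count: 0 };
    // $avg ignores non-numeric values, so only numbers are counted here.
    if (typeof doc.age === "number") {
      bucket.sum += doc.age;
      bucket.count += 1;
    }
    buckets.set(doc.department, bucket);
  }
  // One output document per distinct _id; avgAge is null if a group had no numeric ages.
  return [...buckets].map(([dept, { sum, count }]) => ({
    _id: dept,
    avgAge: count > 0 ? sum / count : null,
  }));
}

const staff = [
  { name: "Ana", department: "sales", age: 30 },
  { name: "Ben", department: "sales", age: 40 },
  { name: "Cai", department: "eng", age: 25 },
];

console.log(groupAvgAgeByDepartment(staff));
```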
The `$sort` stage sorts documents by the specified fields in ascending (`1`) or descending (`-1`) order.
Example (sort users by `age` in descending order):
```javascript
db.users.aggregate([ { $sort: { age: -1 } } ]);
```
The `$project` stage reshapes each document in the stream: it specifies which fields to include or exclude and can add new computed fields.
Example (keep `name` and `age` and add a new `isAdult` field computed from `age`; `$project` also keeps `_id` unless you exclude it with `_id: 0`):
```javascript
db.users.aggregate([
  { $project: { name: 1, age: 1, isAdult: { $gte: ["$age", 18] } } }
]);
```
In this example, `isAdult` is `true` if `age` is 18 or greater, and `false` otherwise.
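A plain-JavaScript sketch of the same projection, applied to one made-up document, shows two details that are easy to miss: `_id` survives an inclusion projection, and any field not listed (here `email`) is dropped. `projectUser` is a hypothetical helper for illustration only.

```javascript
// Plain-JavaScript sketch of
//   { $project: { name: 1, age: 1, isAdult: { $gte: ["$age", 18] } } }
// Illustration only. In an inclusion projection, _id is kept unless you add _id: 0,
// and fields that are not listed (such as email below) are dropped.
function projectUser(doc) {
  return {
    _id: doc._id,
    name: doc.name,
    age: doc.age,
    isAdult: doc.age >= 18, // computed field, like { $gte: ["$age", 18] } for numeric ages
  };
}

const projected = projectUser({ _id: 1, name: "Ana", age: 17, email: "ana@example.com" });
console.log(projected); // { _id: 1, name: 'Ana', age: 17, isAdult: false }
```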
The `$limit` stage limits the number of documents passed to the next stage in the pipeline.
Example (return at most 5 users; without a preceding `$sort`, which 5 you get is not guaranteed):
```javascript
db.users.aggregate([ { $limit: 5 } ]);
```
The `$skip` stage skips a specified number of documents and passes the rest along the pipeline.
Example (skip the first 10 users):
```javascript
db.users.aggregate([ { $skip: 10 } ]);
```
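`$skip` and `$limit` are often combined for pagination, and their order matters. This plain-JavaScript sketch (hypothetical `skipDocs` and `limitDocs` helpers, made-up data, no MongoDB involved) models each stage as an array slice to show the difference.

```javascript
// Plain-JavaScript sketch of how $skip and $limit combine (illustration only).
// "sortedDocs" stands in for a stream that an earlier $sort already ordered.
const skipDocs = (n) => (docs) => docs.slice(n);
const limitDocs = (n) => (docs) => docs.slice(0, n);

const sortedDocs = Array.from({ length: 30 }, (_, i) => ({ _id: i + 1 }));

// Page 3 with a page size of 10: { $skip: 20 } followed by { $limit: 10 }.
const page3 = limitDocs(10)(skipDocs(20)(sortedDocs));
console.log(page3.map((d) => d._id).join(",")); // 21,22,23,24,25,26,27,28,29,30

// Swapping the stages changes the meaning: limit to 10, then skip 20, leaves nothing.
const swapped = skipDocs(20)(limitDocs(10)(sortedDocs));
console.log(swapped.length); // 0
```

In a real pipeline you would normally put a `$sort` before both stages so that pages are stable between requests.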
The `$unwind` stage deconstructs an array field, outputting one document per array element.
Example (unwind the `tags` array into separate documents):
```javascript
db.users.aggregate([ { $unwind: "$tags" } ]);
```
This creates a separate document for each element of `tags`; each output document is a copy of the input with `tags` replaced by that single element.
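This plain-JavaScript sketch mirrors `$unwind`'s default behavior on top-level fields, including two edge cases: documents whose array is missing, `null`, or empty produce no output, and a non-array value is treated as a one-element array. The `unwind` function and the sample users are hypothetical; MongoDB's `preserveNullAndEmptyArrays` option, not modeled here, keeps the documents that would otherwise be dropped.

```javascript
// Plain-JavaScript sketch of { $unwind: "$tags" } (illustration only, made-up data).
// Mirrors the default behaviour for a top-level field: missing, null, or empty
// arrays produce no output, and a non-array value acts as a one-element array.
function unwind(docs, field) {
  const out = [];
  for (const doc of docs) {
    const value = doc[field];
    if (value === undefined || value === null) continue;
    const items = Array.isArray(value) ? value : [value];
    for (const item of items) {
      // Copy the whole document, replacing the array with a single element.
      out.push({ ...doc, [field]: item });
    }
  }
  return out;
}

const taggedUsers = [
  { name: "Ana", tags: ["mongodb", "nosql"] },
  { name: "Ben", tags: [] },
  { name: "Cai" },
  { name: "Dee", tags: "solo" },
];

console.log(unwind(taggedUsers, "tags"));
```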
The `$lookup` stage performs a left outer join with another collection in the same database, similar to a SQL `LEFT OUTER JOIN`. It matches documents from the two collections on a common field.
Example (join `users` with the `orders` collection on `userId`):
```javascript
db.users.aggregate([
  {
    $lookup: {
      from: "orders",         // The collection to join
      localField: "userId",   // The field in the `users` collection
      foreignField: "userId", // The field in the `orders` collection
      as: "orderDetails"      // The field name to store the joined documents
    }
  }
]);
```
This adds an `orderDetails` array to each user document containing the matching orders; users with no orders get an empty array.
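To see the left-outer-join semantics, here is a plain-JavaScript sketch of the same lookup on made-up data. It assumes scalar `userId` values and uses strict equality, which simplifies MongoDB's actual matching rules; `lookup` is a hypothetical helper.

```javascript
// Plain-JavaScript sketch of the $lookup above (illustration only, made-up data).
// Simplified to strict equality on scalar values; every local document is kept
// (left outer join) and gets an array of matching foreign documents, possibly empty.
function lookup(localDocs, foreignDocs, localField, foreignField, as) {
  return localDocs.map((doc) => ({
    ...doc,
    [as]: foreignDocs.filter((f) => f[foreignField] === doc[localField]),
  }));
}

const userDocs = [{ userId: 1, name: "Ana" }, { userId: 2, name: "Ben" }];
const orderDocs = [
  { orderId: 10, userId: 1, amount: 25 },
  { orderId: 11, userId: 1, amount: 40 },
];

const joined = lookup(userDocs, orderDocs, "userId", "userId", "orderDetails");
console.log(joined.map((u) => `${u.name}: ${u.orderDetails.length} orders`));
// [ 'Ana: 2 orders', 'Ben: 0 orders' ]
```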
The `$addFields` stage adds new fields to documents or overwrites existing ones, while keeping all other fields.
Example (add a `fullName` field by combining `firstName` and `lastName`):
```javascript
db.users.aggregate([
  { $addFields: { fullName: { $concat: ["$firstName", " ", "$lastName"] } } }
]);
```
This creates a new field `fullName` that concatenates `firstName`, a space, and `lastName`.
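One behavior worth knowing about `$concat`: if any argument is `null` or refers to a missing field, the whole result is `null`. The plain-JavaScript sketch below models that rule together with `$addFields` keeping all other fields; `concatStrings` and `addFullName` are hypothetical helpers.

```javascript
// Plain-JavaScript sketch of
//   { $addFields: { fullName: { $concat: ["$firstName", " ", "$lastName"] } } }
// Illustration only. $concat returns null when any argument is null or a
// missing field, so a user without lastName gets fullName: null, not "Ana ".
function concatStrings(...parts) {
  if (parts.some((p) => p === null || p === undefined)) return null;
  return parts.join("");
}

function addFullName(doc) {
  // $addFields keeps every existing field and adds (or overwrites) fullName.
  return { ...doc, fullName: concatStrings(doc.firstName, " ", doc.lastName) };
}

console.log(addFullName({ firstName: "Ada", lastName: "Lovelace" }).fullName); // Ada Lovelace
console.log(addFullName({ firstName: "Ana" }).fullName); // null
```

If some users may lack a `lastName`, wrapping the field in `$ifNull` (for example `{ $ifNull: ["$lastName", ""] }`) is one way to avoid a `null` result.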
The `$count` stage counts the documents that reach it and outputs that number.
Example (count the total number of users):
```javascript
db.users.aggregate([ { $count: "totalUsers" } ]);
```
This returns a single document with a field `totalUsers` holding the number of users.
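MongoDB's documentation describes `$count` as equivalent to a `$group` with `_id: null` and `{ $sum: 1 }`, followed by a `$project` that removes `_id`. A consequence is that an empty input produces no document at all rather than `{ totalUsers: 0 }`. The plain-JavaScript sketch below models that; `countStage` is a hypothetical name.

```javascript
// Plain-JavaScript sketch of { $count: "totalUsers" } (illustration only).
// Modeled on the documented equivalence
//   { $group: { _id: null, totalUsers: { $sum: 1 } } }, { $project: { _id: 0 } }
// so an empty input yields no output document rather than { totalUsers: 0 }.
function countStage(docs, fieldName) {
  if (docs.length === 0) return [];
  return [{ [fieldName]: docs.length }];
}

console.log(countStage([{ name: "Ana" }, { name: "Ben" }], "totalUsers")); // [ { totalUsers: 2 } ]
console.log(countStage([], "totalUsers")); // []
```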
Here’s an example of a more complex aggregation pipeline that combines several stages:
```javascript
db.orders.aggregate([
  // Filter orders that have been shipped
  { $match: { status: "shipped" } },
  // Group by userId and sum up the total amount
  { $group: { _id: "$userId", totalAmount: { $sum: "$amount" } } },
  // Sort by totalAmount in descending order
  { $sort: { totalAmount: -1 } },
  // Limit to the top 5 users
  { $limit: 5 }
]);
```
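To trace what each stage of this pipeline contributes, here is the same computation written as a plain-JavaScript function over made-up sample orders (it runs in Node.js, no database). `topShippedCustomers` and the sample data are hypothetical; the comments map each step back to its pipeline stage.

```javascript
// Plain-JavaScript walk-through of the orders pipeline (illustration only, made-up data).
function topShippedCustomers(orders, n) {
  // $match: { status: "shipped" }
  const shipped = orders.filter((o) => o.status === "shipped");

  // $group: { _id: "$userId", totalAmount: { $sum: "$amount" } }
  const totals = new Map();
  for (const o of shipped) {
    totals.set(o.userId, (totals.get(o.userId) ?? 0) + o.amount);
  }
  const groups = [...totals].map(([userId, totalAmount]) => ({ _id: userId, totalAmount }));

  // $sort: { totalAmount: -1 }, then $limit: n
  return groups.sort((a, b) => b.totalAmount - a.totalAmount).slice(0, n);
}

const sampleOrders = [
  { userId: "u1", amount: 50, status: "shipped" },
  { userId: "u2", amount: 80, status: "shipped" },
  { userId: "u1", amount: 40, status: "shipped" },
  { userId: "u3", amount: 500, status: "pending" },
  { userId: "u4", amount: 10, status: "shipped" },
];

// u1 (total 90) ranks first and u2 (total 80) second; u3's order is not shipped.
console.log(topShippedCustomers(sampleOrders, 2));
```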
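Step by step, the `db.orders` pipeline:
- Filters orders whose `status` is `"shipped"`.
- Groups the remaining orders by `userId` and calculates the total order amount for each user.
- Sorts the groups by `totalAmount` in descending order.
- Limits the output to the top 5 users.
MongoDB also provides a variety of aggregation operators for use inside stages such as `$group`, `$project`, and `$addFields` (and inside `$match` via `$expr`). Here are some common ones: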
- `$sum`: Sums up values.
- `$avg`: Computes the average of values.
- `$min`: Finds the minimum value.
- `$max`: Finds the maximum value.
Example (calculate the total, average, and maximum order amount):
```javascript
db.orders.aggregate([
  {
    $group: {
      _id: null, // null puts all documents into a single group
      total: { $sum: "$amount" },
      avg: { $avg: "$amount" },
      max: { $max: "$amount" }
    }
  }
]);
```
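Setting `_id: null` puts every input document into a single group, so the result is one summary document. Here is a plain-JavaScript sketch of the same computation on made-up amounts, assuming every `amount` is a number; `summarizeAmounts` is a hypothetical helper.

```javascript
// Plain-JavaScript sketch of
//   { $group: { _id: null, total: { $sum: "$amount" }, avg: { $avg: "$amount" }, max: { $max: "$amount" } } }
// Illustration only: _id: null puts every input document into one group.
// Assumes every amount is a number; empty input yields no group, as with $group in MongoDB.
function summarizeAmounts(orders) {
  if (orders.length === 0) return [];
  const amounts = orders.map((o) => o.amount);
  const total = amounts.reduce((sum, a) => sum + a, 0);
  return [{ _id: null, total, avg: total / amounts.length, max: Math.max(...amounts) }];
}

// total 90, avg 30, max 60
console.log(summarizeAmounts([{ amount: 10 }, { amount: 20 }, { amount: 60 }]));
```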
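- `$push`: Adds every value to an array, duplicates included.
- `$addToSet`: Adds only unique values to an array.
- `$first` and `$last`: Get the first and last value in each group, in the order documents arrive (so usually after a `$sort`).
Example (get a list of distinct tags for each user):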
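```javascript
db.users.aggregate([
  // tags is an array, so unwind it first; otherwise $addToSet would collect whole arrays
  { $unwind: "$tags" },
  { $group: { _id: "$userId", tags: { $addToSet: "$tags" } } }
]);
```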
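The difference between `$push` and `$addToSet` is easiest to see side by side. This plain-JavaScript sketch takes documents shaped like the output of `{ $unwind: "$tags" }` and groups them both ways; `groupTags` and the sample data are hypothetical. MongoDB does not guarantee element order for `$addToSet`; the sketch keeps first-seen order only because it is plain JavaScript.

```javascript
// Plain-JavaScript sketch contrasting $push and $addToSet inside $group (illustration only).
// The input mimics what { $unwind: "$tags" } emits: one document per tag.
const unwoundTags = [
  { userId: "u1", tags: "mongodb" },
  { userId: "u1", tags: "nosql" },
  { userId: "u1", tags: "mongodb" },
  { userId: "u2", tags: "sql" },
];

// Groups by userId; unique = false behaves like $push, unique = true like $addToSet.
function groupTags(docs, unique) {
  const groups = new Map(); // userId -> collected tags
  for (const doc of docs) {
    const list = groups.get(doc.userId) ?? [];
    if (!unique || !list.includes(doc.tags)) list.push(doc.tags);
    groups.set(doc.userId, list);
  }
  return [...groups].map(([userId, tags]) => ({ _id: userId, tags }));
}

const pushed = groupTags(unwoundTags, false); // u1: mongodb, nosql, mongodb
const asSet = groupTags(unwoundTags, true);   // u1: mongodb, nosql
console.log(JSON.stringify(pushed));
console.log(JSON.stringify(asSet));
```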
While the aggregation framework is very powerful, it’s important to consider performance when using it on large datasets:
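- Use indexes: `$match` and `$sort` can use an index when they appear at the start of the pipeline, and `$lookup` benefits from an index on the `foreignField` of the joined collection.
- Watch memory limits: stages such as `$group` and `$sort` are subject to a per-stage memory limit (100 MB), and the `allowDiskUse` option lets them write temporary data to disk when they exceed it:
```javascript
db.orders.aggregate(
  [ { $match: { status: "shipped" } }, { $sort: { amount: -1 } } ],
  { allowDiskUse: true }
);
```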
- Use `$project` early: to reduce the size of the documents passed between stages, use `$project` to remove unnecessary fields as early as possible in the pipeline.
The MongoDB Aggregation Framework is a powerful tool for performing complex queries, data transformations, and analytics within the database. By combining pipeline stages and operators, you can filter, group, reshape, and summarize your data in many useful ways.