Aggregation Framework

The Aggregation Framework in MongoDB is a powerful tool used to process and transform data in complex ways. It allows you to perform operations on the data that would normally require multiple queries or sophisticated calculations, such as filtering, grouping, sorting, and reshaping documents.

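MongoDB’s aggregation framework covers much of what SQL expresses with WHERE, GROUP BY, ORDER BY, and JOIN, but as a sequence of composable stages. Because the stages operate on documents, they can also reshape nested fields and arrays, which has no direct SQL equivalent.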

1. Introduction to Aggregation

Aggregation operations process data records and return computed results. MongoDB provides a variety of aggregation operators and stages that can be combined in pipelines to perform complex data transformations.

The core concept of MongoDB aggregation is the aggregation pipeline, which consists of multiple stages that process data in sequence. Each stage in the pipeline transforms the data and passes it on to the next stage.

2. Aggregation Pipeline

An aggregation pipeline is an ordered list of stages. The first stage reads documents from the collection, and every later stage receives the output of the stage before it, so the order of the stages affects both the result and the cost of the query.

Syntax for Aggregation:

db.collection.aggregate([ { stage1 }, { stage2 }, { stage3 }, ... ]);

Each stage is an object with a single key that names the stage (for example $match or $group), and the stages run in the order in which they are listed.
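One way to picture the pipeline is as function composition: each stage takes a list of documents and returns a new list. The sketch below is plain JavaScript that runs in Node.js without a database. The helper names (`matchStage`, `sortStage`, `limitStage`, `runPipeline`) and the sample data are made up for illustration and are not part of MongoDB's API.

```javascript
// Conceptual model only: each "stage" is a function from documents to documents.
// These helpers are hypothetical and do not call MongoDB.
const matchStage = (predicate) => (docs) => docs.filter(predicate);
const sortStage = (compare) => (docs) => [...docs].sort(compare); // copy, then sort
const limitStage = (n) => (docs) => docs.slice(0, n);

// Feed the output of each stage into the next one, in order.
function runPipeline(docs, stages) {
  return stages.reduce((current, stage) => stage(current), docs);
}

const users = [
  { name: "Ana", age: 31 },
  { name: "Ben", age: 22 },
  { name: "Cy", age: 45 },
  { name: "Dee", age: 28 },
];

const oldestTwoOver25 = runPipeline(users, [
  matchStage((u) => u.age > 25),      // keeps Ana, Cy, Dee
  sortStage((a, b) => b.age - a.age), // oldest first
  limitStage(2),                      // keeps the first two
]);

console.log(oldestTwoOver25.map((u) => u.name)); // [ 'Cy', 'Ana' ]
```

Swapping the order of the last two helpers would change the result, which is the same reason stage order matters in a real pipeline.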

3. Common Aggregation Stages

MongoDB provides a set of stages that can be combined to transform data in the pipeline. Here are the most commonly used ones:

$match – Filtering Documents

The $match stage filters documents to pass only those that match the specified condition. This stage is similar to the find() method but is used within the aggregation pipeline.

Example (filter documents where the age is greater than 25):

db.users.aggregate([ { $match: { age: { $gt: 25 } } } ]);

$group – Grouping Documents

The $group stage groups documents by a specified field and performs aggregation operations on grouped data, such as sum, average, min, max, count, etc.

Example (group documents by department and calculate the average age):

db.users.aggregate([ { $group: { _id: "$department", avgAge: { $avg: "$age" } } } ]);

Here, the _id field represents the field by which documents are grouped (in this case, by department), and avgAge is the computed average age for each group.
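To make the shape of the $group output concrete, here is a rough in-memory equivalent in plain JavaScript. It is a sketch of what the stage computes, not how MongoDB implements it; the function name `averageAgeByDepartment` and the sample documents are assumptions.

```javascript
// In-memory sketch of:
//   { $group: { _id: "$department", avgAge: { $avg: "$age" } } }
// Sample data is made up; this does not call MongoDB.
const sampleUsers = [
  { name: "Ana", department: "eng", age: 30 },
  { name: "Ben", department: "eng", age: 40 },
  { name: "Cy", department: "ops", age: 25 },
];

function averageAgeByDepartment(docs) {
  const buckets = new Map(); // department -> { sum, count }
  for (const doc of docs) {
    const bucket = buckets.get(doc.department) ?? { sum: 0, count: 0 };
    bucket.sum += doc.age;
    bucket.count += 1;
    buckets.set(doc.department, bucket);
  }
  // One output document per distinct department, with the group key in _id.
  return [...buckets].map(([department, { sum, count }]) => ({
    _id: department,
    avgAge: sum / count,
  }));
}

const grouped = averageAgeByDepartment(sampleUsers);
console.log(grouped);
// [ { _id: 'eng', avgAge: 35 }, { _id: 'ops', avgAge: 25 } ]
```

The sketch returns groups in first-seen order. MongoDB does not guarantee the order of $group output, so add a $sort stage after it when the order matters.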

$sort – Sorting Documents

The $sort stage sorts the documents based on specified fields in ascending (1) or descending (-1) order.

Example (sort users by age in descending order):

db.users.aggregate([ { $sort: { age: -1 } } ]);

$project – Modifying Document Structure

The $project stage reshapes each document in the stream: it specifies which existing fields to include or exclude, and it can add new computed fields.

Example (include only name and age fields, and add a new isAdult field based on age):

db.users.aggregate([ { $project: { name: 1, age: 1, isAdult: { $gte: ["$age", 18] } } } ]);

In this example, the isAdult field will be true if age is 18 or greater, and false otherwise.
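The effect of this projection on a single document can be sketched in plain JavaScript. Two details are worth seeing: fields that are not listed (such as the made-up `email` field here) are dropped, while `_id` is kept unless the stage excludes it with `_id: 0`. The function name `projectUser` and the sample document are assumptions for illustration.

```javascript
// In-memory sketch of:
//   { $project: { name: 1, age: 1, isAdult: { $gte: ["$age", 18] } } }
// The sample document is made up; this does not call MongoDB.
function projectUser(doc) {
  return {
    _id: doc._id,           // kept by default; `_id: 0` in the stage would drop it
    name: doc.name,         // included because of `name: 1`
    age: doc.age,           // included because of `age: 1`
    isAdult: doc.age >= 18, // computed field
  };
}

const projected = projectUser({ _id: 7, name: "Ana", age: 17, email: "ana@example.com" });
console.log(projected);
// { _id: 7, name: 'Ana', age: 17, isAdult: false }
```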

$limit – Limiting the Number of Documents

The $limit stage limits the number of documents passed to the next stage in the pipeline.

Example (limit the result to the first 5 users):

db.users.aggregate([ { $limit: 5 } ]);

$skip – Skipping Documents

The $skip stage skips a specified number of documents and passes the rest along the pipeline.

Example (skip the first 10 users):

db.users.aggregate([ { $skip: 10 } ]);
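$sort, $skip, and $limit are often combined for pagination. Since a pipeline is an ordinary JavaScript array, the stages can be built by a small helper. The sketch below assumes 1-based page numbers; `pageStages` is a hypothetical helper name, and the `_id` tie-breaker in the sort is a choice made here so that pages stay stable when several users share the same age.

```javascript
// Hypothetical helper: returns the stages for one page of results.
// Order matters: sort first, then skip the earlier pages, then cap the page size.
function pageStages(page, pageSize, sortSpec) {
  if (!Number.isInteger(page) || page < 1) {
    throw new RangeError("page must be an integer >= 1");
  }
  if (!Number.isInteger(pageSize) || pageSize < 1) {
    throw new RangeError("pageSize must be an integer >= 1");
  }
  return [
    { $sort: sortSpec },
    { $skip: (page - 1) * pageSize },
    { $limit: pageSize },
  ];
}

// Third page of 10 users, oldest first, with _id as a tie-breaker.
const stages = pageStages(3, 10, { age: -1, _id: 1 });
console.log(JSON.stringify(stages));
// [{"$sort":{"age":-1,"_id":1}},{"$skip":20},{"$limit":10}]

// Intended use in the MongoDB shell, assuming a `users` collection:
//   db.users.aggregate(stages);
```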

$unwind – Deconstructing Arrays

The $unwind stage deconstructs an array field from the documents into multiple documents, one for each element of the array.

Example (unwind the tags array into separate documents):

db.users.aggregate([ { $unwind: "$tags" } ]);

This will create a separate document for each element in the tags array.
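A rough in-memory equivalent shows what happens to the rest of each document and to documents without tags. By default, $unwind emits nothing for a document whose array field is missing, null, or empty (the preserveNullAndEmptyArrays option changes this). The function name `unwindTags` and the sample documents are assumptions for illustration.

```javascript
// In-memory sketch of { $unwind: "$tags" } with default options.
// Sample data is made up; this does not call MongoDB.
function unwindTags(docs) {
  return docs.flatMap((doc) => {
    // Missing or null field: the document produces no output.
    if (doc.tags === undefined || doc.tags === null) return [];
    // A non-array value is treated like a one-element array.
    const values = Array.isArray(doc.tags) ? doc.tags : [doc.tags];
    // One copy of the document per element; an empty array yields no copies.
    return values.map((tag) => ({ ...doc, tags: tag }));
  });
}

const unwound = unwindTags([
  { _id: 1, name: "Ana", tags: ["db", "js"] },
  { _id: 2, name: "Ben", tags: [] },
  { _id: 3, name: "Cy" },
]);
console.log(unwound);
// [ { _id: 1, name: 'Ana', tags: 'db' }, { _id: 1, name: 'Ana', tags: 'js' } ]
```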

$lookup – Joining Collections

The $lookup stage allows you to perform a left outer join between two collections, similar to SQL joins. It combines documents from two collections based on a common field.

Example (join users with orders collection on userId):

db.users.aggregate([
  {
    $lookup: {
      from: "orders",          // The collection to join
      localField: "userId",    // The field in the `users` collection
      foreignField: "userId",  // The field in the `orders` collection
      as: "orderDetails"       // The field name to store the joined documents
    }
  }
]);

This will add an orderDetails array in each user document, containing the matched orders.
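The left outer join behaviour can be sketched in plain JavaScript: every user appears in the output, and a user with no matching orders gets an empty orderDetails array rather than being dropped. The function name `lookupOrders` and the sample data are assumptions, and the sketch only covers the simple case where both sides have a userId value.

```javascript
// In-memory sketch of the $lookup stage above (simple equality match on userId).
// Sample data is made up; this does not call MongoDB.
function lookupOrders(users, orders) {
  return users.map((user) => ({
    ...user,
    // All orders whose userId equals this user's userId, possibly none.
    orderDetails: orders.filter((order) => order.userId === user.userId),
  }));
}

const joined = lookupOrders(
  [
    { userId: "u1", name: "Ana" },
    { userId: "u2", name: "Ben" },
  ],
  [
    { orderId: 100, userId: "u1", amount: 50 },
    { orderId: 101, userId: "u1", amount: 25 },
  ]
);

console.log(joined[0].orderDetails.length); // 2
console.log(joined[1].orderDetails);        // []
```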

$addFields – Adding New Fields

The $addFields stage adds new fields to documents or modifies existing fields.

Example (add a fullName field by combining firstName and lastName):

db.users.aggregate([ { $addFields: { fullName: { $concat: ["$firstName", " ", "$lastName"] } } } ]);

This will create a new field fullName that concatenates firstName and lastName.

$count – Counting Documents

The $count stage counts the number of documents that pass through the pipeline.

Example (count the total number of users):

db.users.aggregate([ { $count: "totalUsers" } ]);

This will return the total number of users as a single document with a field totalUsers.

4. Complex Aggregation Example

Here’s an example of a more complex aggregation pipeline that combines several stages:

db.orders.aggregate([
  // Filter orders that have been shipped
  { $match: { status: "shipped" } },
  // Group by userId and sum up the total amount
  { $group: { _id: "$userId", totalAmount: { $sum: "$amount" } } },
  // Sort by totalAmount in descending order
  { $sort: { totalAmount: -1 } },
  // Limit to the top 5 users
  { $limit: 5 }
]);

This pipeline:

Filters orders where status is "shipped".
Groups orders by userId and calculates the total order amount for each user.
Sorts the results by totalAmount in descending order.
Limits the results to the top 5 users.
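To check your understanding of the four steps, it can help to trace them on a handful of documents. The sketch below does this in plain JavaScript, without a database; the function name `topSpenders` and the sample orders are made up for illustration.

```javascript
// In-memory trace of the pipeline above. This does not call MongoDB.
const sampleOrders = [
  { userId: "u1", status: "shipped", amount: 50 },
  { userId: "u2", status: "shipped", amount: 20 },
  { userId: "u1", status: "shipped", amount: 25 },
  { userId: "u3", status: "pending", amount: 500 },
  { userId: "u2", status: "shipped", amount: 10 },
];

function topSpenders(orders, limit) {
  const totals = new Map(); // userId -> running total
  for (const order of orders) {
    if (order.status !== "shipped") continue; // $match
    const soFar = totals.get(order.userId) ?? 0;
    totals.set(order.userId, soFar + order.amount); // $group with $sum
  }
  return [...totals]
    .map(([userId, totalAmount]) => ({ _id: userId, totalAmount }))
    .sort((a, b) => b.totalAmount - a.totalAmount) // $sort descending
    .slice(0, limit); // $limit
}

const top = topSpenders(sampleOrders, 5);
console.log(top);
// [ { _id: 'u1', totalAmount: 75 }, { _id: 'u2', totalAmount: 30 } ]
```

The pending order from u3 never reaches the grouping step, even though it has the largest amount, because the filter runs first.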

5. Aggregation Operators

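MongoDB provides a variety of aggregation operators that can be used inside stages such as $group, $project, and $addFields. They can also be used in $match, but only when wrapped in $expr, because $match otherwise takes ordinary query syntax. Here are some common ones: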

Mathematical Operators

$sum: Sums up values.
$avg: Computes the average of values.
$min: Finds the minimum value.
$max: Finds the maximum value.

Example (calculate the total, average, and maximum order amount):

db.orders.aggregate([ { $group: { _id: null, total: { $sum: "$amount" }, avg: { $avg: "$amount" }, max: { $max: "$amount" } } } ]);

Array Operators

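$push: Collects every value in the group into an array, duplicates included.
$addToSet: Collects each distinct value in the group once; the order of the resulting array is unspecified.
$first and $last: Return the first and last value in each group. The result is only meaningful when the documents arrive in a defined order, for example after a $sort stage.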

Example (get a list of distinct tags for each user):

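db.users.aggregate([
  // tags is an array, so unwind it first; otherwise $addToSet would collect whole arrays
  { $unwind: "$tags" },
  { $group: { _id: "$userId", tags: { $addToSet: "$tags" } } }
]);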
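The difference between $push and $addToSet can be illustrated with ordinary JavaScript arrays and a Set. This is only an analogy with made-up values: the Set below keeps insertion order, whereas MongoDB leaves the order of an $addToSet result unspecified.

```javascript
// Analogy only; this does not call MongoDB.
const tagsSeenInGroup = ["db", "js", "db", "api", "js"];

// Like $push: every value is kept, duplicates included.
const pushed = [...tagsSeenInGroup];

// Like $addToSet: each distinct value appears once.
const addedToSet = [...new Set(tagsSeenInGroup)];

console.log(pushed.length);     // 5
console.log(addedToSet.length); // 3
```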

6. Performance Considerations

While the aggregation framework is very powerful, it’s important to consider performance when using it on large datasets:

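Indexes: A stage can only use an index while it is still reading documents from the collection, so place $match and $sort at the start of the pipeline and index the fields they use. For $lookup, index the foreignField in the joined collection.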
Memory: Aggregation queries can use a significant amount of memory on large datasets, and each pipeline stage is limited to 100 MB of RAM. Depending on the server version and configuration, a stage that exceeds the limit either fails or spills to temporary files on disk. Passing the allowDiskUse option explicitly enables disk-based processing:

db.orders.aggregate([{ $match: { status: "shipped" } }], { allowDiskUse: true });

Filter and project early: Put $match as close to the start of the pipeline as possible so that later stages handle fewer documents, and use $project to drop fields that the later stages do not need.

Conclusion

The MongoDB Aggregation Framework is a powerful tool for performing complex queries, data transformations, and analytics within the database. By using a combination of operators and pipeline stages, you can filter, group, reshape, and summarize your data in many useful ways.
