Map-Reduce using Aggregation Pipeline in MongoDB

Samkit Shah
3 min read · May 11, 2021

In MongoDB, an aggregation pipeline generally provides better performance and usability than a map-reduce operation.

First, create a collection named orders and insert the following documents:
db.orders.insertMany([
  { _id: 1, cust_id: "Ant O. Knee", ord_date: new Date("2020-03-01"), price: 25, items: [ { sku: "oranges", qty: 5, price: 2.5 }, { sku: "apples", qty: 5, price: 2.5 } ], status: "A" },
  { _id: 2, cust_id: "Ant O. Knee", ord_date: new Date("2020-03-08"), price: 70, items: [ { sku: "oranges", qty: 8, price: 2.5 }, { sku: "chocolates", qty: 5, price: 10 } ], status: "A" },
  { _id: 3, cust_id: "Busby Bee", ord_date: new Date("2020-03-08"), price: 50, items: [ { sku: "oranges", qty: 10, price: 2.5 }, { sku: "pears", qty: 10, price: 2.5 } ], status: "A" },
  { _id: 4, cust_id: "Busby Bee", ord_date: new Date("2020-03-18"), price: 25, items: [ { sku: "oranges", qty: 10, price: 2.5 } ], status: "A" },
  { _id: 5, cust_id: "Busby Bee", ord_date: new Date("2020-03-19"), price: 50, items: [ { sku: "chocolates", qty: 5, price: 10 } ], status: "A" },
  { _id: 6, cust_id: "Cam Elot", ord_date: new Date("2020-03-19"), price: 35, items: [ { sku: "carrots", qty: 10, price: 1.0 }, { sku: "apples", qty: 10, price: 2.5 } ], status: "A" },
  { _id: 7, cust_id: "Cam Elot", ord_date: new Date("2020-03-20"), price: 25, items: [ { sku: "oranges", qty: 10, price: 2.5 } ], status: "A" },
  { _id: 8, cust_id: "Don Quis", ord_date: new Date("2020-03-20"), price: 75, items: [ { sku: "chocolates", qty: 5, price: 10 }, { sku: "apples", qty: 10, price: 2.5 } ], status: "A" },
  { _id: 9, cust_id: "Don Quis", ord_date: new Date("2020-03-20"), price: 55, items: [ { sku: "carrots", qty: 5, price: 1.0 }, { sku: "apples", qty: 10, price: 2.5 }, { sku: "oranges", qty: 10, price: 2.5 } ], status: "A" },
  { _id: 10, cust_id: "Don Quis", ord_date: new Date("2020-03-23"), price: 25, items: [ { sku: "oranges", qty: 10, price: 2.5 } ], status: "A" }
])

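Now let's perform a map-reduce operation, say, to get the total price of all orders for each customer (grouped by cust_id). For that we first create a mapper function, which is called once for every document in the collection and emits a key-value pair: the cust_id as the key and the order's price as the value.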

Mapper Function

var mapperFunction = function() {
   emit(this.cust_id, this.price);
};

Next, we create a reducer function that combines all the values emitted for a given key. The reduce operation could be a sum, an average, and so on; here it sums the prices for each customer.

Reducer Function

var reducerFunction = function(keyCustId, valuesPrices) {
   return Array.sum(valuesPrices);
};

And that's it.
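To see what the mapper and reducer do together, here is a minimal sketch that imitates the two phases in plain JavaScript, without a MongoDB server. The helper simulateMapReduce and the explicit emit argument are hypothetical stand-ins written for this illustration; in MongoDB the mapper reads the document through this, and the server may call the reducer more than once per key. Array.sum is a mongo shell helper, so the sketch uses Array.prototype.reduce instead.

```javascript
// Local imitation of map-reduce (runs in Node.js, no MongoDB needed).
// Only the fields the mapper reads (cust_id, price) are reproduced here.
const orders = [
  { cust_id: "Ant O. Knee", price: 25 },
  { cust_id: "Ant O. Knee", price: 70 },
  { cust_id: "Busby Bee", price: 50 },
  { cust_id: "Busby Bee", price: 25 },
  { cust_id: "Busby Bee", price: 50 },
  { cust_id: "Cam Elot", price: 35 },
  { cust_id: "Cam Elot", price: 25 },
  { cust_id: "Don Quis", price: 75 },
  { cust_id: "Don Quis", price: 55 },
  { cust_id: "Don Quis", price: 25 },
];

// Hypothetical helper: runs a mapper and a reducer over an in-memory array.
function simulateMapReduce(docs, mapper, reducer) {
  const grouped = new Map(); // key -> array of emitted values

  // Stand-in for the emit() that MongoDB provides to the map function.
  const emit = (key, value) => {
    if (!grouped.has(key)) grouped.set(key, []);
    grouped.get(key).push(value);
  };

  // Map phase: call the mapper once per document.
  for (const doc of docs) mapper(doc, emit);

  // Reduce phase: collapse each key's values into a single result.
  const results = [];
  for (const [key, values] of grouped) {
    results.push({ _id: key, value: reducer(key, values) });
  }
  return results;
}

const mapper = (doc, emit) => emit(doc.cust_id, doc.price);
const reducer = (keyCustId, valuesPrices) =>
  valuesPrices.reduce((sum, price) => sum + price, 0);

const simulated = simulateMapReduce(orders, mapper, reducer);
console.log(simulated);
```

The result has one entry per customer, which is the same shape that the map-reduce output collection will have.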

Now we'll write the output to a collection named mr_output. If the mr_output collection already exists, the operation replaces its contents with the results of this map-reduce operation.

db.orders.mapReduce(
   mapperFunction,
   reducerFunction,
   { out: "mr_output" }
)

The db.collection_name.mapReduce() method is a wrapper around the mapReduce database command.
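Since the shell method wraps a database command, the same operation can also be expressed as a command document passed to db.runCommand(). The sketch below is illustrative only: the variable name mapReduceCommand is an assumption, and the typeof db guard is there just so the snippet can be loaded outside the mongo shell without error.

```javascript
// Command document for the same map-reduce over the orders collection.
// mapReduceCommand is a name chosen for this illustration.
const mapReduceCommand = {
  mapReduce: "orders", // collection to read from
  map: function () {
    emit(this.cust_id, this.price); // emit() is provided by the server
  },
  reduce: function (keyCustId, valuesPrices) {
    return Array.sum(valuesPrices); // Array.sum is a mongo shell/server helper
  },
  out: "mr_output", // replace mr_output with the results
};

// Only run the command when a shell-provided db object is available.
if (typeof db !== "undefined") {
  printjson(db.runCommand(mapReduceCommand));
}
```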

To check whether the map-reduce operation succeeded, we can query the output collection:

db.mr_output.find().sort( { _id: 1 } )

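Output

With the documents inserted above, the query should return one document per customer, with the summed price in the value field:

{ "_id" : "Ant O. Knee", "value" : 95 }
{ "_id" : "Busby Bee", "value" : 125 }
{ "_id" : "Cam Elot", "value" : 60 }
{ "_id" : "Don Quis", "value" : 155 }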
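For comparison, the per-customer totals can also be computed with an aggregation pipeline instead of map-reduce. This is a sketch under assumptions rather than part of the walkthrough above: the output collection name agg_output and the variable name pipeline are made up for illustration, and the typeof db guard only exists so the snippet does nothing outside the mongo shell.

```javascript
// Aggregation pipeline that groups orders by customer and sums the prices.
const pipeline = [
  // $group plays the role of the mapper and reducer together:
  // the _id expression is the key, $sum is the reduce operation.
  { $group: { _id: "$cust_id", value: { $sum: "$price" } } },
  // $out writes the results to a collection; "agg_output" is an assumed name.
  { $out: "agg_output" },
];

// Only run the pipeline when a shell-provided db object is available.
if (typeof db !== "undefined") {
  db.orders.aggregate(pipeline);
  printjson(db.agg_output.find().sort({ _id: 1 }).toArray());
}
```

No JavaScript functions are involved here; the grouping and summing are expressed with the built-in $group and $sum operators.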

I hope this gave you some insight into how map-reduce works in MongoDB.
