Wednesday, 21 January 2015

Design an eCart Use Case in MongoDB

Continued from our previous post we will try to enhance the same online retail application. In this post we will discuss how we can implement checkout functionality. The popular use case for an e-Cart application is 

- add product into a cart
- remove product from a cart

In our last post we have seen how to design 1-to-1 and 1-to-n relationship in MongoDB. The entity diagram shows that the product and cart will have an n-to-n relationship between them. In case of n-to-n we need have 2 collections to represent the same. In this case we need to have product and cart collections. See below:

Product Collection:

{
    _id: ObjectId("54794c28d5bf15daee4cb42b"),
    itemNumber: "12345678",

    ......
    carted: [{cart_id : ObjectId("54794c28d5bf15daee4cb42c"),
                 qtyPurchased : 1,
                 ts: new Date()}],
    inventory: {
        totalQty: 500,

        .....
    }
    ......
}

Cart Collection:

{
    _id : ObjectId("54794c28d5bf15daee4cb42c"),
    last_update : new Date(),
    status : "active",
    items : [
        {
itemNumber : "123456", qty : 1},
        {
itemNumber : "456789", qty : 2}
    ]
}


Let's analyze. The carted array needs to be maintained in the product collection to support the inventory management system. At the same time the cart collection should have the product detail to support the cart (checkout) pages. The inventory management module need to know how many products are available in the inventory as well as how many of them are promised to be delivered to the customer. For the same we should not have to access cart collection (should not access multiple collections as we join is not supported).
Similarly, to show the cart pages we should not have to access product collection, as all the information can be retrieved from the cart collection only.

To achieve this, the "add to cart" and "remove from the cart" functionality should modify data in these 2 collections in order to make the data consistent. The carted and inventory object should be updated from the product collection whereas the items array needs to be updated from the cart collection. 

Following could be the algorithm:

Add-to-cart algorithm

{
 1. Check if the product is available in the inventory (quantity available > 0)
 2. If yes, decrement the total quantity of the product by the quantity purchased from product collection
 3. Get the current availability of the product. This is required to show "product in stock" on the product detail page
 4. Update the cart collection with the product and purchased quantity detail
}

Oversell : Challenges in a Multi threaded situation

At a high level this looks to be a traditional implementation by having few consecutive database operations. The challenge will occur if multiple threads try to execute the same database operation on the same document simultaneously. It is of high possiblity that two concurrent threads are fighting for a race condition. Both of the concurrent threads will get true out of the step 1 if the available quantity is just 1. And both of them will try to decrement the quantity (step 2) from the inventory. This will result in negative quantity in the inventory i.e. the application is promising the delivery for a product which is not present at all. This is called "Overselling" in the e-commerce space and this is a problem with many online e-commerce application.

findAndModify: How to overcome this issue in MongoDB

MongoDB provides an intelligent way to overcome this problem. If we think little dipper, we will understand that the main issue is due to these 4 steps being part of separate atomic transaction. The first 3 steps are on the same collection (product) and step 4 is on the cart collection. MongoDB provides a method called findAndModify to perform the 1st 3 steps in one atomic transaction. We can find the availability and modify the available quantity of the product in one single transaction only. So the updated algorithm will be :

{
 1. Check the product availability, decrement the total quantity of the product by the quantity purchased and returns the available quantity after purchase
 2. Update the cart collection with the product and purchased quantity detail
}
The following code (we have implemented in node.js) snippet implements the above logic:

var product = db.collection("product"); // get the product collection
var cart = db.collection("cart"); // get the product collection


this.addToCart = function(itemNumber, qtyPurchased, cartId, callback) {       
        qtyPurchased = parseInt(qtyPurchased, 10);
        var carted = {"cartId" : cartId, "qty" : qtyPurchased, "ts" : new Date()};
        var qrytoUpdate = {"itemNumber" : itemNumber, "inventory.totalQty" : {$gte : qtyPurchased}};
        var ops = {"new" : true};
        var toUpdate = {$inc : {"inventory.totalQty" : -qtyPurchased}, $set : {"inventory.lastUpdatedOn" : new Date()} , $push : {"carted" : carted}};


        // update the inventory - decrease the quantity
        product.findAndModify(qrytoUpdate, {}, toUpdate, ops, function(err, result){
            if (err) {
                console.warn(err.message); // returns error if no matching object found
                return callback(err, null);
            } else if(result){
                var qry = {"_id": cartId, "status" : "A"};
                var upd = {$set : {"lastModifiedOn" : new Date()}, $push : {"items" : {"itemNumber" : itemNumber, "qty" : qtyPurchased}}};
                ops = {"upsert" : true};
               
                // on success - update the cart - increase the quantity
                cart.update(qry, upd, ops, function(err, res){                   
                    if (err) {
                        console.warn(err.message);
                       
                        // for error - roll back the inventory update
                        toUpdate = {$inc : {"inventory.totalQty" : qtyPurchased}, $set : {"inventory.lastUpdatedOn" : new Date()} , $pull : {"carted" : carted}};
                        qrytoUpdate = {"itemNumber" : itemNumber};                       
                        product.update(qrytoUpdate, toUpdate, function(err, rollbackres){
                            if (err) {
                                console.warn(err.message); // returns error if no matching object found
                                return callback(err, null);
                            } else {
                                err = {"err" : "Item added back to the inventory"};
                                console.log("Item added back to the inventory - "+cartId);
                                callback(err, rollbackres);
                            }
                        });                       
                    } else {
                        console.log("Item Left in inventory - "+result.inventory.totalQty);
                        callback(err, result);
                    }                   
                });
            } else {
                console.log("Item not available in inventory for cart id - "+cartId);
                err = {"err" : "No Item Available"};
                callback(err, null);
            }           
        });       
    };


The full code base is available in the git. Please check it out.

The performance testing shows this block will ensure the quantity available never reaches to a negative value. 

Pessimistic Locking: An Alternate Approach

Many people implements pessimistic locking in order to address the same problem. Where we have to follow the below algorithm:
{
 1. Manually lock the current document by inserting the _id value in a lock collection (e.g. product.lock - custom collection).
 2. Access and modify the product collection (inventory values) only if the specific document is not present in the product.lock collection; thus trying to make sure multiple concurrent threads are not accessing the same document.
3. Once done, manually remove the document from lock collection, making it free for the next threads to work on the same.
}

This approach will have the same issue, as 1st step itself is not thread safe. Multiple concurrent threads can lock the same document in step 1, which will end up with deadlock condition.

findAndModify() is a better approach as opposed to implement pessimistic locking.


<< Prev                                                                                     Next >>
 

Tuesday, 13 January 2015

Designing Product Catalog (e-Commerce) in MongoDB

In this post we will discuss a real life use case of e-commerce specific domain. We will try to design a product catalog data model in MongoDB.

The basic entities of a product catalog building application can be depicted as follows:
- Product
- Product Attributes
- Price
- Ratings and Review
- Cart
- Inventory
- Store 

There can be many more, for our example we are considering the above only.

Following diagram depicts the relationships among the entities:



To start the design first we need to consider user requirements which will be accessed most frequently. In an e-commerce web site the most frequent functionality to be used are:

- Browse Product Detail Page by category
- Search products by name or keyword
- Sort by price (low to high)

As product be the main entity we would model the data around a product. The 1-to-1 relationship can be represented by embedding the child document inside the parent while the 1-to-n relationship can be represented as an embedded array of multiple sub-documents. A suggested design would be as follows:

{
  _id: ObjectId("54794c28d5bf15daee4cb42b"),
  partNumber: "12345678",

  type: "SA",
  category: "Electronics",
  path: "Electronics/iPad and Tablets/iPad",

  name: "Apple® iPad Air 32GB Wi-Fi - Silver/White (MD789LL/A)",
  description: "The iPad Air from Apple is 20 percent thinner than the 4th generation iPad and weighs just one pound, so it feels unbelievably light in your hand. It comes with a 9.7-inch Retina display, the powerful A7 chip with 64-bit architecture, ultrafast wireless, powerful apps, and up to 10 hours of battery life",
  .....
  priceInfo: [
   {
     type: "list",
     startDate: ISODate("2012-03-09T20:55:36Z"),
     endDate: ISODate("2012-03-31T20:55:36Z"),
     price: 10,
     unit: "USD"
   },
   ....
  ],
  ....
  carted: [],
  inventory: {
     totalQty: 500,
     lastUpdatedOn: null,
     store: []
  }
  details: {
    maxResolution: "2048 x 1536",
    displayFeatures: "Retina Display, Backlit LED",
    computerFeatures: "Touch Screen, Built-in Speaker, Integrated Microphone, Built-In Bluetooth",
    processorBrand: "Apple",
    ....
  },

  reviews: []
}


You can see the full data model in Git. Click here.  

The above data model can address atomic write operation issues. While browsing thru product info user can access one one single collections. Also write operations should be touch more than one collection.

To improve the read performance of the functionality for fetch product details by category we can create index on the category field as below.

db.product.ensureIndex({category : 1})

The search functionality (by name and detail) can be enhanced by creating text index on those fields as shown below:

db.product.ensureIndex({name : "text"})
db.product.ensureIndex({description : "text"})

The "Sort by Price" feature can be implemented by creating an index on the price field in a descending manner. This should be on the list price only. See below:

db.product.ensureIndex({"priceInfo.0" : -1})


The carted field can be used for add to cart functionality. We can see this in the next post.


<< Prev                                                                                     Next >>

Sunday, 11 January 2015

Data Model Design in MongoDB

The main USP of MongoDB data model is it's flexibility and dynamic schema. MongoDB doesn't enforce any rigid schema. Moreover saving any kind of structure of data is very easy.

If you are coming from a traditional database background you might be aware of the steps in case we need to enhance/change some functionality that might need to change the transactional data model. You need to execute the corresponding DDL (data definition language) script (may be you need a DBA for this, as most organizations does have a process/protocol to do any change in the database layer) before you do any change in the application layer. This increases the development time and in turn go-to-market is highly impacted.

Think of a situation, where you don't need to run any alter script at all and you can implement the required functionality from application layer itself. The development would be much faster. The dynamic behavior of MongoDB data model helps to a greater extent.

As we have discussed in our earlier posts that in order to achieve the performance benefit MongoDB doesn't support the following:

- Full ACID transaction across multiple documents or collections
- Joins across multiple documents or multiple collections
- Foreign key concept (like RDBMS)

While starting doing a data model design apparently it looks like it is not practical to have a persistence architecture which doesn't support full ACID transaction. If we don't have joins supported how can we query across multiple collections (i.e. table in RDBMS). At last, not having any foreign key integrity might end up with inconsistency issue. Weird!!!

How MongoDB tackles the limitation:

According to CAP theorem, for any distributed architecture it is impossible to support all 3 properties i.e. consistency, availability and partition tolerance, at the same time. You can read this in detail here.

MongoDB supports BASE (Basically Available with Soft State and Eventual Consistency) instead of ACID. The above blog will give you idea on BASE. Also it supports atomic write operation on a single document. This will lead to think of a lateral approach of the designing. We need to think how the entity relationship can be represented so that write operations should not have multiple collections involved. An approach to achieve this is representing entity relationships in terms of Embedded Documents. Embedding a child document in a parent is nothing but implementing pre-join at the time of write.

Example is better than definition:

Say, there is an HRMS application where we have two entities Person and Address. A person can have multiple addresses i.e. the relationship between this two is 1-to-n. If we need to design this model in RDBMS we need two tables - one is for person and another is for address (see the diagram below)


Fig1: Data model in RDBMS


So when you need to insert or update any person info you have to access two tables. In MongoDB this model should be designed embedding the address document in the person document and having only one collection (see below)


Fig 2: Data Model in MongoDB


According to the above design for any write operation we need to access only one collection. The whole information for a person is part of one single document and we don't need any foreign key constraint as well. This way we can handle all 3 limitations mentioned.

This is a paradigm shift of the data model design. The performance benefit fully depends on the way you design.

Design Considerations

Before starting the design check out the following:

- User requirements on the frequently used functionalities
- Data volumes and growth
- Can you live without having full ACID

In the next post we will take a real life use case and try to design the same in MongoDB.


<< Prev                                                                                     Next >>