Sunday 11 January 2015

Data Model Design in MongoDB

The main USP of the MongoDB data model is its flexibility and dynamic schema. MongoDB doesn't enforce a rigid schema, so documents with different structures can easily be saved in the same collection.

If you come from a traditional database background, you know the steps involved when enhancing or changing a feature requires a change to the transactional data model. You first need to execute the corresponding DDL (data definition language) script, possibly through a DBA, since most organizations have a process for any change in the database layer, before you can change the application layer. This increases development time and, in turn, delays time to market.

Now think of a situation where you don't need to run any ALTER script at all and can implement the required functionality from the application layer itself. Development would be much faster. The dynamic nature of the MongoDB data model helps here to a great extent.
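
To make this concrete, here is a minimal sketch assuming a hypothetical new requirement: person records now need an `email` field. No ALTER script is run; new documents simply include the field, and the application supplies a default when it reads an older document that lacks it. The names (`persons`, `email_of`) and the sample values are illustrative, and a plain Python list stands in for a MongoDB collection so the sketch runs without a server.

```python
# Hypothetical person documents living in the same collection: the first was
# written before the "email" field existed, the second already carries it.
# A plain list stands in for a MongoDB collection so this runs without a server.
persons = [
    {"_id": 1, "name": "Asha Rao"},
    {"_id": 2, "name": "Ravi Kumar", "email": "ravi@example.com"},
]


def email_of(person, default="not provided"):
    """Read the new field in the application layer, tolerating older
    documents that were saved before the field was introduced."""
    return person.get("email", default)


for p in persons:
    print(p["name"], "->", email_of(p))
```

Running it prints `Asha Rao -> not provided` followed by `Ravi Kumar -> ravi@example.com`. Both document shapes coexist in one collection, and the schema change lives entirely in application code.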

As discussed in our earlier posts, in order to achieve its performance benefits MongoDB doesn't support the following:

- Full ACID transactions across multiple documents or collections
- Joins across multiple documents or multiple collections
- Foreign key constraints (as in an RDBMS)

When you start on a data model design, at first glance it seems impractical to have a persistence architecture that doesn't support full ACID transactions. If joins aren't supported, how can we query across multiple collections (the equivalent of tables in an RDBMS)? Finally, without foreign key integrity we might end up with inconsistent data. Weird!!!

How MongoDB tackles these limitations:

According to the CAP theorem, a distributed system cannot guarantee all three properties, i.e. consistency, availability and partition tolerance, at the same time. You can read about this in detail here.

MongoDB follows BASE (Basically Available, Soft state, Eventual consistency) instead of ACID; the post linked above gives an idea of BASE. It does, however, support atomic write operations on a single document. This calls for a lateral approach to design: we need to think about how entity relationships can be represented so that a write operation does not involve multiple collections. One way to achieve this is to represent entity relationships as embedded documents. Embedding a child document in its parent is nothing but a pre-join performed at write time.
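
A minimal sketch of what a single-document atomic write looks like, assuming a hypothetical person document with a top-level `phone` field and an embedded `addresses` array (the same shape as the example that follows). The helper `build_person_update` and the database/collection names `hrms` and `persons` are illustrative only. Because both changes target one document, sending them as one `update_one(filter, update)` call lets MongoDB apply them atomically; the pymongo call itself is shown only as a comment so the sketch runs without a server.

```python
def build_person_update(person_id, new_phone, new_address):
    """Return a (filter, update) pair for a hypothetical person document.

    Both operators target the same document, so when sent as one
    update_one(filter, update) call MongoDB applies them atomically:
    readers see either neither change or both.
    """
    query = {"_id": person_id}
    update = {
        "$set": {"phone": new_phone},         # change a top-level field
        "$push": {"addresses": new_address},  # append to the embedded array
    }
    return query, update


query, update = build_person_update(
    1, "+1-555-0100", {"type": "office", "city": "Mumbai"}
)

# With a running server and pymongo installed (not executed here):
#   from pymongo import MongoClient
#   MongoClient().hrms.persons.update_one(query, update)
print(query)
print(update)
```

Had the address lived in a separate collection, the same change would need two writes to two collections, which is exactly what the embedded design avoids.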

An example is better than a definition:

Say there is an HRMS application with two entities, Person and Address. A person can have multiple addresses, i.e. the relationship between the two is 1-to-n. To design this model in an RDBMS we need two tables, one for person and another for address (see the diagram below).


Fig 1: Data model in RDBMS


So whenever you insert or update a person's information, you have to access two tables. In MongoDB, this model should instead be designed by embedding the address documents in the person document, with only one collection (see below).


Fig 2: Data Model in MongoDB
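
The following sketch shows the same idea in code. The column names (`name`, `type`, `city`) and sample rows are hypothetical, since the exact fields are not spelled out in the text. Rows from the two tables of Fig 1, linked by a `person_id` foreign key, are joined once and turned into one document per person with an embedded `addresses` array, as in Fig 2.

```python
from collections import defaultdict

# Hypothetical rows from the two RDBMS tables of Fig 1;
# address.person_id is a foreign key to person.id.
person_rows = [
    {"id": 1, "name": "Asha Rao"},
    {"id": 2, "name": "Ravi Kumar"},
]
address_rows = [
    {"id": 10, "person_id": 1, "type": "home", "city": "Pune"},
    {"id": 11, "person_id": 1, "type": "office", "city": "Mumbai"},
    {"id": 12, "person_id": 2, "type": "home", "city": "Chennai"},
]


def embed_addresses(persons, addresses):
    """Build one document per person with its addresses embedded, i.e. the
    join is done once, when the document is written (a 'pre-join')."""
    by_person = defaultdict(list)
    for a in addresses:
        # The foreign key is only used for grouping; it is dropped from the
        # embedded sub-document because nesting already expresses ownership.
        by_person[a["person_id"]].append({"type": a["type"], "city": a["city"]})
    return [
        {"_id": p["id"], "name": p["name"], "addresses": by_person.get(p["id"], [])}
        for p in persons
    ]


docs = embed_addresses(person_rows, address_rows)
print(docs[0])
```

The print shows `{'_id': 1, 'name': 'Asha Rao', 'addresses': [{'type': 'home', 'city': 'Pune'}, {'type': 'office', 'city': 'Mumbai'}]}`. Each resulting document could then be written with pymongo's `insert_one`, after which reading a person together with all of their addresses is a single lookup by `_id`.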


With this design, any write operation needs to access only one collection. All the information for a person is part of a single document, so no foreign key constraint is needed either. This way all three limitations mentioned above are handled.

This is a paradigm shift in data model design: the performance benefit depends entirely on how you design the model.

Design Considerations

Before starting the design, consider the following:

- User requirements for the most frequently used functionality
- Data volumes and growth
- Can you live without full ACID transactions?

In the next post we will take a real-life use case and design it in MongoDB.

