Sunday, 19 January 2014

A Deep Dive to Achieve Better Performance and Scalability

Welcome back, friends.

First, let me thank you all for the overwhelming (and unexpected) response to the last post. We received feedback through several channels.

The main goal of this discussion is to explain why NoSQL technologies offer better performance and easier scalability than a traditional RDBMS.

Distributed File System Architecture:

The key lies in the distributed file system architecture and the parallel processing that Big Data solutions share. Let's discuss this with an example. I will ask you to do some math here.

Let's say we need to read 1 TB of data from a disk. The disk has 4 I/O channels, each with an I/O speed of 100 MB/sec. Your assignment is to calculate how long it would take to read the whole 1 TB using those 4 I/O channels. (No scrolling before you calculate, please .. :))



Time to read through one channel:
t = 1000000000000 / (100 x 1000000) = 10000 sec

Time to read through 4 channels:
t = 10000 / 4 = 2500 sec = 41.67 min

Pretty simple right!!!

Now let's split this data into 10 chunks spread across 10 disks, each with the same configuration as before. What would be the total time to read the 1 TB of data this time?



t = 2500 / 10 = 250 sec = 4.17 min

A straightaway 10x gain in performance, cool!!! (In real life the math won't be this clean; network latency adds some extra time, but the advantage is still there.)
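If you would rather let the machine do the arithmetic, here is a minimal Python sketch of the same calculation. The function name `read_time_seconds` and its parameters are made up for this post, and it assumes the ideal case where the data is split evenly and there is no network or coordination overhead.

```python
def read_time_seconds(data_bytes, channel_bytes_per_sec, channels_per_disk, disks=1):
    """Ideal read time when data is split evenly over every channel of every disk."""
    total_throughput = channel_bytes_per_sec * channels_per_disk * disks
    return data_bytes / total_throughput

TB = 10**12  # 1 TB in bytes (decimal units, as in the example above)
MB = 10**6   # 1 MB in bytes

one_channel = read_time_seconds(1 * TB, 100 * MB, channels_per_disk=1)
one_disk = read_time_seconds(1 * TB, 100 * MB, channels_per_disk=4)
ten_disks = read_time_seconds(1 * TB, 100 * MB, channels_per_disk=4, disks=10)

for label, seconds in [("1 channel", one_channel),
                       ("1 disk, 4 channels", one_disk),
                       ("10 disks, 4 channels each", ten_disks)]:
    print(f"{label}: {seconds:.0f} sec = {seconds / 60:.2f} min")
```

Changing `disks` shows how the ideal read time shrinks in proportion to the number of disks reading in parallel.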

In a Real-World Scenario:

You might ask what stops us from using a distributed architecture with a traditional RDBMS, and why we need a totally new solution. That is the heart of the matter.

To build a system that is highly scalable and cost-effective at the same time, we have two approaches:
- Vertical Scalability (scaling up)
- Horizontal Scalability (scaling out)

Vertical scalability means upgrading the resources of a single system (more RAM, a faster processor, more disk space, etc.). This is not a cost-effective solution: high-end servers are expensive, and at some point further upgrades become impractical.

Horizontal scalability means adding more machines to a cluster for parallel processing. MongoDB makes scaling out easy to configure, and an existing database can be scaled out on low-cost commodity hardware. You can start with a single-node cluster. As the data volume grows, more servers can be added to the cluster and the data partitioned across them (MongoDB calls this technique sharding) without affecting application development and with zero downtime.

The other aspect of MongoDB is the variety of data it supports. By variety we mean that structured, semi-structured, and unstructured data can all be accommodated in the design.
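To make the idea of spreading data across servers concrete, here is a toy Python sketch of hash-based partitioning. It is only an illustration and not MongoDB's actual routing logic: MongoDB splits data into chunks by shard key and moves chunks between shards itself. The names `shard_for` and `distribute` and the `user…` keys are hypothetical.

```python
import hashlib

def shard_for(key, num_shards):
    """Pick a shard index for a key using a stable hash (toy scheme, not MongoDB's)."""
    digest = hashlib.md5(str(key).encode("utf-8")).hexdigest()
    return int(digest, 16) % num_shards

def distribute(keys, num_shards):
    """Group keys by the shard each one hashes to."""
    shards = {i: [] for i in range(num_shards)}
    for key in keys:
        shards[shard_for(key, num_shards)].append(key)
    return shards

keys = [f"user{i}" for i in range(1000)]  # hypothetical record keys
placement = distribute(keys, 4)           # pretend we have 4 servers

for shard_id, members in placement.items():
    print(f"shard {shard_id}: {len(members)} keys")
```

Each server ends up holding only a share of the keys, so reads and writes can proceed in parallel, much like the 10 disks in the earlier example. A naive modulo scheme like this one would reshuffle most keys when a server is added, which is one reason real systems use more careful schemes.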
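As a rough picture of what "variety" means, the Python sketch below holds three differently shaped records side by side as plain dictionaries, which is close to how JSON-like documents look. All field names and values here are invented for illustration, and nothing in it talks to a real database.

```python
import json

# Hypothetical records of different shapes living side by side.
documents = [
    # Structured: fixed, table-like fields.
    {"type": "order", "order_id": 1001, "amount": 250.0, "currency": "USD"},
    # Semi-structured: nested and optional fields.
    {"type": "profile", "name": "Asha",
     "contacts": {"email": "asha@example.com"}, "tags": ["premium"]},
    # Unstructured: free text with minimal metadata.
    {"type": "note", "body": "Customer called about a delayed shipment."},
]

# No shared schema is needed; each document carries its own fields.
all_fields = sorted({field for doc in documents for field in doc})
print("fields seen:", all_fields)

for doc in documents:
    print(json.dumps(doc))
```

In a relational table, every row would need the same columns; here each record simply carries whatever fields it has.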

In our next discussion we will talk about some of the important features of MongoDB and the value they bring compared with its counterparts.

See you there ...


2 comments:

  1. Superrrrr.but I would like to know as in how does it store and maintain ACID properties.

    1. Hi Gloria, it doesn't support full ACID as I mentioned in my last discussion. Having said that, there are ways to handle this in MongoDB and we are going to discuss in our future posts... please keep watching ...
