Hi friends, we are back again with the MongoDB mania!
In our last discussion we covered the installation of MongoDB on your system. Before going to the next level of detail, let's touch upon some background. In this post we will look at some interesting topics: the circumstances in which MongoDB evolved and why we should use it at all. This is a pretty basic question one should ask before learning or using any new technology.
The BIG DATA Scenario:
Let's step back to 2006-2007. Industry leaders started facing the impact of data growth, which was accelerating rapidly every year. A terabyte of data, once rarely heard of, has become a pretty common and frequent scenario nowadays (we even get 1 TB flash drives). An airline jet collects about 10 terabytes of sensor data for every 30 minutes of flying time. The NYSE generates about one terabyte of new trade data per day for stock trading analytics to determine trends. Facebook users spend 10.5 billion minutes (almost 20,000 years) online on the social network, and an average of 3.2 billion likes and comments are posted every day. Twitter has over 500 million registered users.
As far as global data volume is concerned, it has already crossed 2.7 zettabytes (1 ZB = 10^21 bytes = 1 billion terabytes; please count the trailing zeros yourself).
With all these statistics in mind, it can safely be said that storing this enormous data is NOT the challenge (the cost of storage keeps getting cheaper and cheaper with the latest advances in semiconductor technology). The problem comes when we start processing this BIG data. Even though storage capacity keeps growing, I/O speed is not increasing comparably, and that is what is becoming the bottleneck.
Challenges with the traditional RDBMS:
In the last 10 years, the Internet has challenged relational databases in ways nobody could have foreseen. First you have a single server with a small data set. Then you find yourself setting up replication so you can scale out reads and deal with potential failures. And, before too long, you’ve added a caching layer, tuned all the queries, and thrown even more hardware at the problem. Eventually you arrive at the point when you need to shard the data across multiple clusters and rebuild a ton of application logic to deal with it. And soon after that you realize that you’re locked into the schema you modeled so many months before.
Why? Because there’s so much data in your clusters now that altering the schema will take a long time and involve a lot of precious DBA time. It’s easier just to work around it in code. This can keep a small team of developers busy for many months. In the end, you’ll always find yourself wondering if there’s a better way—or why more of these features are not built into the core database server.
Welcome to the NoSQL world:
Keeping all these challenges in mind, it was time to come up with an alternative. MongoDB, created by 10gen, is a powerful, flexible, and scalable general-purpose database. It combines the ability to scale out with features such as secondary indexes, range queries, sorting, aggregations, and geospatial indexes.
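To make those features a bit more concrete, here is a minimal sketch using the Python driver (pymongo). It assumes a MongoDB server is running locally on the default port; the database, collection, and field names (demo, users, places, age, loc, city) are hypothetical, invented just for illustration.

from pymongo import MongoClient, ASCENDING, GEOSPHERE

# Connect to a local MongoDB server (assumed to be running on the default port).
client = MongoClient("mongodb://localhost:27017")
db = client.demo  # hypothetical database name

# Secondary index on a regular field.
db.users.create_index([("age", ASCENDING)])

# 2dsphere index to enable geospatial queries.
db.places.create_index([("loc", GEOSPHERE)])

# Range query with sorting: users aged 18-29, youngest first.
young_users = db.users.find({"age": {"$gte": 18, "$lt": 30}}).sort("age", ASCENDING)

# Simple aggregation: count users per city.
per_city = db.users.aggregate([
    {"$group": {"_id": "$city", "count": {"$sum": 1}}}
])

The same operations are available directly from the mongo shell as well; we will walk through them step by step in the upcoming posts.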
MongoDB's ease of use enables agile developers to build their applications fast, with cost-effective scale-out capability and high performance.
There are, though, some compromises on the feature side compared to relational databases. MongoDB doesn't support joins or multi-document transactions (if you don't know what a document is, don't worry, we will cover it in the next posts). This was a well-thought-out design decision, trading those two features for performance and scalability. With that in mind, there is good guidance on how to design your application around it.
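As one illustration of that design guidance: instead of joining a separate comments table to a posts table, related data is typically embedded in a single document, so one read fetches everything. A minimal sketch under the same assumptions as above (pymongo, local server, hypothetical names):

from pymongo import MongoClient

client = MongoClient("mongodb://localhost:27017")
db = client.demo  # hypothetical database name

# Embed comments inside the post document instead of joining two tables.
db.posts.insert_one({
    "title": "Why MongoDB?",
    "author": "admin",
    "comments": [
        {"user": "alice", "text": "Need more details around NoSQL"},
        {"user": "bob", "text": "Waiting for the next post"},
    ],
})

# One query returns the post together with all of its comments; no join needed.
post = db.posts.find_one({"title": "Why MongoDB?"})

Because a single document is the unit of atomicity in MongoDB, a post and its embedded comments are updated together without needing a multi-document transaction.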
Nice stuff to know!! We will now pick up the pace...