INVENTRIZ...: 2014

Sunday 10 August 2014

Global Variable in MongoDB using .mongorc.js

Today we will see something interesting. Every time we log in to our mongod process using the mongo client, it gives us a blank prompt like below:

MongoDB shell version: 2.6.3
connecting to: test

This might look a little boring. To change it, we can use the prompt variable of the mongo shell, which can hold a function. For example, if you want to show your name as the prompt, you can set the prompt variable like below:

MongoDB shell version: 2.6.3
connecting to: test

> var prompt = function() { return "Tridib using MongoDB >" };

Once you hit enter you will see the below prompt.

Tridib using MongoDB >

This is a temporary approach, as the prompt variable set in this mongo shell is scoped to this shell session only. The next time you log in, the prompt will be reset to the default.

To avoid this, you can set it up at a global level. Go to your home directory: C:\Users\<USERNAME> on Windows or $HOME on Unix/Linux. You will see a file called .mongorc.js (create it if it is not there). Open this file in an editor, add the following content and save the file:

var host = db.serverStatus().host;
var prompt = function() { return db+"@"+host+"> "; }

This will show the database you are currently using and the host name of your system in the prompt. You can write a prompt function of your own choice as well.

test@SOMEHOST>

The .mongorc.js file is used to set up global functions or variables that you want defined every time the mongo client loads.
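The prompt idea above can be sketched as a small reusable helper. This is only an illustration: makePrompt and the stub object are hypothetical names, and the stub stands in for the shell's real db variable so that the snippet can be tried in any JavaScript runtime. Inside .mongorc.js you would pass the real db and the host value instead.

```javascript
// Hypothetical helper: builds a prompt function from a database getter and a host name.
// In the mongo shell, getDb would return the shell's own `db`; here a stub is used.
function makePrompt(getDb, host) {
  return function () {
    // Evaluated every time the prompt is drawn, so it follows a change of database.
    return getDb() + "@" + host + "> ";
  };
}

// Stub standing in for the shell's db variable (assumption for illustration only).
var current = { toString: function () { return "test"; } };
var prompt = makePrompt(function () { return current; }, "SOMEHOST");

console.log(prompt());
```

Because the function is called each time the prompt is shown, the database name is looked up fresh on every call rather than being fixed when the file is loaded.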


Friday 4 July 2014

Use an External Editor in the mongo Shell

Hi friends, I know the break was a little longer than planned. I got stuck with a lot of different stuff, actually. But better late than never.

I will resume our MongoDB mania with an interesting topic today.

We have been using the mongo shell to write and edit operations, functions and so on. That can be difficult when we have to write a bigger function and execute it. Think of a situation where you have typed a big function (of 5 lines, say) and found a syntax error in line 2: it is very difficult to go back and edit it in the mongo shell.

One option is to write the function in an external .js file and load it into the mongo shell, as discussed here.

The other option is also interesting. We can open an external editor (of our choice) from the mongo shell itself and edit some existing function or script.

In the mongo shell you can use the edit command to edit existing code or an existing function. The edit command uses the EDITOR environment variable to open the function in the specified editor.

Open the command prompt and set the EDITOR variable as follows (if it is not set already).

D:\mongodb-win32-x86_64-2008plus-2.4.9\bin>set EDITOR=%windir%\system32\notepad.exe

D:\mongodb-win32-x86_64-2008plus-2.4.9\bin>echo %EDITOR%
C:\Windows\system32\notepad.exe

Now start your mongod in a different command window. Come back to the previous command prompt (where we set the EDITOR variable) and connect to the mongod using the mongo shell.

D:\mongodb-win32-x86_64-2008plus-2.4.9\bin>mongo.exe
MongoDB shell version: 2.4.9
connecting to: test
>

Try writing some new function:

> function myfunc() {}
>

Check the content of the function by typing the function name in the shell:

> myfunc
function myfunc() {}
>

Now we will edit this function in the external editor (notepad in this case). Type the edit command as follows:

> edit myfunc

You will see the function myfunc() open in a Notepad window.

Now edit the function as per your requirement, for example by adding a print statement to its body.

After you finish, save the changes (Save, not Save As) and close the editor window. The prompt will come back in the mongo shell.

Now to check the content, type the function name on the shell:

> myfunc
function myfunc() {
print("the function has been edited using an external editor");
}
>

Run the function on the shell as follows:

> myfunc()
the function has been edited using an external editor
>

Cool, right? Enjoy using the external editor for your needs.


Sunday 6 April 2014

Run a javascript through mongo shell

Hi, we will discuss an interesting topic today. We will see how to pass a JavaScript file to the mongo shell and run it. Let's create a simple js file as follows:

// script1.js
var str = "shell";
print("This is script1.js");
print("The value of the variable str is "+str);

Save the file as script1.js at any location on your system. Now we will run this js file in the mongo shell. Navigate to the bin directory of your mongo installation folder (this is required only if the directory is not in your PATH variable). Type the following command:

mongo.exe --nodb d:\script1.js

Here we simply ran the script without connecting to any mongod process (because we don't need one in this case). Simple, right?

In case you want to run multiple scripts in sequence, simply pass the script locations one after another on the same command line.

The shell will run those scripts in order and exit. You can use the --quiet option to avoid printing the MongoDB shell version.

There is one more way to run the script using the load function. Open the mongo shell and use the following command:

> load("D:\\script1.js")
This is script1.js
The value of the variable str is shell
true
>

Scripts have access to the db variable (as well as any other globals). However, shell helpers such as "use db" or "show collections" do not work from files. There are valid JavaScript equivalents to each of these, as shown below:

use foo ---> db.getSisterDB("foo")
show dbs ---> db.getMongo().getDBs()
show collections ---> db.getCollectionNames()

You can also use scripts to inject variables into the shell. For example, we could have a script that simply initializes helper functions that you commonly use. The script below, for instance, may be helpful for the replication and sharding configuration. It defines a function, connectTo(), that connects to the locally-running database on the given port and sets db to that connection:

// defineConnectTo.js
/**
 * Connect to a database and set db.
 */
var connectTo = function(port, dbname) {
    if (!port) {
        port = 27017;
    }
    if (!dbname) {
        dbname = "test";
    }
    db = connect("localhost:"+port+"/"+dbname);
    return db;
};

If we load this script in the shell, connectTo is now defined:

> typeof connectTo
undefined
> load('defineConnectTo.js')
> typeof connectTo
function

In addition to adding helper functions, you can use scripts to automate common tasks and administrative activities. By default, the shell will look in the directory that you started the shell in (use run("pwd") to see what directory that is). If the script is not in your current directory, you can give the shell a relative or absolute path to it.

For example, if you wanted to put your shell scripts in ~/my-scripts, you could load defineConnectTo.js with load("/home/myUser/my-scripts/defineConnectTo.js"). Note that load cannot resolve ~.

You can use run() to run command-line programs from the shell. Pass arguments to the function as parameters:

> run("ls", "-l", "/home/myUser/my-scripts/")
sh70352| -rw-r--r-- 1 myUser myUser 2012-12-13 13:15 defineConnectTo.js
sh70532| -rw-r--r-- 1 myUser myUser 2013-02-22 15:10 script1.js
sh70532| -rw-r--r-- 1 myUser myUser 2013-02-22 15:12 script2.js
sh70532| -rw-r--r-- 1 myUser myUser 2013-02-22 15:13 script3.js

This is of limited use, generally, as the output is formatted oddly and it doesn’t support pipes.


Tuesday 25 March 2014

Magic with Mongo Shell

Today we will see some tips and interesting stuff with the mongo shell. The shell is not only the means to access the mongo database (mongod) but also a JavaScript interpreter.

You can perform various JavaScript operations. For example, you can perform arithmetic:

> var i = 10;
> i
10
> var j = 1;
> j
1
> var k = i + j;
> k
11
>

The variables i, j and k remain in scope until you close this shell.

Connecting to a specific db

Generally, if you open the shell without any argument, it connects by default to the mongod process running on the same system (localhost) at port 27017. Now if you want to connect the mongo shell to a mongod process running on a different host and to a db called foo, you will use the following:

> mongo myhost:30000/foo
MongoDB shell version: 2.4.0
connecting to: myhost:30000/foo
>

The above example connects to the db foo on the host myhost at port 30000.

Opening mongo shell without connecting to a specific db

There is one more bit of magic. You can open the mongo shell and work with it without even connecting to any database (i.e. mongod). To do so, type the following:

mongo --nodb
MongoDB shell version: 2.4.9
>

After opening the shell you can connect to the specific mongod process at your convenience:

> conn = new Mongo("localhost:27017")
connection to localhost:27017
>

Please note that at this point you have just connected to a mongod process but not to a db yet. If you type db you will get the following error:

> db
Tue Mar 25 16:00:41.997 ReferenceError: db is not defined
>

To connect to a db, do the following:

> db = conn.getDB("test")
test
> db
test
>

Now your db variable is defined and stays valid as long as the shell is open. To connect to a different db you can use the same conn variable or else the "use <db>" command (as discussed in previous posts).

Tips for Using the Shell

Because mongo is simply a JavaScript shell, you can get a great deal of help for it by simply looking up JavaScript documentation online. For MongoDB-specific functionality, the shell includes built-in help that can be accessed by typing help:

> help
        db.help()                    help on db methods
        db.mycoll.help()             help on collection methods
        sh.help()                    sharding helpers
        rs.help()                    replica set helpers
        help admin                   administrative help
        help connect                 connecting to a db help
        help keys                    key shortcuts
        help misc                    misc things to know
        help mr                      mapreduce

        show dbs                     show database names
        show collections             show collections in current database
        show users                   show users in current database
        show profile                 show most recent system.profile entries with time >= 1ms
        show logs                    show the accessible logger names
        show log [name]              prints out the last segment of log in memory, 'global' is default
        use <db_name>                set current database
        db.foo.find()                list objects in collection foo
        db.foo.find( { a : 1 } )     list objects in foo where a == 1
        it                           result of the last line evaluated; use to further iterate
        DBQuery.shellBatchSize = x   set default number of items to display on shell
        exit                         quit the mongo shell
>

Database-level help is provided by db.help() and collection-level help by db.foo.help(). A good way of figuring out what a function is doing is to type it without the parentheses. This will print the JavaScript source code for the function. For example, if we are curious about how the update function works or cannot remember the order of parameters, we can do the following:

> db.foo.update
function ( query , obj , upsert , multi ){
    assert( query , "need a query" );
    assert( obj , "need an object" );

    var firstKey = null;
    for (var k in obj) { firstKey = k; break; }

    if (firstKey != null && firstKey[0] == '$') {
        // for mods we only validate partially, for example keys may have dots
        this._validateObject( obj );
    } else {
        // we're basically inserting a brand new object, do full validation
        this._validateForStorage( obj );
    }

    // can pass options via object for improved readability
    if ( typeof(upsert) === 'object' ) {
        assert( multi === undefined, "Fourth argument must be empty when specifying upsert and multi with an object." );

        opts = upsert;
        multi = opts.multi;
        upsert = opts.upsert;
    }

    var startTime = (typeof(_verboseShell) === 'undefined' ||
                     !_verboseShell) ? 0 : new Date().getTime();
    this._mongo.update( this._fullName , query , obj , upsert ? true : false , multi ? true : false );
    this._printExtraInfo("Updated", startTime);
}
>

Magic!!!

We can load and run custom javascripts as well in the shell. In the next post we will see how.


Sunday 9 March 2014

Primary Key in MongoDB

Today we will talk about the primary key in MongoDB. As in any other database management system, each document in MongoDB is associated with a primary key. But unlike an RDBMS, MongoDB doesn't allow a document without a primary key (in an RDBMS you can create a table without a primary key, whether that is advisable or not).

Every document in MongoDB is associated with a key called "_id", which is the primary key for that document. The "_id" field is unique across the collection. The "_id" field can be of almost any data type (arrays are not allowed) but the default data type is ObjectId. If you insert a document into a collection without a key called "_id", MongoDB will create the field automatically and it will be of type ObjectId. Open the mongo shell and insert the following document (with the _id field):

> db.foo.insert({_id:1, i:3, j:4, k:5})

Then find the document from the collection.

> db.foo.find()
{ "_id" : 1, "i" : 3, "j" : 4, "k" : 5 }

Now insert another document without the _id field.

> db.foo.insert({i:4, j:5, k:6})

Again find documents from the collection:

> db.foo.find()
{ "_id" : 1, "i" : 3, "j" : 4, "k" : 5 }
{ "_id" : ObjectId("5319ccee7759cfb3b91c3bcf"), "i" : 4, "j" : 5, "k" : 6 }

You see, even if we don't mention "_id" it gets created automatically and the default type is ObjectId. MongoDB also rejects duplicate values of the primary key. Try inserting a document with an _id value that already exists and you will get a duplicate key error.

> db.foo.insert({_id:1, i:4, j:5, k:6})
E11000 duplicate key error index: test.foo.$_id_ dup key: { : 1.0 }

ObjectId

ObjectId is a special data type that is lightweight, easy to generate and used to create the default primary key in a MongoDB collection. The way the key is generated ensures unique values in all circumstances, even across multiple machines and threads. Let's analyze this in detail.

An ObjectId uses 12 bytes of storage. This gives it a string representation of 24 hexadecimal digits, 2 digits for every byte. The bytes are distributed as follows:

Timestamp : The first 4 bytes represent a timestamp in seconds since the epoch. So you can see how uniqueness is achieved at one-second granularity. Having the timestamp at the start of the ObjectId gives a couple more advantages:

as the timestamp comes first, ObjectIds sort in roughly insertion order
many drivers extract the creation time from these 4 bytes

Machine : The next 3 bytes identify the machine on which the ObjectId is generated. This ensures multiple machines don't produce duplicate ObjectIds. These 3 bytes are a hash of the machine hostname.

PID : To add uniqueness at the process level, the next 2 bytes represent the process id (PID). This ensures uniqueness across multiple processes running on a single machine at the same time.

Increment : The first 9 bytes handle different machines, different processes and different seconds. Now think of a situation where multiple concurrent requests generate ObjectIds in a single process on a single system within the same second. Uniqueness at this level is achieved by the last 3 bytes, which act as an incrementing counter. This allows up to 256^3 (16,777,216) unique ObjectIds to be generated per process in one second.

Now we will see some examples. One important thing is that the ObjectId can be generated at the server level, but this is generally done on the client side (by the driver or by the shell). Open your mongo shell and type the following in succession to see the generation pattern of the ObjectId:

> new ObjectId()
ObjectId("5315db03ff2d9ef19928e379")
> new ObjectId()
ObjectId("5315db03ff2d9ef19928e37a")
> new ObjectId()
ObjectId("5315db38ff2d9ef19928e37b")

By now you should understand that the generation is happening on the client side only, as we haven't inserted anything in the DB. If you analyze the pattern of 24 hexadecimal digits, you will see that the only changing digits are the timestamp digits and the increment digits. If we divide the string we can extract the following:

Id       Timestamp      Machine        PID        Increment
1st      5315db03       ff2d9e         f199       28e379
2nd      5315db03       ff2d9e         f199       28e37a
3rd      5315db38       ff2d9e         f199       28e37b

For all three ObjectIds the machine and PID do not change. For the first two, the timestamp is also the same, meaning that these two were generated in quick succession within one second. But the increment field changes every time, and that is how uniqueness is achieved.
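The split shown in the table can be reproduced with a few lines of plain JavaScript. This is a minimal sketch that assumes the byte layout described in this post; parseObjectId is a hypothetical helper written for illustration, not a function provided by the shell.

```javascript
// Split a 24-digit ObjectId hex string into the four parts described above.
// parseObjectId is a hypothetical helper, not a shell built-in.
function parseObjectId(hex) {
  if (!/^[0-9a-fA-F]{24}$/.test(hex)) {
    throw new Error("expected 24 hexadecimal digits");
  }
  return {
    timestamp: parseInt(hex.slice(0, 8), 16),   // 4 bytes: seconds since the epoch
    machine: hex.slice(8, 14),                  // 3 bytes
    pid: hex.slice(14, 18),                     // 2 bytes
    increment: parseInt(hex.slice(18, 24), 16)  // 3 bytes: counter
  };
}

var first = parseObjectId("5315db03ff2d9ef19928e379");
var second = parseObjectId("5315db03ff2d9ef19928e37a");
var third = parseObjectId("5315db38ff2d9ef19928e37b");

// Creation time recovered from the first 4 bytes.
console.log(new Date(first.timestamp * 1000).toISOString());
console.log(second.increment - first.increment); // 1
console.log(third.timestamp - second.timestamp); // 53 (seconds)
console.log(Math.pow(256, 3));                   // 16777216 possible counter values
```

The counter goes up by one between the first two ids, while the third id was generated 53 seconds later, which matches the table above.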

As stated previously, if there is no "_id" key present when a document is inserted, one will be automatically added to the inserted document. This can be handled by the MongoDB server but will generally be done by the driver on the client side. The decision to generate them on the client side reflects an overall philosophy of MongoDB: work should be pushed out of the server and to the drivers whenever possible. This philosophy reflects the fact that, even with scalable databases like MongoDB, it is easier to scale out at the application layer than at the database layer. Moving work to the client side reduces the burden requiring the database to scale.

We also saw something unique about the mongo shell today: it works as a standard JavaScript shell, not only as a tool for DB operations. We will see some more interesting stuff in the next discussion.


Sunday 23 February 2014

Supported Data Types in Mongo DB

Hi friends, we are back. Excuse me for the delay, as I was tied up with a project deliverable (an integral part of a software engineer's life). Today we shall discuss the different data types supported in MongoDB.

Limitations in JSON data types

As we mentioned earlier, a document in MongoDB is a JavaScript object in JSON-like form. Though JSON's structure is simple and easy to understand and parse, it has limitations: it supports only six data types, namely null, boolean, numeric, string, array and object.

You might think these should be sufficient at a high level to express different structures of data. But there are still a few additional types which are pretty important. For example, JSON doesn't have any data type to work with dates. That is really difficult to accept, especially when it is used for a database's core data types.

There is a numeric data type, but it doesn't differentiate between floats and integers, nor between 32-bit and 64-bit numbers.

It doesn't have a data type for regular expressions either.
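These limitations are easy to see by passing an object through plain JSON and back. The snippet below is a small sketch in standard JavaScript (nothing MongoDB specific); the field names are made up for the example.

```javascript
// Round-trip a document through JSON to see which types survive.
var original = { created: new Date(0), pattern: /foobar/i, count: 3, ratio: 3.0 };
var roundTripped = JSON.parse(JSON.stringify(original));

console.log(typeof roundTripped.created);               // "string": the date became plain text
console.log(Object.keys(roundTripped.pattern).length);  // 0: the regular expression became an empty object
console.log(roundTripped.count === roundTripped.ratio); // true: 3 and 3.0 are the same number
```

The date comes back as a string, the regular expression is lost, and the integer and the float cannot be told apart, which is why the extra types are needed.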

MongoDB's Additional Support

MongoDB adds support for additional types on top of the existing JSON data types in order to make it more flexible and cover a wider range of needs. While adding this support, the original key-value format of JSON has been retained. The commonly supported types and how they are represented in a document are described below:

null

Null can be used to represent both a null value and a nonexistent field:
{"x" : null}

boolean

There is a boolean type, which can be used for the values true and false:
{"x" : true}

number

The shell defaults to using 64-bit floating point numbers. Thus, these numbers look “normal” in the shell:

{"x" : 3.14}

or:

{"x" : 3}

For integers, use the NumberInt or NumberLong classes, which represent 4-byte or 8-byte signed integers, respectively.
{"x" : NumberInt("3")}
{"x" : NumberLong("3")}

string

Any string of UTF-8 characters can be represented using the string type:
{"x" : "foobar"}

date

Dates are stored as milliseconds since the epoch. The time zone is not stored:
{"x" : new Date()}

JavaScript's Date object is used in MongoDB. While creating a new Date object, always call new Date() and not just Date(). Calling Date() as a plain function returns a string representation of the date and not an actual date object. This is how JavaScript works. So please be careful to always call new Date() to avoid any mismatch between strings and actual date objects.

Did you find it confusing? No worries, we will clear this up with an example:

Run your mongod instance and connect to it using the mongo shell. Use any database of your choice (say test). Type the following command:

> use test

Now insert a document using the new Date() constructor into the datefld collection (for example) of the test db:

> db.datefld.insert({_id:1, createdOn: new Date()})

Now insert a document using Date() function in the same collection:

> db.datefld.insert({_id:2, createdOn: Date()})

Find both the documents and see the difference in the createdOn field:

> db.datefld.find().pretty()
{
    "_id" : 1,
    "createdOn" : ISODate("2014-02-23T14:42:56.883Z")
}
{
    "_id" : 2,
    "createdOn" : "Sun Feb 23 2014 20:13:25 GMT+0530 (India Standard Time)"
}

The new Date() constructor (document with _id:1) returns a date object while the Date() function call (document with _id:2) returns a string representation of the date.

In the shell the date is displayed as an ISODate in UTC (note the trailing Z above), though it is stored in the database as a millisecond value since the epoch. So stored dates don't carry any time zone information. As a workaround you can store the time zone information in a separate field.
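The same difference can be checked without touching the database at all, because it comes from JavaScript itself. A minimal sketch (the variable names are only for illustration):

```javascript
var asObject = new Date(); // a Date object
var asString = Date();     // a string describing the current time

console.log(asObject instanceof Date); // true
console.log(typeof asString);          // "string"

// Only the object carries a numeric time value that can be compared or sorted.
console.log(typeof asObject.getTime()); // "number"
console.log(typeof asString.getTime);   // "undefined": strings have no getTime
```

This is why a field saved with Date() cannot be queried or sorted as a date later on.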

regular expression

Queries can use regular expressions using JavaScript’s regular expression syntax:
{"x" : /foobar/i}

array

Sets or lists of values can be represented as arrays:
{"x" : ["a", "b", "c"]}

Arrays in MongoDB can represent both ordered (stack, list or queue) and unordered (set) data. Following is an example of an array in MongoDB:

{"address" : [930, "Casanova Avenue", "CA", 93940, 10.5]}

As you can see, an element of an array can be of any type supported in MongoDB (the above example contains integer, float and string values). An array can also contain a nested array as an element.

One of the advantages of having an array in a document is that MongoDB understands its structure very well and can reach a specific element in order to query, update or delete it. For example, the average temperature (10.5) in the above document can be changed easily. MongoDB can also create indexes on array elements.

Embedded document

Documents can contain entire documents embedded as values in a parent document:
{"x" : {"foo" : "bar"}}

Documents can be used as the value for a key. This is called an embedded document. Embedded documents can be used to organize data in a more natural way than just a flat structure of key/value pairs. For example, if we have a document representing a person and want to store his address, we can nest this information in an embedded "address" document:

{
    "name" : "Pradosh Chandra Mitra",
    "address" : {
        "street" : "21 Rajani Sen Rd",
        "city" : "Kolkata",
        "state" : "WB"
    }
}

The value for the "address" key in the previous example is an embedded document with its own key/value pairs for "street", "city", and "state". As with arrays, MongoDB “understands” the structure of embedded documents and is able to reach inside them to build indexes, perform queries, or make updates.

object id

An object id is a 12-byte ID for documents. Our next discussion will describe this in detail:
{"x" : ObjectId()}

There are also a few less common types that you may need, including:

binary data

Binary data is a string of arbitrary bytes. It cannot be manipulated from the shell. Binary data is the only way to save non-UTF-8 strings to the database.

code

Queries and documents can also contain arbitrary JavaScript code:
{"x" : function() { /* ... */ }}


Saturday 1 February 2014

Term Comparison: MongoDB vs. RDBMS

Today's discussion will be fun, my friends, as we will learn a little MongoDB nomenclature so that our learning picks up some pace.

I would request you to quickly recap our first discussion on configuring MongoDB. From here on it is assumed that MongoDB is up and running in your local environment.

In your mongo shell (assuming you have just opened it) type the following command:

> db

This will show the database you are currently connected to.

The mongo shell is a JavaScript shell that accepts any JavaScript command. db is a variable that refers to the database you are currently connected to.

RDBMS Database = MongoDB Database

A database in MongoDB is a physical unit of files that contains the data logically grouped for an application. Once the user starts inserting data into a DB, MongoDB creates physical files named after the database in the data folder (remember we specified a data folder at start-up). Please note that we say "start inserting": as long as there is no record in a database, no physical file is created.

To see which databases are present in the running MongoDB instance, type the following command:

> show dbs

This will list all the databases with their corresponding sizes. To connect to some other database, type:

> use [db name]

Now let's say you want to connect to a database which doesn't exist. MongoDB will not stop you from doing so. It will set the db variable and connect you logically. For example, if you don't have a database called blog (the show dbs command doesn't list the name blog) and you type use blog, the shell will switch to it just as it does for an existing database. At this time it is just a logical name rather than a physical entity. To check, go to your data/db folder and you will find NO file with the name blog. Now we will insert some data and see what happens.

RDBMS Table = MongoDB Collection

The highest logical unit inside a database is called a collection. A collection is the same as a table in an RDBMS. You don't need a command like CREATE COLLECTION in MongoDB (as opposed to the CREATE TABLE command in an RDBMS). The first time you insert a record into a collection, it gets created. Let's say we want to insert a record into a collection called user in the blog database. Before that, let's find out whether the user collection exists. To check, type:

> show collections

The shell returns nothing, as there are no collections in the blog database. Now insert a record into the user collection using the following command:

> db.user.insert({"firstName" : "Roger", "lastName" : "Federar", "plays" : "Tennis" })

Now run show collections again and you will see the user collection in the result.

Also type show dbs to list the databases and see the result now: the blog database is listed. Now go to the data/db folder. You should find the following three files created for the blog database:

blog.0
blog.1
blog.ns

Note: You see, no CREATE DATABASE command is needed either ... cool, right?

Row/Column = Document/Field

The concept of a row in an RDBMS is replaced here by a document (JSON formatted). As you saw in the previous example, a document is a JavaScript object. For example:

{"firstName" : "Roger", "lastName" : "Federar", "plays" : "Tennis"}

This is an object with three fields (key-value pairs), namely firstName, lastName and plays. The key in a document is always a string, while the value can be of several types. MongoDB supports null, boolean, number, string, date, array, embedded document, ObjectId, binary data and code as the value of a key. We will go over each type in the next discussion.

Index = Index

The concept of indexing is similar to that in an RDBMS. MongoDB supports various kinds of indexes (as mentioned in the previous discussion), built on B-tree structures. There will be a detailed discussion of indexing in MongoDB later.

Foreign Key = Reference

MongoDB doesn't enforce referential integrity, though a field in one collection can refer to a document in another collection by convention, for example as a DBRef (we will discuss this as well in a future post; for now we just want to keep things simple).


Saturday 25 January 2014

Features That Make the Difference

Hello friends... In our last discussion we covered how a distributed architecture gives better performance for BIG data solutions and NoSQL technologies. In this post we will cover various aspects of MongoDB which make the difference.

Ease of Use:

At the very core of its architecture MongoDB is a document-oriented database and not a relational one. The main reason for moving away from a relational structure is to make scaling out easier, though it brings other advantages as well.

MongoDB replaces the concept of a "row" with the concept of a "document". A document is nothing but a JavaScript-object-formatted string (JSON) with a key-value structure that allows embedding of child documents and arrays. This makes it possible to represent a hierarchical relationship in a single record.

Think of a scenario where you want to find the full hierarchy of a structure, starting from a child node and traversing to its root. You use a CONNECT BY PRIOR query, right? That itself is a very costly one. With this document structure the same lookup can be much faster in MongoDB.

Schema Less:

With a relational database you define a schema first, before you start any development. Keeping today's development agility in mind (where change is the only constant), we cannot avoid changes to our data model as we move through the development life cycle. And this obviously impacts the application layer.

MongoDB is schemaless. You don't have to restrict yourself to a strict definition. A document's keys and values are not of fixed size and type. Adding and removing keys also becomes easier, which makes development faster.

Easy Scaling:

We have already discussed how scaling out provides better performance. MongoDB has been designed to scale out. The document-oriented data model makes it easier to split data across multiple servers. The database automatically takes care of balancing data and load across a cluster, redistributing documents and routing user requests to the correct machines. So developers can focus on programming the application, not scaling it. When a cluster needs more capacity, new machines can be added and MongoDB will figure out how the existing data should be spread to them.

Indexing:

MongoDB is designed to support generic secondary indexes, allowing a variety of fast queries. It provides unique, compound and full-text indexes. One of the distinctive features of MongoDB is its support for geospatial indexes.

Aggregation:

MongoDB supports a pipeline concept to build complex aggregations from simple pieces and allows the database to optimize them. This helps the user implement complex logic such as filter, sort, skip and limit in one query.

Special Collection:

In MongoDB a table (in RDBMS) is known as a Collection. MongoDB supports time-to-live collections for data that should expire at a certain time, such as sessions. Fixed-size collections, which are useful for holding recent data, such as logs are also supported.

Besides these features, common features such as Replication, Backup & Recovery, Monitoring statistics, Security and User Administration are also supported.

As we pointed out earlier, two main features of a relational DB, namely joins and multi-document transactions, are not supported in MongoDB. There are provisions and recommendations on how to address these "limitations". The schema design of MongoDB plays a major role here, and it will answer your questions.

Before going to the schema design we will quickly touch upon some of the MongoDB nomenclature and practicals.


Sunday 19 January 2014

A Deep Dive to Achieve Better Performance and Scalability

Welcome back friends.

First let me thank you all for the overwhelming (and unexpected) response to the last post. We have received responses through multiple channels.

The main goal of this discussion is to elaborate on why NoSQL technologies offer much better performance and easier scalability than the traditional RDBMS.

Distributed File System Architecture:

The KEY lies in the distributed file system architecture and parallel processing used by all the BIG data solutions. Let's discuss this with an example. I will ask you to do some math here.

Let's say there is 1 TB of data that we need to read from a disk. The disk has 4 I/O channels, each with an I/O speed of 100 MB/sec. Your assignment is to calculate how long it would take to read the whole 1 TB of data using those 4 I/O channels. (no scrolling please before you calculate .. :))

Time to read through one channel is
t = 1000000000000 / (100 x 1000000) = 10000 sec

Time to read through 4 channels is
t = 10000 / 4 = 2500 sec = 41.67 min

Pretty simple right!!!

Now let's distribute this data in 10 chunks across 10 different disks, each with the same configuration as before. What would be the total time to read the 1 TB of data this time?

t = 2500 / 10 = 250 sec = 4.17 min

A straightaway 10 times advantage in performance, cool !!! (though in real life the math won't be this straightforward; some additional time would be required due to network latency, but there would still be an advantage).
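
If you would rather let the machine do the math, the calculation above can be written as a few lines of JavaScript. It uses the same idealised assumptions as the example (decimal units, data split evenly, all channels reading in parallel, no network latency), and the function and variable names are just picked for this sketch.

```javascript
// Figures taken from the example above (decimal units: 1 TB = 10^12 bytes).
const dataBytes = 1e12; // 1 TB
const channelBytesPerSec = 100e6; // 100 MB/sec per I/O channel
const channelsPerDisk = 4;

// Idealised read time: data is split evenly and all channels read in parallel.
function readTimeSeconds(bytes, disks) {
  const totalThroughput = channelBytesPerSec * channelsPerDisk * disks;
  return bytes / totalThroughput;
}

const oneDisk = readTimeSeconds(dataBytes, 1);
const tenDisks = readTimeSeconds(dataBytes, 10);

console.log(oneDisk, "sec =", (oneDisk / 60).toFixed(2), "min");
console.log(tenDisks, "sec =", (tenDisks / 60).toFixed(2), "min");
```

Changing the disks argument shows how the idealised read time shrinks in proportion to the number of disks.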

In Real Time Scenario:

You might wonder why a traditional RDBMS cannot simply adopt a distributed architecture, and why we need to go for a totally new solution. That is the whole essence.

To build a system that is highly scalable and cost-effective at the same time, we have two approaches:
- Vertical Scalability (scaling up)
- Horizontal Scalability (scaling out)

Vertical scalability means upgrading the resources of the same system (more RAM, a faster processor, more disk space etc.). This is not a cost-effective solution, as high-end servers are costly, and at some point further upgrades become out of reach in practice.

Horizontal scalability means adding more machines to a cluster for parallel processing. With MongoDB, scaling out has become very easy (easily configurable), and one can scale out an existing database with low-cost commodity hardware. To start with, a single-node cluster can be configured. As the data volume grows, more servers can be added to the cluster and the data partitioned across them (this technique is called sharding in MongoDB terminology) without affecting application development and with zero downtime.

The other aspect of MongoDB is the variety of data it supports. By variety of data we mean that structured, semi-structured as well as unstructured data can all be accommodated in the design.

In our next discussion we will talk about some of the important features of MongoDB and what value those features bring compared to its counterparts.

See you there ...


Thursday 16 January 2014

Why MongoDB

Hi Friends, we are back again with the MongoDB mania.

In our last discussion we covered the installation of MongoDB on your system. Before going to the next level of detail we will touch upon some background. In this post we will see some interesting topics covering the circumstances in which MongoDB evolved and why we should use it at all. This is one of the most basic questions one should ask before learning or using any new technology.

The BIG DATA Scenario:

Let's take a step back to 2006-2007. Industry leaders started facing the impact of the rate of data growth, which was increasing rapidly every year. A terabyte of data, once rarely heard of, has become a common and frequent scenario nowadays (we are getting 1 TB flash drives as well). An airline jet collects 10 terabytes of sensor data for every 30 minutes of its flying time. NYSE generates about one terabyte of new trade data per day, which is used for stock trading analytics to determine trends. Facebook users spend 10.5 billion minutes (almost 20,000 years) online on the social network. An average of 3.2 billion likes and comments are posted every day. Twitter has over 500 million registered users.
As far as the global data volume is concerned, it has already crossed 2.7 zettabytes (1 ZB = 1 billion terabytes; please count the trailing zeros yourself).

With all these statistics in mind, it can proudly be said that storing this enormous amount of data is NOT a challenge (as the cost of storing data keeps getting cheaper with the latest advances in semiconductor technology). The problem comes when we start processing this BIG data. Even though storage capacity is getting higher, I/O speed is not increasing at a comparable rate, and that is what is becoming a bottleneck.

Challenge with the traditional RDBMS:

In the last 10 years, the Internet has challenged relational databases in ways nobody could have foreseen. First you have a single server with a small data set. Then you find yourself setting up replication so you can scale out reads and deal with potential failures. And, before too long, you’ve added a caching layer, tuned all the queries, and thrown even more hardware at the problem. Eventually you arrive at the point when you need to shard the data across multiple clusters and rebuild a ton of application logic to deal with it. And soon after that you realize that you’re locked into the schema you modeled so many months before.

Why? Because there’s so much data in your clusters now that altering the schema will take a long time and involve a lot of precious DBA time. It’s easier just to work around it in code. This can keep a small team of developers busy for many months. In the end, you’ll always find yourself wondering if there’s a better way—or why more of these features are not built into the core database server.

Welcome to the NoSQL world:

Keeping all these challenges in mind, it was time to come up with an alternative. MongoDB, created by 10gen, is a powerful, flexible, and scalable general-purpose database. It combines the ability to scale out with features such as secondary indexes, range queries, sorting, aggregations, and geospatial indexes.

The easy-to-use features of MongoDB enable agile developers to build their applications fast, with a cost-effective scale-out capability that provides high performance.

There are, though, some compromises in features compared to relational databases. MongoDB doesn't support joins or multi-document transactions (if you don't know what a document is, don't worry, we will cover it in the next posts). This was a well-thought-out design decision that puts performance and scalability ahead of those two features. With that said, there is good guidance on how to design your application around this.

Nice stuff to know!! We will now pick up the pace...
