Tuesday, 25 March 2014

Magic with Mongo Shell

Today will see some tips and interesting stuffs with mongo shell. The shell is not only the means to access the mongo database (mongod) but also is a javascript editor.

You can perform various javascript operation. For e.g. you can perform arithmetic operation:

> var i = 10;
> i
10
> var j = 1;
> j
1
> var k = i + j;
> k
11
>

The scope of the variable i,j and k will be valid until you close this shell.


Connecting to a specific db

Generally if you open the shell without any argument, by default it connects to the mongod process running in the same system (localhost) at port 27017. Now if you want to connect to the mongo shell to mongod process running in a different host and to a db called foo, you will use the following:

> mongo myhost:30000/foo
MongoDB shell version: 2.4.0
connecting to: myhost:30000/foo
>

The above example connects to a db foo running in a host myhost at port 30000.


Opening mongo shell without connecting to a specific db

There is one more magic. You can open the mongo shell and work with it without even connecting to any database (i.e. mongod). For the same, type the following:

mongo --nodb
MongoDB shell version: 2.4.9
>

After opening the shell you can connect to the specific mongod process at your convenience:

> conn = new Mongo("localhost:27017")
connection to localhost:27017
>

Please note at this point you have just connected to a mongod process but not a db yet. If you type db you will get the following error:

> db
Tue Mar 25 16:00:41.997 ReferenceError: db is not defined
>

to connect to a db, do the following:

> db = conn.getDB("test")
test
> db
test
>

Now your db variable is defined and is valid till the time the shell is open. To connect to different db you can use the same conn variable or else the "use <db>" command (as discussed in previous posts).



Tips for Using the Shell 


Because mongo is simply a JavaScript shell, you can get a great deal of help for it by simply looking up JavaScript documentation online. For MongoDB-specific functionality, the shell includes built-in help that can be accessed by typing help:

> help
        db.help()                    help on db methods
        db.mycoll.help()             help on collection methods
        sh.help()                    sharding helpers
        rs.help()                    replica set helpers
        help admin                   administrative help
        help connect                 connecting to a db help
        help keys                    key shortcuts
        help misc                    misc things to know
        help mr                      mapreduce

        show dbs                     show database names
        show collections             show collections in current database
        show users                   show users in current database
        show profile                 show most recent system.profile entries with time >= 1ms
        show logs                    show the accessible logger names
        show log [name]              prints out the last segment of log in memory, 'global' is default
        use <db_name>                set current database
        db.foo.find()                list objects in collection foo
        db.foo.find( { a : 1 } )     list objects in foo where a == 1
        it                           result of the last line evaluated; use to further iterate
        DBQuery.shellBatchSize = x   set default number of items to display on shell
        exit                         quit the mongo shell
>


Database-level help is provided by db.help() and collection-level help by
db.foo.help(). A good way of figuring out what a function is doing is to type it without the parentheses. This will print the JavaScript source code for the function. For example, if we are curious about how the update function works or cannot remember the order of parameters, we can do the following:

> db.foo.update
function ( query , obj , upsert , multi ){
    assert( query , "need a query" );
    assert( obj , "need an object" );

    var firstKey = null;
    for (var k in obj) { firstKey = k; break; }

    if (firstKey != null && firstKey[0] == '$') {
        // for mods we only validate partially, for example keys may have dots
        this._validateObject( obj );
    } else {
        // we're basically inserting a brand new object, do full validation
        this._validateForStorage( obj );
    }

    // can pass options via object for improved readability
    if ( typeof(upsert) === 'object' ) {
        assert( multi === undefined, "Fourth argument must be empty when specifying upsert and multi with an object." );


        opts = upsert;
        multi = opts.multi;
        upsert = opts.upsert;
    }

    var startTime = (typeof(_verboseShell) === 'undefined' ||
                     !_verboseShell) ? 0 : new Date().getTime();
    this._mongo.update( this._fullName , query , obj , upsert ? true : false , multi ? true : false );
    this._printExtraInfo("Updated", startTime);
}
>


Majic!!!!!!!!!!!!

We can load and run custom javascripts as well in the shell. In the next post we will see how.


<< Prev                                                                                     Next >>

Sunday, 9 March 2014

Primary Key in MongoDB

Today we will talk about primary key in MongoDB. Like any other database management system each document in MongoDB needs to be associated with a primary key. But unlike RDBMS MongoDB doesn't support any document without primary key (for e.g. you can create a table in RDBMS without having a primary key though; whether it is advisable or not)

Every document in MongoDB is associated with a key called "_id", which is the primary key for that document. The "_id" field is unique across the collection. The "_id" field can be of any data type but the default data type is ObjectId. If you want to insert a document in a collection without a key called "_id", MongoDB will create the field automatically and it will be of type ObjectId type. Open the mongo shell and insert the following document (with the _id field):

> db.foo.insert({_id:1, i:3, j:4, k:5})


Then find the document from the collection.

> db.foo.find()

{ "_id" : 1, "i" : 3, "j" : 4, "k" : 5 }

Now insert another document without the _id field.

> db.foo.insert({i:4, j:5, k:6})

Again find documents from the collection:

> db.foo.find()

{ "_id" : 1, "i" : 3, "j" : 4, "k" : 5 }
{ "_id" : ObjectId("5319ccee7759cfb3b91c3bcf"), "i" : 4, "j" : 5, "k" : 6 }

You see even if we don't mention "_id" it gets created automatically and the default type is ObjectId. Nevertheless MongoDB restricts duplicate entry of the primary. Try inserting a document with an _id field that already exists, you will get a duplicate key error.

> db.foo.insert({_id:1, i:4, j:5, k:6})
E11000 duplicate key error index: test.foo.$_id_  dup key: { : 1.0 }





 

ObjectId

ObjectId is a special data type that is lightweight, easy to generate and used to create default primary key in a MongoDB collection. The generation of the key ensures unique values to all circumstances even across multiple machines/threads. Let's analyze this in detail.

The ObjectId field uses a 12 byte storage. This storage gives them a string representation of 24 hexadecimal digits. 2 digit for every bytes. Following is the distribution of the bytes:




Timestamp : First 4 bytes represents timestamp in seconds since epoch. So you can understand how the uniqueness is achieved at the second level granularity. Timestamp at the starting of the ObjectId gives couple of more advantages:
  • as timestamp comes first MongoDB sorts the documents in a collections based on the insertion order
  • many drives extracts the create time information from these 4 bytes

Machine : The next 3 bytes represents the machine hostname where the MongoDB is running. This ensures multiple machines don't have duplicate ObjectId's. These 2 bytes are the hash of the machine hostname.

PID : To increase the uniqueness at the process level, MongoDB uses the next 2 bytes to represents the process ids (PID). This will ensure uniqueness across multiple processes (mongod) running in a single machine at the same time.

Increment : The first 9 bytes addresses the situation of different machines, processes and different seconds level. Now think of a situation where you have multiple concurrent requests coming in to generate ObjectId's in a single mongod process running in a single system at the same time. The uniqueness at this level will be achieved by the last 3 bytes. This is an increment factor. This allows upto 256exp3 (16,777,216) unique ObjectId's to be generated per process in one second.


Now we will see some example. One important thing is that the generation of the ObjectId can be done at the server level but that is generally be done at the client side (by the driver or by the shell). Open your Mongo shell and type the following in succession to see the generation pattern of the ObjectId:

> new ObjectId()
ObjectId("5315db03ff2d9ef19928e379")
> new ObjectId()
ObjectId("5315db03ff2d9ef19928e37a")
> new ObjectId()
ObjectId("5315db38ff2d9ef19928e37b")

By the time you should understand that the generation happening at the client side only as we haven't inserted anything in the DB. If you analyze the pattern of 24 hexadecimal digits, you will see the only changing digits are the timestamp digits and the increment digits. If we divide the string we can extract the following:



          Timestamp      Machine        PID      Increment
1st      5315db03       ff2d9e         f199       28e379
2nd     5315db03       ff2d9e         f199       28e37a
3rd      5315db38       ff2d9e         f199       28e37b




For all the generation machine and PID is not changing. For the first two generation timestamp is also same, meaning that these two were generated in quick succession within in a second. But for all the generations the increment field is changing and how the uniqueness is achieved.


As stated previously, if there is no "_id" key present when a document is inserted, one will be automatically added to the inserted document. This can be handled by the MongoDB server but will generally be done by the driver on the client side. The decision to generate them on the client side reflects an overall philosophy of MongoDB: work should be pushed out of the server and to the drivers whenever possible. This philosophy reflects the fact that, even with scalable databases like MongoDB, it is easier to scale out at the application layer than at the database layer. Moving work to the client side  reduces the burden requiring the database to scale.


Today's discussion how some unique thing about the mongo shell. We can work with shell as a standard javascript shell and not only for the DB operations. We will see some more interesting stuffs in the next discussion. 

<< Prev                                                                                     Next >>