Sunday 23 February 2014

Supported Data Types in MongoDB

Hi friends, we are back. Apologies for the delay; I was tied up with a project deliverable (an integral part of a software engineer's life). Today we will discuss the different data types supported in MongoDB.


Limitations in JSON data types

As we mentioned earlier, a document in MongoDB is a JSON-like object (JSON stands for JavaScript Object Notation). JSON's structure is simple and easy to understand and parse, but it is limited to only six data types: null, boolean, numeric, string, array and object.

You might think these are sufficient, at a high level, to express different structures of data. But a few additional types are still pretty important. For example, JSON has no data type for dates. That is hard to accept when JSON is meant to supply a database's core data types.

There is a numeric data type, but it does not differentiate between floats and integers, let alone between 32-bit and 64-bit numbers.

It also lacks another important data type: regular expressions.
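
To see these limitations in action, here is a small sketch in plain JavaScript (it needs no MongoDB instance). The field names when, pattern, whole and fraction are made up for the illustration. It pushes a few values through JSON and back to show what is lost on the way:

```javascript
// Round-trip a few values through JSON to see which type information is lost.
const original = {
  when: new Date(0),   // a Date object (the epoch itself)
  pattern: /foobar/i,  // a regular expression
  whole: 3,            // written as an integer
  fraction: 3.0        // written as a float with the same value
};

const roundTripped = JSON.parse(JSON.stringify(original));

console.log(typeof roundTripped.when); // "string": the Date came back as text
console.log(roundTripped.when);        // "1970-01-01T00:00:00.000Z"
console.log(roundTripped.pattern);     // {}: the regular expression is gone
console.log(roundTripped.whole === roundTripped.fraction); // true: only one numeric type
```

The date survives only as a string, the regular expression turns into an empty object, and 3 and 3.0 become indistinguishable.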



MongoDB's Additional Support

MongoDB adds support for additional data types on top of JSON to make it more flexible and to cover a wider range of data. The additional types retain JSON's original key-value pair format. The commonly supported types, and how they are represented in a document, are described below:

null

Null can be used to represent both a null value and a nonexistent field:
{"x" : null}

boolean

There is a boolean type, which can be used for the values true and false:
{"x" : true}

number

The shell defaults to using 64-bit floating point numbers. Thus, these numbers look “normal” in the shell:

{"x" : 3.14}

or:

{"x" : 3}

For integers, use the NumberInt or NumberLong classes, which represent 4-byte or 8-byte signed integers, respectively.
{"x" : NumberInt("3")}
{"x" : NumberLong("3")}
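
Why bother with separate integer classes? The sketch below, in plain JavaScript, shows what happens when large integers are kept as ordinary 64-bit floating point numbers. BigInt is used here only as a stand-in for an exact 64-bit integer; it is an assumption of this example and not something the mongo shell requires.

```javascript
// JavaScript numbers are 64-bit doubles, so integers are exact only up to 2^53.
const big = 9007199254740993;          // 2^53 + 1, written as a plain number literal
console.log(big === 9007199254740992); // true: the literal was rounded down to 2^53

// Keeping the digits in a string, the way NumberLong("...") takes them, avoids the rounding.
const asText = "9007199254740993";
console.log(BigInt(asText) === 2n ** 53n + 1n); // true: the exact value is preserved

// A 4-byte signed integer wraps around once it passes 2^31 - 1.
const wrapped = (2147483647 + 1) | 0;  // | 0 forces 32-bit signed arithmetic
console.log(wrapped);                  // -2147483648
```

So a plain number is fine for everyday values, but a value that must be an exact 8-byte integer is safer passed as a string to NumberLong.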

string

Any string of UTF-8 characters can be represented using the string type:
{"x" : "foobar"}

date

Dates are stored as milliseconds since the epoch. The time zone is not stored:
{"x" : new Date()}

JavaScript's Date object is used in MongoDB. When creating a new Date object, always call new Date() and not just Date(). Calling Date() as a plain function returns a string representation of the date, not an actual Date object. This is how JavaScript works, so always call new Date() to avoid a mismatch between strings and actual date objects.

Did you find that confusing? Don't worry, an example will make it clear:

Run your mongod instance and connect to it using the mongo shell. Use any database of your choice (say test). Type the following command:

> use test

Now insert a document using the new Date() constructor into the datefld collection (for example) of the test db:

> db.datefld.insert({_id:1, createdOn: new Date()})

Now insert a document using Date() function in the same collection:

> db.datefld.insert({_id:2, createdOn: Date()})

Find both the documents and see the difference in the createdOn field:

> db.datefld.find().pretty()
{
    "_id" : 1,
    "createdOn" : ISODate("2014-02-23T14:42:56.883Z")
}
{
    "_id" : 2,
    "createdOn" : "Sun Feb 23 2014 20:13:25 GMT+0530 (India Standard Time)"
}

The new Date() constructor (document with _id:1) returns a date object, while the Date() function call (document with _id:2) returns a string representation of the date.

The shell displays a date as an ISODate in UTC (note the trailing Z in the output above), whereas the string produced by Date() uses the local time zone settings. In the database, a date is stored only as a millisecond value since the epoch, so it carries no time zone information. As a workaround, you can store the time zone in a separate field.
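
The same behaviour can be checked in plain JavaScript without touching the database. This is only a sketch: the field name tzOffsetMinutes is a made-up example of the "separate field" workaround, and 330 is simply the offset in minutes for GMT+0530.

```javascript
const asString = Date();     // function call: returns a string
const asObject = new Date(); // constructor call: returns a Date object

console.log(typeof asString);          // "string"
console.log(asObject instanceof Date); // true

// A Date is a count of milliseconds since 1970-01-01T00:00:00Z; no time zone is kept.
const stored = new Date("2014-02-23T14:42:56.883Z");
console.log(stored.getTime());     // 1393166576883
console.log(stored.toISOString()); // "2014-02-23T14:42:56.883Z"

// Hypothetical workaround: keep the offset next to the date in its own field.
const doc = { _id: 1, createdOn: stored, tzOffsetMinutes: 330 };

// Shift by the stored offset to recover the local wall-clock digits.
const shifted = new Date(doc.createdOn.getTime() + doc.tzOffsetMinutes * 60000);
const localWallClock = shifted.toISOString().slice(0, 23);
console.log(localWallClock); // "2014-02-23T20:12:56.883"
```

The date value itself never changes; only the extra field tells you which local time the user saw.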


regular expression

Queries can use regular expressions using JavaScript’s regular expression syntax:
{"x" : /foobar/i}
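
As a quick reminder of how such a pattern behaves, here is a small plain JavaScript sketch (the sample strings are made up):

```javascript
const pattern = /foobar/i; // the i flag makes the match case-insensitive

console.log(pattern.test("FooBar"));        // true
console.log(pattern.test("my foobar key")); // true: matches anywhere in the string
console.log(pattern.test("foo bar"));       // false: the space breaks the match
```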

array

Sets or lists of values can be represented as arrays:
{"x" : ["a", "b", "c"]}

Arrays in MongoDB can be used for both ordered (stack, list or queue) and unordered (set) operations. The following is an example of an array in MongoDB:

{"address" : [930, "Casanova Avenue", "CA", 93940, 10.5]}

As you can see, an element of an array can be of any type supported in MongoDB (the above example contains integer, float and string values). An array can also contain a nested array as an element.

One of the advantages of having an array in a document is that MongoDB understands its structure very well and can reach a specific element in order to query, update or delete it. For example, the average temperature (10.5) in the above document can be changed easily. MongoDB can also create indexes on array elements.



Embedded document

Documents can contain entire documents embedded as values in a parent document:
{"x" : {"foo" : "bar"}}

Documents can be used as the value for a key. This is called an embedded document. Embedded documents can be used to organize data in a more natural way than just a flat structure of key/value pairs. For example, if we have a document representing a person and want to store that person's address, we can nest this information in an embedded "address" document:

{
    "name" : "Pradosh Chandra Mitra",
    "address" : {
        "street" : "21 Rajani Sen Rd",
        "city" : "Kolkata",
        "state" : "WB"
    }
}

The value for the "address" key in the previous example is an embedded document with its own key/value pairs for "street", "city", and "state". As with arrays, MongoDB “understands” the structure of embedded documents and is able to reach inside them to build indexes, perform queries, or make updates.
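
MongoDB addresses values inside embedded documents and arrays with dot-separated paths such as "address.city" or "address.4". The sketch below imitates that idea in plain JavaScript. The getPath helper is hypothetical, written only for this illustration, and is not a MongoDB API or the way the server works internally.

```javascript
// Hypothetical helper (not a MongoDB API): resolve a dot-separated path
// such as "address.city" or "address.4" against a plain object.
function getPath(doc, path) {
  return path.split(".").reduce(
    (value, key) => (value == null ? undefined : value[key]),
    doc
  );
}

const person = {
  name: "Pradosh Chandra Mitra",
  address: { street: "21 Rajani Sen Rd", city: "Kolkata", state: "WB" }
};
const place = { address: [930, "Casanova Avenue", "CA", 93940, 10.5] };

console.log(getPath(person, "address.city")); // "Kolkata": a key inside an embedded document
console.log(getPath(place, "address.4"));     // 10.5: position 4 of an array (zero-based)
console.log(getPath(person, "address.zip"));  // undefined: the key does not exist
```

The same style of path is what you would write in a query or an update to reach one nested value without rewriting the whole document.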



object id

An object id is a 12-byte ID for documents. Our next discussion will describe this in detail:
{"x" : ObjectId()}
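
Until then, here is a tiny plain JavaScript sketch of what "12-byte" means in practice. The hex string is just an illustrative value, not one taken from our database. An object id is usually printed as 24 hexadecimal characters, and its leading 4 bytes hold the creation time in seconds since the epoch:

```javascript
// Example 24-character hex string; the value itself is only an illustration.
const hex = "507f1f77bcf86cd799439011";
const byteCount = hex.length / 2; // two hex characters per byte
console.log(byteCount);           // 12

// The leading 4 bytes (8 hex characters) are a timestamp in seconds since the epoch.
const seconds = parseInt(hex.slice(0, 8), 16);
const createdAt = new Date(seconds * 1000).toISOString();
console.log(createdAt);           // "2012-10-17T21:13:27.000Z"
```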


There are also a few less common types that you may need, including:

binary data

Binary data is a string of arbitrary bytes. It cannot be manipulated from the shell. Binary data is the only way to save non-UTF-8 strings to the database.

code

Queries and documents can also contain arbitrary JavaScript code:
{"x" : function() { /* ... */ }}




Saturday 1 February 2014

Term Comparison: MongoDB vs. RDBMS

Today's discussion will be fun, my friends, as we will learn some MongoDB nomenclature so that our learning picks up pace.

I would request you to quickly recap our first day's discussion on configuring MongoDB. From here on, it is assumed that MongoDB is up and running in your local environment.

In your mongo shell (assuming you have just opened it), type the following command:

> db

This will give you the database you are currently connected to.

The mongo shell is a JavaScript shell that accepts JavaScript commands. db is a variable that refers to the database you are currently connected to.

RDBMS Database = MongoDB Database

A database in MongoDB is a physical set of files that holds the logically grouped data of an application. Once the user starts inserting data into a database, MongoDB creates physical files named after the database in the data folder (remember the data folder we set up at start-up). Note that we said "starts inserting": until a database has at least one record, no physical file is created for it.

To see which databases are present in the running MongoDB instance, type the following command:

> show dbs




This lists all the databases with their corresponding sizes (see the image above). To connect to another database, type:

> use [db name]





Now let's say you want to connect to a database that doesn't exist. MongoDB will not stop you from doing so. It will logically create a variable and connect you. For example, if you don't have a database called blog (show dbs doesn't list the name blog) and you type use blog, the shell gives you the same result as above. At this point blog is just a logical name rather than a physical entity. To check, go to your data/db folder; you will find NO file with the name blog. Now we will insert some data and see what happens.



RDBMS Table = MongoDB Collection

The highest logical unit inside a database is called a collection. A collection is the equivalent of a table in an RDBMS. You do not need a command like CREATE COLLECTION in MongoDB (as opposed to the CREATE TABLE command in an RDBMS). The first time you insert a record into a collection, the collection is created. Let's say we want to insert a record into a collection called user in the blog database. Before that, let's check whether the user collection exists. To do so, type:

> show collections

The shell returns nothing, as there are no collections in the blog database yet. Now insert a record into the user collection using the following command:

> db.user.insert({"firstName" : "Roger", "lastName" : "Federar", "plays" : "Tennis" })

Now run the same show collections command again and see the result:





Also, type show dbs to list the databases and see the result now: the blog database is listed. Now go to the data/db folder. You should find the following three files created for the blog database:

blog.0
blog.1
blog.ns

Note: As you can see, no CREATE DATABASE command is needed either ... cool, right?

Row/Column = Document/Field

The concept of a row in an RDBMS is replaced here by the document (JSON formatted). As you saw in the previous example, a document is a JavaScript object. For example:

{"firstName" : "Roger", "lastName" : "Federar", "plays" : "Tennis"}

This is an object with three fields (key-value pairs), namely firstName, lastName and plays. The keys of a document are always strings, while the values can be of several types. MongoDB supports null, boolean, number, string, date, array, embedded document, object id, binary data and code as the value of a key. We will go over each type in the next discussion.
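
To make the "string keys, many value types" idea concrete, here is a small plain JavaScript sketch. The sample document and the kindOf helper are both made up for this illustration; the helper only covers the types that plain JavaScript can express directly, so object id and binary data are left out.

```javascript
// Hypothetical helper: name the kind of value stored under a key.
function kindOf(value) {
  if (value === null) return "null";
  if (Array.isArray(value)) return "array";
  if (value instanceof Date) return "date";
  if (typeof value === "function") return "code";
  if (typeof value === "object") return "embedded document";
  return typeof value; // "boolean", "number" or "string"
}

// A made-up sample document mixing several value types.
const sample = {
  name: "sample",
  active: true,
  score: 3.14,
  tags: ["a", "b", "c"],
  address: { city: "Kolkata" },
  createdOn: new Date(0),
  note: null
};

const kinds = {};
for (const key of Object.keys(sample)) {
  kinds[key] = kindOf(sample[key]); // every key is a string, whatever the value is
}
console.log(kinds);
```

Every key comes back as a string, while the values range from simple booleans to whole embedded documents.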

Index = Index


The concept of indexing here is similar to that in an RDBMS. MongoDB actually supports various kinds of indexes (as mentioned in the previous discussion). Basically, it builds its indexes as B-Trees. There will be a detailed discussion of indexing in MongoDB later.

Foreign Key = Reference

MongoDB doesn't enforce referential integrity, though a field in one collection can hold a reference to a document in another collection, declared as a DBRef (written as @DBRef in some object mappers). We will discuss this in a future post as well; for now we just want to keep things simple.


