Introduction to Databases and NoSQL

Databases are applications that store data (the data store), retrieve specific data in response to a query (the query engine), and present the results to the application that requested them. Here is a brief history of how this class of applications evolved and how ‘NoSQL’ databases were born.

Brief History of Databases

The need to store information using a structured storage and retrieval mechanism began in the 1970s, when storage and CPU were available only at premium cost and most databases were tied to enterprise payroll and related applications. The same trend continued through the PC era and the client/server era of the 1980s and early 1990s. Vendors built more efficient engines that increased the storage capacity of these databases and offered a powerful querying and retrieval mechanism based on SQL standards, which made Relational Database Management Systems (RDBMS) a mandatory requirement for many applications.

With the Internet era beginning in the mid-1990s, the variety of data to be handled moved beyond enterprise applications and signaled a need for change from the database vendors. However, the vendors offered no significant shift toward more efficient handling of images, unstructured data and the like, which led Internet software developers to look for alternatives. Cheaper, standardized hardware made it easier to scale the data centers that store this information, and together these factors led to the NoSQL movement.

In the late 1990s, Google had to handle search results across an ever-growing crawl of websites, found it hard to scale with traditional databases, and started to store and retrieve words in an alternate form that was tied more closely to the data structures used by its page-ranking application. This made search responses much faster by avoiding the overhead of transforming data to and from SQL for storage and retrieval.

Other organizations also started addressing the challenges of large data and the expectation of ‘internet time response’. Companies like Facebook adopted a graph structure to represent the relationships in their data and preferred to store it as is, using graph databases. This adoption of alternate data stores led to the BigData genre of databases and tools, which helped application developers provide efficient storage and retrieval for faster responses.

Infrastructure providers also needed to monitor application logs, whose volume grew heavily as website traffic rose. This led to storing log text as is and using text-based search for faster retrieval. It also led to the development of analytic frameworks like Hadoop, which help make sense of the large data sets stored in text- and document-based data stores. NoSQL databases became the mainstream choice for handling large data sets (BigData) and for the analytical tools that query them.

NoSQL Databases

There are four major categories of NoSQL databases catering to the data storage needs of modern applications. There are also newer databases that offer a combination of these, referred to as multi-model databases. However, the underlying storage mechanism belongs to one of the categories discussed below. They are arranged in increasing order of the complexity they can handle.

Type of Database: Key-Value pair
Description: Information is stored as key-value pairs, the key being a unique identifier for the value that is stored. Typical examples are the information kept in configuration files, personal information collected on websites, etc. This is one of the fundamental storage options available in all applications: the data is held in a variable, and the variable becomes the key against which its value is persisted.
Examples: Redis, DynamoDB, Riak

Type of Database: Columnar
Description: This database is a flipped version of an RDBMS in which rows and columns are interchanged, which helps in faster querying of related items from storage. It also helps in compressing the stored data, optimizing hard-disk usage when the number of columns grows very large. For example, the keywords in a site are stored as columns with the URLs as rows, so when a group of words is searched the column data is obtained in one go and all the URLs stored in the rows are displayed in the browser.
Examples: HBase, Cassandra

Type of Database: Document
Description: Information is stored as groups of data of different types that share a similar structure. For example, blogs contain mostly text with embedded references to other types of media such as images, audio, video and URLs. These blogs can be stored with identifiers such as the date of posting and the author who has permission to edit. Unlike an RDBMS, this kind of storage does not require a strict schema for the data; however, the application needs to define some structure for the storage to be effective.
Examples: CouchDB, MongoDB

Type of Database: Graph
Description: This database stores information and the relationships between items as a graph instead of tables or key-value pairs. Data is stored as nodes of the graph, and a relationship is the link connecting two nodes; additional properties of the nodes or the relationships are stored along with them. For example, each person is a node with gender and age as additional properties, and persons are connected by a ‘knows’ relationship that carries properties such as school friend, college friend or life partner.
Examples: Neo4j, OrientDB
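
To make the four categories a little more concrete, here is a minimal Python sketch that mimics each storage model with plain in-memory data structures. It is only an illustration under my own assumptions: it does not use the API of Redis, HBase, MongoDB, Neo4j or any other product, and every name in it (kv_store, index_page, save_post, add_knows, the sample URLs and people) is hypothetical.

```python
# Illustrative only: in-memory stand-ins for the four NoSQL storage models.
# None of these names come from a real database client library.

# 1. Key-value: a unique key identifies the stored value.
kv_store = {}
kv_store["user:42:theme"] = "dark"

# 2. Columnar: keywords are columns, URLs are rows, data is kept per column.
columns = {}  # keyword -> {url: number of occurrences}

def index_page(url, words):
    """Record each word of a page in that word's column."""
    for word in words:
        column = columns.setdefault(word, {})
        column[url] = column.get(url, 0) + 1

def urls_for(words):
    """Return the URLs that have an entry in every requested keyword column."""
    url_sets = [set(columns.get(word, {})) for word in words]
    return sorted(set.intersection(*url_sets)) if url_sets else []

# 3. Document: no schema in the store, but the application checks a minimum.
REQUIRED_FIELDS = {"author", "posted_on", "body"}
documents = {}

def save_post(post_id, doc):
    """Store a blog post as a free-form document after a basic check."""
    missing = REQUIRED_FIELDS - set(doc)
    if missing:
        raise ValueError(f"missing fields: {sorted(missing)}")
    documents[post_id] = doc

# 4. Graph: nodes and relationships both carry properties.
nodes = {}   # name -> properties of the person
edges = []   # (source, relationship, target, properties of the relationship)

def add_person(name, **props):
    nodes[name] = props

def add_knows(source, target, **props):
    edges.append((source, "knows", target, props))

def known_by(name, **filters):
    """People that `name` knows, optionally filtered by relationship properties."""
    return [
        target
        for source, rel, target, props in edges
        if source == name and rel == "knows"
        and all(props.get(key) == value for key, value in filters.items())
    ]

# Sample data (made up for this sketch).
index_page("example.org/a", ["nosql", "database", "nosql"])
index_page("example.org/b", ["nosql", "graph"])
save_post("post-1", {"author": "asha", "posted_on": "2016-01-05",
                     "body": "Hello", "media": ["img/cover.png"]})
add_person("Asha", gender="F", age=34)
add_person("Ravi", gender="M", age=35)
add_person("Meena", gender="F", age=33)
add_knows("Asha", "Ravi", kind="school friend")
add_knows("Asha", "Meena", kind="college friend")
```

With this data, urls_for(["nosql", "graph"]) looks only at the two keyword columns it needs, and known_by("Asha", kind="school friend") walks the ‘knows’ links instead of joining tables. A real database adds persistence, indexing and distribution on top of these shapes.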

Some applications being built today require multiple types of databases; multi-model databases provide several non-SQL data storage formats in one product. Other applications combine the power of an RDBMS with a NoSQL database type, and these ‘NewSQL’ approaches have led to NoSQL being defined as ‘Not Only SQL’ instead of ‘No to SQL’.

In the next blog we will look at how healthcare data types can map to one of these NoSQL databases.

SQL to NoSQL

I have been wanting to share our experience of moving from a SQL-based database to a NoSQL database in our application. In a sequence of posts over the next few days, I will try to highlight the product we build, the problems we encountered with SQL, and then our experience with the NoSQL database.