Thursday, May 12, 2016

Buzz Word "NO-SQL"

No-SQL is a buzzword. For the last few months I had been hearing a lot about No-SQL but knew nothing about it. This is one more blog post among the thousands you can find online; it is simply a collection of what I learned while trying to understand what No-SQL is. The aim here is not to give complete information, but to introduce you to the No-SQL world: the important terms, the different types, and how they relate to each other.

What is NO-SQL?

No-SQL is a family of databases that store and manage unstructured or loosely structured data. To see what that means, it helps to first recall what a traditional database is. A traditional database is an RDBMS, a relational database, in which we generally store structured data. Structured data means that we define tables and columns, and each column has a fixed type. When we want to store data, we first convert it into the format required by the table definition, and where we have no value for a column we generally store null. We also define relations between tables, which we use to combine data when fetching it with queries.
In No-SQL we still define some structure; some databases call it a document and others call it a table. These structures exist more to help fetch data than to enforce rules on how data is stored. No-SQL databases generally do not define relationships between documents or tables: each record is a separate entity. The data may be related in the Java world or in some other application layer, but that relation is not defined in the database. Most No-SQL databases also do not provide joins when fetching data; instead we run separate queries and combine the results ourselves.
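
To make the "no join, run separate queries" point concrete, here is a minimal sketch in Python. It uses plain dictionaries and lists as stand-ins for two No-SQL tables; the names `students`, `scores` and `scores_with_names` and all the sample values are made up for illustration and do not belong to any real database API.

```python
# Hypothetical in-memory stand-ins for two separate No-SQL tables/collections.
students = {
    "s1": {"first_name": "Asha", "city": "Pune"},
    "s2": {"first_name": "Ravi"},  # no "city" field at all; nothing is padded with null
}
scores = [
    {"student_id": "s1", "subject": "math", "percentage": 82},
    {"student_id": "s2", "subject": "math", "percentage": 67},
    {"student_id": "s1", "subject": "physics", "percentage": 74},
]


def scores_with_names(students, scores):
    """Combine the two collections in application code, since the database has no join."""
    combined = []
    for score in scores:  # first "query": read the score records
        # second "query": look up the matching student by key
        student = students.get(score["student_id"], {})
        combined.append({"first_name": student.get("first_name"), **score})
    return combined


result = scores_with_names(students, scores)
for row in result:
    print(row)
```

The relation between a score and a student exists only in this application code, which is what "it can have a relation in the Java world, but not in the database" means in practice.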

Why use No-SQL Databases?

No-SQL databases are lightweight, easily scalable and high performing, and they can be run with zero downtime.

  • Lightweight Databases: The RAM and disk space taken by the database itself is very small. As far as I remember, a No-SQL database such as MongoDB was one of the preferred databases for storing data locally in mobile applications. It comes in different versions.
  • Easily Scalable: RDBMS databases are scalable too, but there is a difference between scaling up and scaling out. Scaling up means upgrading the hardware of the same machine, whereas scaling out means adding one more machine to the cluster to increase capacity. RDBMS databases can also scale out, but it is not an easy step. Most No-SQL databases are designed to run in a cluster, and every machine in the cluster can be an ordinary desktop-class machine.
  • High Performance: Most No-SQL databases bet on high performance, and you can find a lot of benchmark results that show lower response times for these databases. I find many of these comparisons to be apples to oranges, though, because there are different types of No-SQL databases and each suits different needs. No-SQL databases also have their own advantages and disadvantages in how data is fetched; one disadvantage is that tables cannot be joined while fetching data.
  • Zero Downtime: One more reason to go for a No-SQL database is zero downtime. A cluster can be designed so that if a running machine fails, or machines are taken down for maintenance, another machine in the cluster serves the data for the request, and the outside system (the requester) is not affected by the database issue. This is a built-in feature of most No-SQL databases, because it follows from the core design on which these systems are built.
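
The scale-out and zero-downtime points above can be sketched in a few lines of Python. This is only a toy model under strong assumptions: every node holds a full copy of the data, and the `Node` and `Cluster` classes are hypothetical names invented for this example. Real No-SQL databases partition and replicate data in far more elaborate ways.

```python
class Node:
    """Hypothetical cluster machine holding a full copy (replica) of the data."""

    def __init__(self, name, data):
        self.name = name
        self.data = dict(data)
        self.up = True

    def get(self, key):
        if not self.up:
            raise ConnectionError(f"{self.name} is down")
        return self.data[key]


class Cluster:
    """Routes a read to the first node that is able to answer."""

    def __init__(self, nodes):
        self.nodes = list(nodes)

    def add_node(self, node):
        # Scale out: capacity grows by adding one more machine.
        self.nodes.append(node)

    def get(self, key):
        for node in self.nodes:
            try:
                return node.get(key)
            except ConnectionError:
                continue  # this machine is down, try the next replica
        raise ConnectionError("no replica available")


data = {"user:1": "Asha"}
cluster = Cluster([Node("node-a", data), Node("node-b", data)])
cluster.nodes[0].up = False      # node-a fails or is taken down for maintenance
value = cluster.get("user:1")    # the requester still gets an answer from node-b
cluster.add_node(Node("node-c", data))
print(value, len(cluster.nodes))
```

The requester only talks to the cluster, so a failed machine is invisible to it as long as one replica is still up.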


What is Normalization and De-Normalization?

Most relational databases store data in normalized form. In simple words, data is not duplicated; instead, data in different tables is linked using relations such as foreign keys. When we want related data, we generally use the join keyword in the query to join the tables and fetch the data. The advantage is that we reduce data duplication, which in turn saves disk space, and we can update a piece of data in only one place or table.
In the No-SQL world it is said that disk space is cheaper than CPU. No-SQL databases do not understand relations between data and do not provide joins that we could use while fetching it. To overcome this difficulty, the usual advice is to create smaller tables shaped by your queries and duplicate the data into them. For example, if your queries have a where clause on first name and percentage scored, and these two fields live in different tables, just create one more table that holds both and add the data to it. When fetching, you query this table, and if needed you fetch the remaining records from the respective original tables.
Some No-SQL databases, however, allow rows to be embedded inside one record; these are generally the document-oriented No-SQL databases.
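
The first name and percentage example can be sketched as follows. This is an illustration in plain Python, not code for a particular database: `students`, `results`, `student_scores`, `find` and `student_doc` are hypothetical names, and the sample values are invented.

```python
# Two separate source tables (hypothetical sample data).
students = {"s1": {"first_name": "Asha"}, "s2": {"first_name": "Ravi"}}
results = {"s1": {"percentage": 82}, "s2": {"percentage": 67}}

# De-normalized query table: both fields are duplicated into one record per
# student, so a filter on first name and percentage needs no join.
student_scores = [
    {
        "student_id": sid,
        "first_name": students[sid]["first_name"],
        "percentage": results[sid]["percentage"],
    }
    for sid in students
]


def find(first_name, min_percentage):
    """Answer the 'where first name and percentage' query from the query table alone."""
    return [
        row
        for row in student_scores
        if row["first_name"] == first_name and row["percentage"] >= min_percentage
    ]


matches = find("Asha", 80)

# Document-oriented alternative: embed the related rows inside one record.
student_doc = {
    "_id": "s1",
    "first_name": "Asha",
    "results": [{"subject": "math", "percentage": 82}],
}
print(matches, student_doc["results"])
```

The price of this design is that an update to a first name or a percentage has to be written to every table that holds a copy.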

No-SQL Classification

No-SQL is not a single database; it is a classification of databases based on what kind of data is stored in them and on some other features. There are different kinds of databases under the No-SQL or Big Data umbrella.
  • Column: Column-oriented No-SQL databases store data column-wise. If the data is person data such as FirstName, LastName, EmailId and Password, it is stored under those columns and fetched in the same way.
  • Document: Document-oriented No-SQL databases store data as documents. A document here means a text document, not binary data such as images or other files; in most cases it is a JSON object. The whole JSON object is added to the database, and the database returns JSON objects as its output/response.
  • Key-value: Key-value databases are the ones that work like a big hash map, where all data is stored as a key and a value. They work like a session-maintaining container, with the added value that the data can be persisted.
  • Graph: These databases store graph data, such as hierarchical data, where the data is mostly node based.

Some databases of each type:
  1. Column: Accumulo, Cassandra, Druid, HBase, Vertica
  2. Document: Apache CouchDB, Clusterpoint, Couchbase, DocumentDB, HyperDex, Lotus Notes, MarkLogic, MongoDB, OrientDB, Qizx, RethinkDB
  3. Key-value: Aerospike, Couchbase, Dynamo, FairCom c-treeACE, FoundationDB, HyperDex, MemcacheDB, MUMPS, Oracle NoSQL Database, OrientDB, Redis, Riak, Berkeley DB
  4. Graph: AllegroGraph, InfiniteGraph, MarkLogic, Neo4J, OrientDB, Virtuoso, Stardog
  5. Multi-model: Alchemy Database, ArangoDB, CortexDB, FoundationDB, MarkLogic, OrientDB
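
To give a feel for how the same kind of data looks in each of the four types, here is a rough sketch using only Python built-ins. The layouts are heavily simplified and the variable names (`column_store`, `document_store`, `key_value_store`, `graph_store`) and sample values are assumptions made for this illustration; the actual storage formats differ from product to product.

```python
import json

person = {"FirstName": "Asha", "LastName": "Rao", "EmailId": "asha@example.com"}

# Column: values are grouped by column name, then by row key.
column_store = {
    "FirstName": {"row1": "Asha"},
    "LastName": {"row1": "Rao"},
    "EmailId": {"row1": "asha@example.com"},
}

# Document: the whole JSON object is stored and returned as one unit.
document_store = {"person:1": json.dumps(person)}

# Key-value: a big hash map; the value is opaque to the database.
key_value_store = {"session:abc123": b"serialized-session-state"}

# Graph: nodes plus named edges between them.
graph_store = {
    "nodes": {"asha": {"type": "person"}, "ravi": {"type": "person"}},
    "edges": [("asha", "KNOWS", "ravi")],
}


def read_row(store, row_key):
    """Rebuild one row from the column layout."""
    return {column: rows[row_key] for column, rows in store.items() if row_key in rows}


def neighbours(graph, node, relation):
    """Follow edges of one relation type starting from a node."""
    return [dst for src, rel, dst in graph["edges"] if src == node and rel == relation]


row = read_row(column_store, "row1")
doc = json.loads(document_store["person:1"])
friends = neighbours(graph_store, "asha", "KNOWS")
print(row, doc, friends)
```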

Where to use No-SQL databases?

No-SQL databases are mainly used when the data is huge and needs to be available all the time. They are often used for report generation and data analysis, where reports are generated from huge volumes of data. Because scaling is easy, they suit databases that keep growing, such as captured user actions, log data and audit data. They are also used as back-end databases in micro-service architectures, where each service needs its own specific data. Because some of these databases store JSON objects, they are used in lightweight web applications where data is stored directly as JSON and read back as JSON. Some No-SQL databases (the key-value type) are used to store and maintain sessions across multiple application servers.


Why is this part of ATG Blog?

Coming Soon.... “ATG to Cassandra (No-SQL) integration plugin code”.
