Introduction to NoSQL


Why do we need an alternative to SQL?

Developers are working with applications that create new, rapidly changing data types (structured, semi-structured, unstructured and polymorphic data) in massive volumes.

Problems with new, rapidly changing data types: Data types keep changing, because new requirements can arrive at any time. In a relational database you need to know up front what you are storing, such as phone numbers, first and last names, address, city and state, and define a schema for it. This fits poorly with agile development approaches, because each time you complete a new feature, the schema of your database often needs to change.
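To make the schema problem concrete, here is a minimal sketch using Python's built-in sqlite3 module as a stand-in for a relational database (an illustrative choice; the table, columns and values are hypothetical). An insert that uses a field the schema does not define is rejected until the table itself is altered.

```python
import sqlite3

# In-memory relational database; the table and column names are hypothetical.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customers (first_name TEXT, last_name TEXT, phone TEXT)")
conn.execute("INSERT INTO customers VALUES ('Ada', 'Lovelace', '555-0100')")

def add_customer_with_email(conn):
    # A new feature wants to store an email address, which the schema may not define yet.
    conn.execute(
        "INSERT INTO customers (first_name, last_name, phone, email) "
        "VALUES ('Alan', 'Turing', '555-0199', 'alan@example.com')"
    )

try:
    add_customer_with_email(conn)
except sqlite3.OperationalError as err:
    print("insert rejected:", err)  # the table has no 'email' column yet

# The schema must change before the new field can be stored.
conn.execute("ALTER TABLE customers ADD COLUMN email TEXT")
add_customer_with_email(conn)
print(conn.execute("SELECT first_name, email FROM customers ORDER BY rowid").fetchall())
```

In a real project the ALTER TABLE step usually becomes a migration that has to be coordinated with every deployment, which is the friction with agile development described above.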

Problem with scalability: Scalability is a major pitfall of relational databases, because they are traditionally scaled vertically: capacity is increased by moving to a bigger machine, which has its own limits on size and cost. Besides, relational databases provide little built-in support for running in a distributed environment.

Problem with the data model: SQL APIs are awkward to use for many kinds of applications because there is no natural mapping between SQL tables and programming objects. To build an object, the application has to fetch data by joining several tables through foreign keys.
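To make the mismatch concrete, here is a minimal sketch using Python's built-in sqlite3 module with hypothetical customers and orders tables. One application object (a customer with a list of orders) is split across two tables, so the code has to join them and fold the repeated rows back into a nested object by hand.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE orders (id INTEGER PRIMARY KEY,
                     customer_id INTEGER REFERENCES customers(id),
                     item TEXT);
INSERT INTO customers VALUES (1, 'Ada');
INSERT INTO orders VALUES (10, 1, 'book'), (11, 1, 'pen');
""")

def load_customer(conn, customer_id):
    """Rebuild a nested customer object from the rows returned by a join."""
    rows = conn.execute(
        "SELECT c.id, c.name, o.item FROM customers c "
        "LEFT JOIN orders o ON o.customer_id = c.id "
        "WHERE c.id = ? ORDER BY o.id",
        (customer_id,),
    ).fetchall()
    if not rows:
        return None
    # Every joined row repeats the customer columns; fold them back into one object.
    cust_id, name, _ = rows[0]
    orders = [item for _, _, item in rows if item is not None]
    return {"id": cust_id, "name": name, "orders": orders}

print(load_customer(conn, 1))  # {'id': 1, 'name': 'Ada', 'orders': ['book', 'pen']}
```

ORM libraries automate this folding, but the underlying mapping work between rows and objects remains.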

What are NoSQL Databases?

NoSQL has no common definition, but NoSQL databases tend to share some common characteristics. Instead of saving data in tables like relational databases, they offer different data models suited to different programming needs. NoSQL databases are generally schema-less and easy to scale across a cluster.

  • Schema Free
  • Easy replication support
  • Simple API
  • BASE / not ACID
  • Scalability
  • Mostly Open-Source

Schema Free: 

Most NoSQL databases are schema-less. Records can add new information on the fly, and unlike SQL table rows, dissimilar data can be stored together as necessary. For some databases (e.g., wide-column stores), it is somewhat more challenging to add new fields dynamically.
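A minimal sketch of schema-free storage, assuming a collection is modelled as a plain Python list of dictionaries (an illustration, not any particular database's API; the records and the find helper are hypothetical). Records with different fields sit side by side, and a query simply treats a missing field as "no match".

```python
# A hypothetical "collection" of schema-free records: each record is a dict,
# and records do not need to share the same fields.
customers = [
    {"name": "Ada", "phone": "555-0100"},
    {"name": "Alan", "email": "alan@example.com"},
    {"name": "Grace", "phone": "555-0142", "tags": ["admiral", "compiler"]},
]

# Adding a new field needs no schema change: just store a record that has it.
customers.append({"name": "Edsger", "email": "ewd@example.com", "city": "Austin"})

def find(collection, **criteria):
    """Return records whose fields equal every criterion; a missing field never matches."""
    return [
        rec for rec in collection
        if all(field in rec and rec[field] == value for field, value in criteria.items())
    ]

print([rec["name"] for rec in customers if "email" in rec])  # ['Alan', 'Edsger']
print(find(customers, city="Austin"))
```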

Easy replication support:

Most NoSQL databases also support automatic replication, meaning that you get high availability and disaster recovery without involving separate applications to manage these tasks. The storage environment is essentially virtualized from the developer’s perspective.

Simple API:

Most NoSQL databases provide object-oriented APIs for data operations that are simple to use.

Mostly Open-Source:

Most NoSQL databases are also open-source, meaning that they can be downloaded, implemented and scaled at little cost. Because development cycles are faster, organizations can also innovate more quickly and deliver a superior customer experience at a lower cost.

BASE / not ACID:
The BASE acronym was defined by Eric Brewer, who is also known for formulating the CAP theorem.
The CAP theorem states that a distributed computer system cannot guarantee all of the following three properties at the same time:

  • Consistency
  • Availability
  • Partition tolerance

A BASE system gives up on strong consistency: it favors availability and accepts that replicas may temporarily disagree.

  • Basically Available indicates that the system does guarantee availability, in terms of the CAP theorem.
  • Soft state indicates that the state of the system may change over time, even without input. This is because of the eventual consistency model.
  • Eventual consistency indicates that the system will become consistent over time, given that the system doesn’t receive input during that time.
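The three BASE properties can be illustrated with a toy simulation. This is a sketch under simplifying assumptions, not how any specific database replicates data: a single global version counter stands in for timestamps, conflicts are resolved by last-write-wins, and the class and method names are hypothetical.

```python
import itertools

class EventuallyConsistentStore:
    """Toy model: each write lands on one replica and reaches the others later."""

    def __init__(self, replica_count=3):
        self.replicas = [{} for _ in range(replica_count)]  # key -> (version, value)
        self.pending = []                                   # updates not yet delivered
        self._clock = itertools.count(1)                    # global version counter (simplification)

    def write(self, key, value, via=0):
        # Basically available: the write is accepted at once by a single replica.
        version = next(self._clock)
        self._apply(via, key, version, value)
        for i in range(len(self.replicas)):
            if i != via:
                self.pending.append((i, key, version, value))

    def read(self, key, via):
        # A replica may not have seen the latest write yet (stale read).
        entry = self.replicas[via].get(key)
        return None if entry is None else entry[1]

    def propagate(self):
        # Soft state: replicas change without new client input as updates arrive.
        # With no new writes, every replica ends up with the same data.
        for i, key, version, value in self.pending:
            self._apply(i, key, version, value)
        self.pending.clear()

    def is_consistent(self):
        return all(replica == self.replicas[0] for replica in self.replicas)

    def _apply(self, i, key, version, value):
        # Last-write-wins: keep an update only if it is newer than what the replica holds.
        current = self.replicas[i].get(key)
        if current is None or version > current[0]:
            self.replicas[i][key] = (version, value)


store = EventuallyConsistentStore()
store.write("cart:42", ["book"], via=0)
print(store.read("cart:42", via=2))  # None: replica 2 has not received the write
store.propagate()
print(store.read("cart:42", via=2))  # ['book']
print(store.is_consistent())         # True
```

Real systems replace the global counter with timestamps or vector clocks, since a distributed cluster has no single shared clock.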

ACID vs BASE

ACID        | BASE
Atomicity   | Basically Available
Consistency | Soft state
Isolation   | Eventual consistency
Durability  |

Examples: BigTable, Cassandra, SimpleDB.

Scalability:

NoSQL databases were developed from the ground up to be distributed, scale-out databases that use a cluster of standard, physical or virtual servers to store data and support database operations.

To scale, additional servers are joined to the cluster, and the data and database operations are spread across the larger cluster. Since commodity servers are expected to fail from time-to-time, NoSQL databases are built to tolerate and recover from such failures, making them highly resilient.

NoSQL databases provide a much easier, roughly linear approach to database scaling. If 10,000 new users start using your application, you can, in principle, simply add another database server to your cluster; add 10,000 more users and add another server.
There’s no need to modify the application as you scale since the application always sees a single (distributed) database.
While implementations differ, NoSQL databases share some characteristics with respect to scaling and performance:
Auto-sharding: A NoSQL database automatically spreads data across servers, without requiring applications to participate. Servers can be added or removed from the data layer without application downtime, with data (and I/O) automatically spread across the servers. Most NoSQL databases also support data replication, storing multiple copies of data across the cluster and even across data centers, to ensure high availability and support disaster recovery.
A properly managed NoSQL database system should rarely, if ever, need to be taken offline, supporting continuous 24×7 operation of applications.
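One common way to spread keys across servers automatically is hash partitioning. The sketch below uses consistent hashing, the ring-based scheme used by Cassandra and Dynamo-style stores; it is an illustrative choice, not necessarily what every NoSQL database does, and the server names and the vnodes count are hypothetical. The point to notice is that adding a server moves only a fraction of the keys, and only onto the new server.

```python
import bisect
import hashlib

def stable_hash(text):
    # MD5 is used only to spread keys evenly, not for security.
    # (Python's built-in hash() on str is randomized per process.)
    return int(hashlib.md5(text.encode("utf-8")).hexdigest(), 16)

class ConsistentHashRing:
    """Assigns keys to servers; each server owns many points ("virtual nodes") on a ring."""

    def __init__(self, servers=(), vnodes=100):
        self.vnodes = vnodes
        self._ring = []  # sorted list of (point, server)
        for server in servers:
            self.add_server(server)

    def add_server(self, server):
        for i in range(self.vnodes):
            bisect.insort(self._ring, (stable_hash(f"{server}#{i}"), server))

    def remove_server(self, server):
        self._ring = [(point, s) for point, s in self._ring if s != server]

    def server_for(self, key):
        if not self._ring:
            raise LookupError("no servers in the ring")
        # The key belongs to the first virtual node clockwise from its hash (wrapping around).
        idx = bisect.bisect_left(self._ring, (stable_hash(key), ""))
        return self._ring[idx % len(self._ring)][1]


ring = ConsistentHashRing(["db1", "db2", "db3"])
keys = [f"user:{n}" for n in range(10_000)]
before = {k: ring.server_for(k) for k in keys}

ring.add_server("db4")  # scale out: a new server joins the cluster
after = {k: ring.server_for(k) for k in keys}

moved = [k for k in keys if before[k] != after[k]]
print(f"{len(moved) / len(keys):.0%} of keys moved")  # expected to be near 25%
print(all(after[k] == "db4" for k in moved))          # True: keys only move to the new server
```

With plain modulo hashing (`hash(key) % server_count`), going from three to four servers would instead reassign about three quarters of the keys, which is why ring-based schemes are popular for clusters that grow and shrink.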

Distributed query support: “Sharding” a relational database can reduce or eliminate the ability to perform complex data queries. Advanced NoSQL database systems are designed to retain their query expressive power even when distributed across hundreds of servers.
Integrated caching: To reduce latency and increase sustained data throughput, advanced NoSQL database technologies transparently cache data in system memory.

This behavior is transparent to the application developer and the operations team, in contrast with relational technology where a caching tier is usually a separate infrastructure tier that must be explicitly managed by the ops team.
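The general idea of a cache in front of slower storage can be sketched as a read-through LRU (least recently used) cache. This illustrates the technique only: the backing store here is a hypothetical dictionary standing in for disk or network reads, the capacity is arbitrary, and real databases implement their caching internally.

```python
from collections import OrderedDict

class ReadThroughCache:
    """LRU cache in front of a slower backing store (a plain dict here, for illustration)."""

    def __init__(self, backing_store, capacity=2):
        self.backing_store = backing_store
        self.capacity = capacity
        self._entries = OrderedDict()
        self.hits = 0
        self.misses = 0

    def get(self, key):
        if key in self._entries:
            self.hits += 1
            self._entries.move_to_end(key)  # mark as most recently used
            return self._entries[key]
        self.misses += 1
        value = self.backing_store[key]     # the "slow" read
        self._entries[key] = value
        if len(self._entries) > self.capacity:
            self._entries.popitem(last=False)  # evict the least recently used entry
        return value


store = {"a": 1, "b": 2, "c": 3}
cache = ReadThroughCache(store, capacity=2)
for key in ["a", "b", "a", "c", "b"]:
    cache.get(key)
print(cache.hits, cache.misses)  # 1 4
```

The caller only ever calls `get`, which is what "transparent" means here. A real system must also keep cached entries in step with writes; this sketch covers reads only.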

NoSQL databases can be categorized on the basis of:

  • Data Model
  • Query Model
  • Consistency Model

Data Model:

The primary way in which NoSQL databases differ from relational databases is the data model.

Document Model:

As we know, relational databases store data in rows, columns and tables, whereas document databases store data in documents. A document represents an object with multiple fields of types such as string, number, boolean and date, and is stored mainly in JSON format. A document keeps the complete aggregate in one place, unlike a relational database, where the data is assembled by joining rows from different tables through foreign keys, which takes extra time.

Because no joins are needed to assemble the data, each document can act as a single, self-contained entity, which makes documents well suited to storage in a distributed environment.

e.g. MongoDB, CouchDB
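A minimal sketch of the document idea using Python's json module: a customer and their orders are stored as one self-contained document under a single ID, so reading it back needs no join. The document fields and the in-memory dict standing in for a document store are hypothetical; this is not MongoDB's or CouchDB's API.

```python
import json

# Hypothetical stand-in for a document store: document ID -> JSON text.
documents = {}

customer = {
    "_id": "customer:1",
    "name": "Ada Lovelace",                      # string
    "active": True,                              # boolean
    "signed_up": "2015-06-01",                   # plain JSON has no date type, so a string here
    "address": {"city": "London", "country": "UK"},
    "orders": [                                  # embedded instead of living in a separate table
        {"item": "book", "qty": 1, "price": 12.5},   # numbers
        {"item": "pen", "qty": 3, "price": 1.2},
    ],
}

documents[customer["_id"]] = json.dumps(customer)

# One lookup by ID returns the whole aggregate; no foreign keys or joins are involved.
loaded = json.loads(documents["customer:1"])
total = sum(order["qty"] * order["price"] for order in loaded["orders"])
print(loaded["address"]["city"], round(total, 2))
```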

Key-value:

In key-value databases the data is saved in the form of a map, and a single key is needed to retrieve the data. These databases are suitable where data needs to be fetched by a single key. Key-value stores are among the most scalable NoSQL types.
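A key-value store boils down to put, get and delete by key. The sketch below is a hypothetical in-memory class, not a real product's client API. It also shows the trade-off: a lookup by key is a single hash lookup, while any question about the values means scanning every entry.

```python
class KeyValueStore:
    """Hypothetical in-memory key-value store: values are opaque and found only by key."""

    def __init__(self):
        self._data = {}

    def put(self, key, value):
        self._data[key] = value

    def get(self, key, default=None):
        return self._data.get(key, default)  # one hash lookup, O(1) on average

    def delete(self, key):
        self._data.pop(key, None)

    def scan_values(self, predicate):
        # Anything other than a key lookup means examining every entry.
        return [k for k, v in self._data.items() if predicate(v)]


sessions = KeyValueStore()
sessions.put("session:abc123", {"user": "ada", "cart": ["book"]})
sessions.put("session:def456", {"user": "alan", "cart": []})

print(sessions.get("session:abc123")["cart"])                # ['book']
print(sessions.scan_values(lambda v: v["user"] == "alan"))   # ['session:def456']
sessions.delete("session:abc123")
print(sessions.get("session:abc123"))                        # None
```

Because every operation touches exactly one key, keys can be partitioned across servers independently of one another, which is much of why this model scales so well.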

Wide column Model:

These databases save data in sorted multidimensional maps.

Column-family databases store data in column families as rows that have many columns associated with a row key. Column families are groups of related data that is often accessed together. For a Customer, we would often access their Profile information at the same time, but not their Orders.

Each column family can be compared to a container of rows in an RDBMS table where the key identifies the row and the row consists of multiple columns. The difference is that various rows do not have to have the same columns, and columns can be added to any row at any time without having to add it to other rows.

When a column consists of a map of columns, then we have a super column. A super column consists of a name and a value which is a map of columns. Think of a super column as a container of columns.
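The nested-map structure described above can be sketched with plain Python dictionaries: row key, then column family, then column name, then value. This is a conceptual model only; the put and get_family helpers and the customer, profile and orders names are hypothetical, it is not Cassandra's or HBase's storage layout or client API, and columns are sorted at read time for simplicity.

```python
# Conceptual model: row key -> column family -> column name -> value.
table = {}

def put(row_key, family, column, value):
    table.setdefault(row_key, {}).setdefault(family, {})[column] = value

def get_family(row_key, family):
    """Return one column family of a row, with its columns in sorted order."""
    columns = table.get(row_key, {}).get(family, {})
    return dict(sorted(columns.items()))

# Rows need not share columns: only customer:2 has an "email" column.
put("customer:1", "profile", "name", "Ada")
put("customer:2", "profile", "name", "Alan")
put("customer:2", "profile", "email", "alan@example.com")

# A super column: a named column whose value is itself a map of columns.
put("customer:1", "profile", "address", {"street": "1 Main St", "city": "London"})

# Orders live in their own column family, so reading a profile does not touch them.
put("customer:1", "orders", "order:0002", "pen")
put("customer:1", "orders", "order:0001", "book")

for row_key in sorted(table):
    print(row_key, get_family(row_key, "profile"))
print(get_family("customer:1", "orders"))  # {'order:0001': 'book', 'order:0002': 'pen'}
```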

Cassandra is one of the popular column-family databases; there are others, such as HBase, Hypertable, and Amazon DynamoDB.

Graph Model:

These databases are suitable when the relationships in the data can be represented as a graph, e.g. a social network.

Graph databases allow you to store entities and the relationships between them. Entities are also known as nodes, which have properties. Think of a node as an instance of an object in the application. Relationships are known as edges, which can also have properties. Edges have directional significance; nodes are organized by relationships, which allow you to find interesting patterns between the nodes. The organization of the graph lets the data be stored once and then interpreted in different ways based on relationships.
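A minimal sketch of the property-graph idea, assuming a hypothetical in-memory representation (not Neo4j's or any other product's API; the people and "since" years are made-up sample data). Nodes and directed edges both carry properties, and a breadth-first traversal over outgoing edges answers a typical social-network question: who can this person reach within a given number of hops?

```python
from collections import deque

# Nodes: id -> properties
nodes = {
    "ada": {"name": "Ada", "city": "London"},
    "alan": {"name": "Alan", "city": "Manchester"},
    "grace": {"name": "Grace", "city": "New York"},
    "edsger": {"name": "Edsger", "city": "Austin"},
}

# Directed edges with their own properties: (from, relation, to, properties)
edges = [
    ("ada", "FOLLOWS", "alan", {"since": 2014}),
    ("alan", "FOLLOWS", "grace", {"since": 2015}),
    ("grace", "FOLLOWS", "edsger", {"since": 2016}),
    ("edsger", "FOLLOWS", "ada", {"since": 2017}),
]

def neighbours(node_id, relation):
    """Follow outgoing edges of one type only (direction matters)."""
    return [dst for src, rel, dst, _ in edges if src == node_id and rel == relation]

def reachable_within(start, relation, max_hops):
    """Breadth-first traversal: nodes reachable from start in 1..max_hops hops."""
    seen = {start}
    found = []
    queue = deque([(start, 0)])
    while queue:
        node, depth = queue.popleft()
        if depth == max_hops:
            continue
        for nxt in neighbours(node, relation):
            if nxt not in seen:
                seen.add(nxt)
                found.append(nxt)
                queue.append((nxt, depth + 1))
    return found

print(reachable_within("ada", "FOLLOWS", 2))                        # ['alan', 'grace']
print([nodes[n]["name"] for n in neighbours("edsger", "FOLLOWS")])  # ['Ada']
```

Graph databases index adjacency so that such traversals do not scan every edge, as this list-based sketch does.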

There are many graph databases available, such as Neo4j, InfiniteGraph, OrientDB, or FlockDB.
