Team, greetings to you.
I was looking for an RDBMS that is efficient in every respect, from dawn to dusk. I came across something called NoSQL. So what does that mean? Will it solve the common problem areas of a big web application? Let's see what it can offer us.
NoSQL: An Introduction
"For certain classes of applications, a non-SQL model is perfectly valid, and in a cloud-computing environment, it [can be] necessary."
A relational database (RDB), the kind queried with SQL, can most easily be described as a table-based data system with minimal data duplication, in which sets of data are accessed through relational operators such as joins and unions. The problem with such relations is that complex operations over large data sets quickly become prohibitively resource-intensive, although the benefit is generally reaped at the application level, where database code need not be convoluted.
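To make the cost of a join concrete, here is a deliberately naive nested-loop join in Python. The `users` and `messages` tables are hypothetical. Real relational engines use indexes, hash joins and merge joins to avoid this worst case, so treat this only as an illustration of why work can grow with the product of the table sizes.

```python
# Hypothetical example tables, each a list of row dicts.
users = [
    {"user_id": 1, "name": "alice"},
    {"user_id": 2, "name": "bob"},
]
messages = [
    {"msg_id": 10, "sender_id": 1, "body": "hi bob"},
    {"msg_id": 11, "sender_id": 2, "body": "hi alice"},
    {"msg_id": 12, "sender_id": 1, "body": "lunch?"},
]


def nested_loop_join(left, right, left_key, right_key):
    """Naive inner join: compare every left row with every right row.

    Returns the joined rows plus the number of comparisons made,
    which is always len(left) * len(right).
    """
    joined = []
    comparisons = 0
    for l_row in left:
        for r_row in right:
            comparisons += 1
            if l_row[left_key] == r_row[right_key]:
                joined.append({**l_row, **r_row})
    return joined, comparisons


rows, cost = nested_loop_join(users, messages, "user_id", "sender_id")
for row in rows:
    print(row["name"], row["body"])
print("comparisons:", cost)
```

With 2 users and 3 messages the join already makes 6 comparisons; with a million rows on each side the same approach would need a trillion.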
Why are relational databases just now becoming an annoyance?
As the web has grown more social, it is increasingly the people themselves who have become the publishers. With that fundamental shift away from read-heavy architectures toward read/write and write-heavy architectures, much of the way we think about storing and retrieving data has had to change.
NoSQL: non-relational data stores that “provide for web-scale data storage and retrieval especially in web based applications because it views the data more closely to how web apps view data – a key/value hash in the sky.” NoSQL is meant for the current, growing breed of web applications that need to scale effectively. Applications can scale horizontally on clusters of commodity hardware without resorting to intricate sharding techniques.
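The "key/value hash in the sky" view can be sketched as a store with just `put` and `get`. The `KeyValueStore` class and the `user:alice` key format below are hypothetical; the point is that the web app stores a whole, denormalized record under one key and fetches it back in a single lookup, with no join.

```python
class KeyValueStore:
    """Minimal in-memory key/value store (hypothetical, for illustration)."""

    def __init__(self):
        self._data = {}

    def put(self, key, value):
        # Overwrite whatever was stored under this key.
        self._data[key] = value

    def get(self, key, default=None):
        # A single hash lookup, no matter how many keys are stored.
        return self._data.get(key, default)


store = KeyValueStore()
# Denormalized record: the profile and its recent messages live under one key.
store.put("user:alice", {"name": "alice", "messages": ["hi bob", "lunch?"]})
print(store.get("user:alice")["messages"])
```

Because each key is independent, keys can be spread across machines without the store ever needing to combine rows from different places, which is what makes horizontal scaling comparatively simple.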
NoSQL definition: Next-generation databases mostly addressing some of the following points: being non-relational, distributed, open source and horizontally scalable. The original intention has been modern web-scale databases. The movement began in early 2009 and is growing rapidly. Often more characteristics apply, such as: schema-free, easy replication support, simple API, eventually consistent / BASE (not ACID), and more. So the misleading term "nosql" (which the community now mostly translates as "not only SQL") should be seen as an alias for something like the definition above.
Cassandra: A NoSQL Database and Apache Top-Level Project
Cassandra was born out of Facebook's need to store reverse indices of Facebook messages that users send and receive while communicating with their friends. The solution needed to scale incrementally while remaining cost effective. Traditional data storage was not an option, so Facebook created a non-relational solution called "Cassandra". The project was designed by Avinash Lakshman and Prashant Malik. Lakshman was one of the authors of Amazon's Dynamo, another large-scale NoSQL database. In many ways, Cassandra is like the second version of Dynamo, or a marriage of Dynamo and Google's BigTable. Lakshman further describes Cassandra's data model and the distributed properties provided by the system:
Data Model
Every row is identified by a unique key. The key is a string and there is no limit on its size.
An instance of Cassandra has one table which is made up of one or more column families as defined by the user.
The number of column families and the name of each must be fixed at the time the cluster is started. There is no limit on the number of column families, but only a few are expected.
Each column family can contain one of two structures: supercolumns or columns. Both of these are dynamically created and there is no limit on the number of these that can be stored in a column family.
Columns are constructs that have a name, a value and a user-defined timestamp associated with them. The number of columns a column family can contain is very large, and it can vary per key. For instance, key K1 could have 1024 columns/super columns while key K2 could have 64 columns/super columns.
“Supercolumns” are constructs that have a name and an unbounded number of columns associated with them. The number of supercolumns in any column family is likewise unbounded and can vary per key. They exhibit the same characteristics as columns.
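The data model above can be sketched as nested dictionaries: table, then column family, then row key, then (optionally) supercolumn, then column. The `Table` and `Column` classes and the "Users"/"Inbox" family names are hypothetical and are not Cassandra's API; they only mirror the structure described in the list.

```python
from dataclasses import dataclass
from typing import Dict, Iterable


@dataclass
class Column:
    """A column: a name, a value and a user-supplied timestamp."""
    name: str
    value: str
    timestamp: int


# row key -> column name -> Column
ColumnFamily = Dict[str, Dict[str, Column]]
# row key -> supercolumn name -> column name -> Column
SuperColumnFamily = Dict[str, Dict[str, Dict[str, Column]]]


class Table:
    """One table whose column families are named up front and then fixed."""

    def __init__(self, standard_cfs: Iterable[str], super_cfs: Iterable[str]):
        self.standard: Dict[str, ColumnFamily] = {name: {} for name in standard_cfs}
        self.super: Dict[str, SuperColumnFamily] = {name: {} for name in super_cfs}

    def insert(self, cf: str, row_key: str, column: Column) -> None:
        # An unknown family name raises KeyError: the set is fixed at startup.
        row = self.standard[cf].setdefault(row_key, {})
        row[column.name] = column  # columns are created dynamically, per key

    def insert_super(self, cf: str, row_key: str, super_name: str, column: Column) -> None:
        row = self.super[cf].setdefault(row_key, {})
        row.setdefault(super_name, {})[column.name] = column


# Hypothetical usage: a "Users" column family and an "Inbox" super column family.
t = Table(standard_cfs=["Users"], super_cfs=["Inbox"])
t.insert("Users", "K1", Column("email", "alice@example.com", 1))
t.insert("Users", "K1", Column("city", "Paris", 1))
t.insert("Users", "K2", Column("email", "bob@example.com", 1))
t.insert_super("Inbox", "alice", "bob", Column("msg-10", "hi alice", 2))
print(len(t.standard["Users"]["K1"]), len(t.standard["Users"]["K2"]))
print(t.super["Inbox"]["alice"]["bob"]["msg-10"].value)
```

Rows K1 and K2 hold different numbers of columns, and the supercolumn path is exactly the optional fifth level of the "associative array" view of Cassandra.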
Distribution, Replication and Fault Tolerance
Data is distributed across the nodes in the cluster using consistent hashing based on an order-preserving hash function. We use an order-preserving hash so that we can perform range scans over the data for analysis at some later point.
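Here is a minimal sketch of consistent hashing with an order-preserving "hash", assuming the simplest possible one: the string key itself is its position on the ring. Node names and tokens are hypothetical, and this is an illustration rather than Cassandra's partitioner code.

```python
import bisect


class OrderPreservingRing:
    """Consistent-hashing ring whose "hash" is the identity on string keys.

    Each node owns the arc ending at its token: a key belongs to the first
    node whose token is >= the key, wrapping around past the largest token.
    """

    def __init__(self, node_tokens):
        # node_tokens: {node_name: token}; tokens are plain strings here.
        self._tokens = sorted(node_tokens.values())
        self._owner = {token: node for node, token in node_tokens.items()}

    def node_for(self, key):
        # Order-preserving: the key itself is its position on the ring.
        i = bisect.bisect_left(self._tokens, key)
        if i == len(self._tokens):
            i = 0  # wrap around the ring
        return self._owner[self._tokens[i]]


# Hypothetical three-node ring with tokens "g", "n" and "t".
ring = OrderPreservingRing({"A": "g", "B": "n", "C": "t"})
keys = sorted(["pear", "apple", "kiwi", "hello"])
print([(k, ring.node_for(k)) for k in keys])
```

Because sorted keys land on nodes in ring order (here A, B, B, C), a range scan only has to visit a contiguous run of nodes. A randomizing hash would spread balanced load better but scatter neighbouring keys, which is the trade-off behind choosing an order-preserving one.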
Cluster membership is maintained via a gossip-style membership algorithm. Node failures within the cluster are monitored using an accrual-style failure detector.
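An accrual failure detector outputs a continuous suspicion level, phi, instead of a yes/no verdict. The sketch below assumes heartbeat intervals are exponentially distributed, so the probability of still waiting after `t` seconds is `exp(-t / mean)` and phi is `-log10` of that. This is a simplification (Hayashibara et al.'s original accrual detector fits a normal distribution), and the class name, window size and threshold are hypothetical. Cassandra exposes a comparable knob, `phi_convict_threshold`.

```python
import math
from collections import deque


class PhiAccrualDetector:
    """Accrual failure detector sketch with an exponential interval model."""

    def __init__(self, window=100):
        self._intervals = deque(maxlen=window)  # recent heartbeat gaps
        self._last = None                       # time of the last heartbeat

    def heartbeat(self, now):
        if self._last is not None:
            self._intervals.append(now - self._last)
        self._last = now

    def phi(self, now):
        if self._last is None or not self._intervals:
            return 0.0  # not enough history to suspect anything
        mean = sum(self._intervals) / len(self._intervals)
        elapsed = now - self._last
        # -log10(exp(-elapsed / mean)) == elapsed / (mean * ln 10)
        return elapsed / (mean * math.log(10))


detector = PhiAccrualDetector()
for t in [0.0, 1.0, 2.0, 3.0, 4.0]:  # heartbeats once per second
    detector.heartbeat(t)

THRESHOLD = 8.0  # example conviction threshold
for now in (5.0, 25.0):
    phi = detector.phi(now)
    print(now, round(phi, 2), "suspect" if phi > THRESHOLD else "alive")
```

Suspicion grows smoothly with silence, so each consumer can pick its own threshold rather than relying on one fixed timeout.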
High availability is achieved through replication, and we actively replicate data across data centers. Since eventual consistency is the mantra of the system, reads execute on the closest replica and data is repaired in the background, for increased read throughput.
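The read path above can be sketched as "answer from the closest replica, then repair the others". This sketch assumes that the version with the highest timestamp wins when replicas disagree, and it runs the repair inline rather than in the background. The `Replica` class and its `distance` field are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Versioned:
    value: str
    timestamp: int


class Replica:
    def __init__(self, name, distance):
        self.name = name
        self.distance = distance  # hypothetical proximity, e.g. network latency
        self.data = {}


def read(replicas, key):
    """Return the closest replica's value, then bring stale replicas up to date."""
    closest = min(replicas, key=lambda r: r.distance)
    answer = closest.data.get(key)  # may be stale: eventual consistency

    # Read repair: assume the highest timestamp is the newest write.
    versions = [r.data[key] for r in replicas if key in r.data]
    if versions:
        newest = max(versions, key=lambda v: v.timestamp)
        for r in replicas:
            current = r.data.get(key)
            if current is None or current.timestamp < newest.timestamp:
                r.data[key] = newest
    return answer


near, far = Replica("dc1", 1), Replica("dc2", 5)
near.data["k"] = Versioned("old", 1)
far.data["k"] = Versioned("new", 2)
print(read([near, far], "k").value)  # served by the near, stale replica
print(read([near, far], "k").value)  # the near replica has since been repaired
```

The first read is fast but returns the stale value; the repair it triggers means later reads converge on the newest write.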
The system exhibits incremental scalability, which can be achieved as easily as dropping in new nodes and having them automatically bootstrapped with data.
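Why adding a node is cheap under consistent hashing: the new node takes a token inside one existing arc, so only the keys in that slice change owner. The tokens, node names and keys below are hypothetical, and the same identity "hash" as in an order-preserving ring is assumed.

```python
import bisect


def owner(tokens, key):
    """tokens: sorted list of (token, node); the first token >= key owns it."""
    positions = [t for t, _ in tokens]
    i = bisect.bisect_left(positions, key)
    return tokens[i % len(tokens)][1]  # wrap around past the largest token


def bootstrap(tokens, new_token, new_node):
    """Return the ring after a new node joins at new_token."""
    return sorted(tokens + [(new_token, new_node)])


keys = ["apple", "cherry", "hello", "kiwi", "mango", "pear", "zebra"]
before = [("g", "A"), ("n", "B"), ("t", "C")]
after = bootstrap(before, "j", "D")

moved = [k for k in keys if owner(before, k) != owner(after, k)]
print(moved)  # only keys in the arc ("g", "j"] leave node B for node D
```

Only "hello" moves; every other key keeps its owner, so the new node needs to stream just one slice of data from a single neighbour.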
Think of Cassandra as a large four- or five-level associative array. Each dimension of the array has a free index based on the keys at that level. The optional fifth level is the supercolumn, which is where the real power comes from: it lets a simple key-value architecture handle sorted lists based on a specified index.
Cassandra has no single point of failure and can scale from one node to several thousand nodes across different data centers. There is no central master, so data can be written to any node in the cluster and read from any other node. Cassandra can be tuned to favor consistency or availability, depending on your application. There is also a high-availability guarantee: if one node goes down, another will step in and replace it smoothly.
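The consistency/availability tuning can be reasoned about with the Dynamo-style rule: with N replicas, a write acknowledged by W of them and a read that consults R of them are guaranteed to overlap when R + W > N. Cassandra lets clients pick per-request consistency levels such as ONE, QUORUM and ALL; the sketch below only checks the overlap property by brute force, and the replication factor of 3 is an assumed example.

```python
from itertools import combinations


def always_overlaps(n, r, w):
    """Does every r-replica read set share a replica with every w-replica write set?"""
    replicas = range(n)
    return all(
        set(read_set) & set(write_set)  # non-empty set means they overlap
        for read_set in combinations(replicas, r)
        for write_set in combinations(replicas, w)
    )


def quorum(n):
    # A majority of the n replicas.
    return n // 2 + 1


N = 3  # assumed replication factor
for label, r, w in [("ONE/ONE", 1, 1),
                    ("QUORUM/QUORUM", quorum(N), quorum(N)),
                    ("ONE/ALL", 1, N)]:
    print(label, always_overlaps(N, r, w), r + w > N)
```

Reading and writing at ONE is fastest and most available but may return stale data; QUORUM on both sides guarantees that a read sees the latest acknowledged write while still tolerating one replica being down.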