What is Big Data?

Big Data refers to datasets too large for commonly used software tools to manage and process within a tolerable elapsed time. What counts as "big" is a constantly moving target; as of 2012 it ranged from a few dozen terabytes to many petabytes of data in a single dataset.

The term also covers collections of datasets so large and complex that they become difficult to process using on-hand database management tools or traditional data processing applications.

Big Data is commonly described by three characteristics:

  1. Volume
  2. Velocity
  3. Variety

Relational database management systems (RDBMS) already store and process structured data well. More recently, however, data has been arriving in the form of videos, images, and text. Such data is called unstructured or semi-structured data, and it is difficult to store and process efficiently using an RDBMS.
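
To make the distinction concrete, here is a small illustrative sketch in Python. The records, names, and field layouts are made up for this example and are not taken from any real system; the point is only that fixed-schema rows can be addressed by column, while semi-structured records vary from one to the next and unstructured text declares no fields at all.

```python
import json

# Structured: every record has the same fixed columns, as in an RDBMS table.
# (Hypothetical columns: id, name, age.)
structured_rows = [
    (1, "Alice", 30),
    (2, "Bob", 25),
]

# Semi-structured: records describe themselves, and their fields may differ.
semi_structured = [
    json.loads('{"id": 1, "name": "Alice", "tags": ["admin"]}'),
    json.loads('{"id": 2, "name": "Bob", "location": {"city": "Pune"}}'),
]

# Unstructured: free text (or raw image/video bytes) with no declared fields.
unstructured = "Alice wrote: the nightly job failed again, see attached screenshot."

# A fixed schema lets us read the same column from every row by position.
ages = [row[2] for row in structured_rows]

# Semi-structured records need per-record inspection because the keys differ.
field_sets = [sorted(record.keys()) for record in semi_structured]

print(ages)
print(field_sets)
```

The second record has a `location` field that the first lacks, which is exactly the kind of variation a fixed table schema handles poorly.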

We therefore need an alternative way to store and process unstructured and semi-structured data.

Hadoop is one of the technologies for efficiently storing and processing large datasets. It differs substantially from traditional distributed systems and is designed to overcome many of the problems that exist in them.

Hadoop is an open-source framework written in Java for storing data in a distributed file system and processing that data in parallel across the nodes of a cluster.
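
The idea of splitting data into blocks and processing the blocks in parallel can be sketched in a few lines of plain Python. This is not Hadoop code and does not use any Hadoop API: the function names (`split_into_blocks`, `map_block`, `reduce_counts`, `word_count`) are hypothetical, and threads on one machine stand in for cluster nodes. It is only meant to show the general split, process, and combine pattern on a word count.

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor


def split_into_blocks(lines, block_size):
    """Split the input into fixed-size blocks, loosely like a file split across nodes."""
    return [lines[i:i + block_size] for i in range(0, len(lines), block_size)]


def map_block(block):
    """Count the words in one block, independently of all other blocks."""
    counts = Counter()
    for line in block:
        counts.update(line.lower().split())
    return counts


def reduce_counts(partials):
    """Merge the per-block counts into a single result."""
    total = Counter()
    for partial in partials:
        total.update(partial)
    return total


def word_count(lines, block_size=2, workers=3):
    blocks = split_into_blocks(lines, block_size)
    # Each worker thread stands in for a cluster node handling its own block.
    with ThreadPoolExecutor(max_workers=workers) as pool:
        partials = list(pool.map(map_block, blocks))
    return reduce_counts(partials)


lines = [
    "big data needs big storage",
    "hadoop stores data in blocks",
    "hadoop processes blocks in parallel",
]
result = word_count(lines)
print(result.most_common(3))
```

Because each block is counted without looking at any other block, the work can be spread over as many workers as there are blocks; only the final merge needs to see all partial results.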


