29 Feb 2016

Understanding BIG DATA


What is Big Data?
Big Data is a collection of data so large and complex that it becomes very difficult to capture, store, process, retrieve and analyze with on-hand database management tools or traditional data processing techniques.

There are many real-life examples of Big Data! Facebook generates 500+ terabytes of data per day, the NYSE (New York Stock Exchange) generates about 1 terabyte of new trade data per day, and a jet airliner collects 10 terabytes of sensor data for every 30 minutes of flying time. All these are day-to-day examples of Big Data!
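To compare these figures on one scale, they can be converted to a rough per-second rate. The sketch below is only back-of-the-envelope arithmetic on the numbers quoted above. It assumes a decimal terabyte (10^12 bytes) and a perfectly even flow of data, and the name bytes_per_second is made up for this illustration.

```python
# Assumption: 1 terabyte = 10**12 bytes (decimal definition).
TB = 10**12
DAY_SECONDS = 24 * 60 * 60


def bytes_per_second(terabytes, seconds):
    """Average rate in bytes per second, assuming a perfectly even flow."""
    return terabytes * TB / seconds


# Figures taken from the examples above ("500+" is treated as exactly 500).
rates = {
    "Facebook": bytes_per_second(500, DAY_SECONDS),
    "NYSE": bytes_per_second(1, DAY_SECONDS),
    "Jet airliner (in flight)": bytes_per_second(10, 30 * 60),
}

for source, rate in rates.items():
    print(f"{source}: {rate / 10**6:,.2f} MB/s")
```

Seen this way, the jet airliner figure is of the same order as the Facebook one while the aircraft is flying, even though the daily totals look very different.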

The three characteristics of Big Data are Volume, Velocity and Variety.

1. Volume: Big Data is defined first of all by its sheer size. It could amount to hundreds of terabytes or even petabytes of information. For instance, 15 terabytes of Facebook posts or 400 billion annual medical records could mean Big Data!

2. Velocity: Velocity is the rate at which data flows into an organization. Big Data requires fast processing, and time plays a crucial role in many organizations. For instance, processing 2 million records at the stock market or evaluating the results of lakhs (hundreds of thousands) of students who applied for competitive exams could mean Big Data!

3. Variety: Big Data may not belong to a specific format. It could be structured or unstructured, and could take any form such as text, images, audio, video, log files, emails, simulations, 3D models, etc. New research shows that a substantial amount of an organization’s data is not numeric; however, such data is equally important for the decision-making process. So, organizations need to think beyond stock records, documents, personnel files, finances, etc.

What is the basic difference between traditional RDBMS and Hadoop?
A traditional RDBMS is used in transactional systems to report on and archive data, whereas Hadoop is an approach to storing a huge amount of data in a distributed file system and processing it. An RDBMS is useful when you want to seek one record out of Big Data, whereas Hadoop is useful when you want to read the Big Data in one shot and perform analysis on it later.
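The two access patterns can be put side by side in a small sketch. This is not Hadoop code: it uses Python's built-in sqlite3 module for the "seek one record" case, and a plain single-process imitation of the map and reduce steps for the "read everything in one shot" case. The trade records and the names map_phase and reduce_phase are made up for this illustration.

```python
import sqlite3
from collections import defaultdict

# Hypothetical sample data: (trade id, symbol, quantity).
trades = [
    (1, "AAA", 100),
    (2, "BBB", 250),
    (3, "AAA", 50),
    (4, "CCC", 300),
    (5, "BBB", 25),
]

# RDBMS style: seek one record by its key.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE trades (id INTEGER PRIMARY KEY, symbol TEXT, quantity INTEGER)"
)
conn.executemany("INSERT INTO trades VALUES (?, ?, ?)", trades)
one_record = conn.execute(
    "SELECT symbol, quantity FROM trades WHERE id = ?", (3,)
).fetchone()
conn.close()


# Hadoop style (imitated in one process): scan every record, then aggregate.
def map_phase(records):
    """Emit a (symbol, quantity) pair for every record in the data set."""
    for _trade_id, symbol, quantity in records:
        yield symbol, quantity


def reduce_phase(pairs):
    """Sum the quantities for each symbol."""
    totals = defaultdict(int)
    for symbol, quantity in pairs:
        totals[symbol] += quantity
    return dict(totals)


totals = reduce_phase(map_phase(trades))

print(one_record)  # the single trade with id 3
print(totals)      # total quantity per symbol over the whole data set
```

The first query touches a single row through the primary key, while the second part has to read every record before it can answer. In a real Hadoop cluster that full scan would be split across many machines instead of running in one loop.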