Why Hadoop?
8 reasons why organizations are adopting Hadoop:
1) Big data: The Hadoop framework supports extremely large data sets, which is why it is used for big data workloads. Hadoop can be used with structured, semi-structured, or unstructured data, handles terabyte-scale data volumes, and is highly scalable.
2) Cost effective - commodity hardware: The Hadoop framework runs on commodity hardware, i.e. it does not need vendor-specific hardware like Teradata or Netezza appliances. This makes it cheap and easy to upgrade or scale; any computer can be made part of the Hadoop cluster.
3) Scalability: The Hadoop framework is highly scalable, both vertically and horizontally. Vertical scalability means upgrading a machine by adding more hardware; horizontal scalability means adding more machines to the cluster. Moreover, the machines in a Hadoop cluster can be heterogeneous, i.e. they do not have to come from the same company/vendor.
4) Parallel/distributed processing: The Hadoop framework supports distributed processing. Data is distributed across the Hadoop cluster, which consists of many machines, and is processed at the same time on multiple machines using the MapReduce programming model (see the word-count sketch after this list).
5) Redundancy: Any data file in Hadoop is broken into small blocks that are stored on multiple machines. In addition, multiple copies of each block (3 by default) are kept on different machines, so if any machine goes down the data can be fetched from another (see the replication settings after this list).
6) Archiving and exploratory analysis: Data from different data warehouses can be extracted and stored as files on HDFS for archival purposes. Since HDFS runs on commodity hardware, it is a cheap and effective way to archive data. Hadoop can also be used to store files for initial exploratory analysis before a proper data warehouse is built.
7) Streaming data: Hadoop tools such as Apache Flume can be used to read streaming data and store it, in formats such as XML or JSON, directly on HDFS for analysis (see the sample Flume configuration after this list).
8) Unstructured data: Traditional data warehouses are very good at dealing with structured data. Hadoop also handles unstructured data well, whether it arrives as flat files, XML, or JSON files.
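
To make reason 4 concrete, here is a minimal sketch of a MapReduce job, the classic word count, written against the Hadoop Java API. The class names and input/output paths are placeholders; the mapper runs in parallel on every block of the input, and the reducer combines the partial results.

import java.io.IOException;
import java.util.StringTokenizer;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class WordCount {

  // Mapper: runs in parallel on each block of the input file,
  // emitting a (word, 1) pair for every word it sees.
  public static class TokenizerMapper
      extends Mapper<Object, Text, Text, IntWritable> {
    private final static IntWritable one = new IntWritable(1);
    private final Text word = new Text();

    public void map(Object key, Text value, Context context)
        throws IOException, InterruptedException {
      StringTokenizer itr = new StringTokenizer(value.toString());
      while (itr.hasMoreTokens()) {
        word.set(itr.nextToken());
        context.write(word, one);
      }
    }
  }

  // Reducer: sums the counts for each word across all mappers.
  public static class IntSumReducer
      extends Reducer<Text, IntWritable, Text, IntWritable> {
    private final IntWritable result = new IntWritable();

    public void reduce(Text key, Iterable<IntWritable> values, Context context)
        throws IOException, InterruptedException {
      int sum = 0;
      for (IntWritable val : values) {
        sum += val.get();
      }
      result.set(sum);
      context.write(key, result);
    }
  }

  public static void main(String[] args) throws Exception {
    Job job = Job.getInstance(new Configuration(), "word count");
    job.setJarByClass(WordCount.class);
    job.setMapperClass(TokenizerMapper.class);
    job.setCombinerClass(IntSumReducer.class);
    job.setReducerClass(IntSumReducer.class);
    job.setOutputKeyClass(Text.class);
    job.setOutputValueClass(IntWritable.class);
    FileInputFormat.addInputPath(job, new Path(args[0]));   // e.g. /user/demo/input
    FileOutputFormat.setOutputPath(job, new Path(args[1])); // e.g. /user/demo/output
    System.exit(job.waitForCompletion(true) ? 0 : 1);
  }
}

It would be packaged into a jar and launched with something like hadoop jar wordcount.jar WordCount /user/demo/input /user/demo/output (the paths are placeholders).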
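
For reason 5, the 3 copies mentioned above are just a configuration setting. A sketch of the relevant property in hdfs-site.xml (the value 3 is the Hadoop default):

<!-- hdfs-site.xml: how many copies HDFS keeps of each block -->
<property>
  <name>dfs.replication</name>
  <value>3</value>
</property>

The replication factor can also be changed per file from the command line, e.g. hdfs dfs -setrep -w 3 /user/demo/somefile.txt (the path is a placeholder).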
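
For reason 7, here is a minimal sketch of an Apache Flume agent configuration that tails a log file and lands the events on HDFS. The agent name, file paths, and HDFS URL are all placeholders:

# flume.conf: one source, one in-memory channel, one HDFS sink
agent1.sources  = src1
agent1.channels = ch1
agent1.sinks    = sink1

# Tail an application log file (path is a placeholder)
agent1.sources.src1.type = exec
agent1.sources.src1.command = tail -F /var/log/app/events.log
agent1.sources.src1.channels = ch1

# Buffer events in memory between source and sink
agent1.channels.ch1.type = memory
agent1.channels.ch1.capacity = 10000

# Land the events on HDFS as plain text for later analysis
agent1.sinks.sink1.type = hdfs
agent1.sinks.sink1.hdfs.path = hdfs://namenode:8020/flume/events/%Y-%m-%d
agent1.sinks.sink1.hdfs.fileType = DataStream
agent1.sinks.sink1.hdfs.useLocalTimeStamp = true
agent1.sinks.sink1.channel = ch1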
Some of the challenges organizations face in adopting Hadoop:
a) Finding well-trained and experienced professionals to build and develop on the Hadoop framework.
b) Finding suitable use cases and training the business to ask questions that Hadoop can solve.
Some of the big Hadoop vendors are:
Cloudera
Hortonworks
IBM
EMC
Intel
MapR
Oracle
Check out other articles on the same topic:
http://dwbitechguru.blogspot.ca/2014/11/how-to-load-local-data-file-into-apache_25.html