Monday, 15 June 2015

Features of Hadoop

  • Hadoop can process any kind of data, whether structured or unstructured.
  • Hadoop runs its slave machines on low-end commodity computers.
  • It is more cost-effective for handling big data.
  • Hadoop scales out: when more resources are needed, additional machines are added to the cluster instead of upgrading the resources of an existing machine.
Parallel Processing
  • Hadoop uses HDFS (Hadoop Distributed File System) for data storage. Data is divided into multiple chunks that are distributed across the machines in the cluster, so the chunks can be processed in parallel on multiple machines.
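The chunk-and-process idea can be illustrated with a small sketch. This is not Hadoop code: the chunk size, the function names and the use of a local thread pool in place of cluster machines are all assumptions made for illustration.

```python
from concurrent.futures import ThreadPoolExecutor

# Illustrative chunk size in records; real HDFS blocks are far larger.
CHUNK_SIZE = 4


def split_into_chunks(records, chunk_size=CHUNK_SIZE):
    """Divide the input into fixed-size chunks, much as a file is divided into blocks."""
    return [records[i:i + chunk_size] for i in range(0, len(records), chunk_size)]


def count_words(chunk):
    """Work that can run independently on a single chunk."""
    return sum(len(line.split()) for line in chunk)


def process_in_parallel(records, workers=3):
    """Process every chunk on its own worker, then combine the partial results."""
    chunks = split_into_chunks(records)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        partial_counts = list(pool.map(count_words, chunks))
    return sum(partial_counts)
```

Each worker here stands in for one machine in the cluster: it sees only its own chunk, and the partial counts are combined at the end.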
Automatic Failover
  • Hadoop implements a high-availability architecture with an active system and a standby system. When the active system goes down, ZooKeeper automatically fails over to the standby system.
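The active/standby switch can be pictured with a toy model. The class below is hypothetical and does not use the ZooKeeper API; it only assumes that some coordinator regularly learns which nodes are still alive.

```python
class FailoverController:
    """Toy active/standby pair (hypothetical; not the ZooKeeper API)."""

    def __init__(self, active, standby):
        self.active = active
        self.standby = standby

    def heartbeat(self, alive_nodes):
        """Promote the standby if the active node has stopped responding."""
        if self.active not in alive_nodes and self.standby in alive_nodes:
            # Swap roles: the standby takes over as the new active node.
            self.active, self.standby = self.standby, self.active
        return self.active
```

For example, with `FailoverController("nn1", "nn2")`, a heartbeat that reports only `nn2` as alive makes `nn2` the active node and leaves `nn1` as the standby.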
Taking processing to the data
  • In Hadoop, multiple instances of a process are started and deployed on the systems where the actual chunks of data reside.
  • This way, huge volumes of data do not have to be moved from one place to another in order to be processed.
  • Hadoop handles fault tolerance by maintaining a replication factor for each chunk of data; each replica is stored on a different system.
  • So even if one system goes down, the running process is not interrupted: the Hadoop framework can pick up the same chunk of data from another machine for processing. This makes Hadoop a more reliable platform.
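A minimal sketch of how replication and "processing where the data lives" fit together is given below. The round-robin placement and the function names are assumptions for illustration; Hadoop's real placement policy is more involved.

```python
REPLICATION_FACTOR = 3  # assumed value for this sketch


def place_replicas(chunk_ids, nodes, replication=REPLICATION_FACTOR):
    """Assign each chunk to `replication` distinct nodes (simple round-robin)."""
    if replication > len(nodes):
        raise ValueError("need at least as many nodes as replicas")
    placement = {}
    for i, chunk in enumerate(chunk_ids):
        placement[chunk] = [nodes[(i + r) % len(nodes)] for r in range(replication)]
    return placement


def pick_node(chunk, placement, alive_nodes):
    """Run the task on a live node that already holds a copy of the chunk."""
    for node in placement[chunk]:
        if node in alive_nodes:
            return node
    raise RuntimeError(f"all replicas of {chunk} are unavailable")
```

If the first node holding a chunk is down, `pick_node` simply moves on to the next replica, so the job continues without copying the chunk anywhere.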