Sunday, 14 June 2015

Big Data and Hadoop


Apache Hadoop is a framework that allows for the distributed processing of large data sets across clusters of commodity computers using a simple programming model. Hadoop is an open-source data management platform with scale-out storage and distributed processing.

  • A framework is nothing but a set of libraries.
  • A cluster is a collection of systems working together.
  • A commodity computer is an inexpensive, low-end system rather than specialized high-end hardware.
  • Simple programming model means MapReduce. (YARN, introduced in Hadoop 2, is the layer that manages cluster resources and schedules jobs; it is not itself a programming model.)
  • Open-source means the source code and binaries can be downloaded free of cost; you pay only for commercial support, if you need it.
  • Scale-up means adding resources (CPU, memory, disk) to the same machine to increase its capacity.
  • Scale-out means adding more machines to the cluster to increase its overall capacity.
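
To make the MapReduce programming model mentioned in the list a little more concrete, here is a minimal single-machine sketch of a word count in plain Python. It does not use Hadoop at all; the function names (map_phase, shuffle, reduce_phase, word_count) are made up for illustration, and the shuffle step only imitates what the framework would do between the map and reduce stages on a real cluster.

```python
from collections import defaultdict


def map_phase(line):
    """Mapper: emit a (word, 1) pair for every word in one input line."""
    return [(word.lower(), 1) for word in line.split()]


def shuffle(pairs):
    """Group all emitted values by key (done by the framework on a real cluster)."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(key, values):
    """Reducer: sum the counts collected for one word."""
    return key, sum(values)


def word_count(lines):
    """Run map, shuffle and reduce in sequence on an in-memory list of lines."""
    pairs = []
    for line in lines:
        pairs.extend(map_phase(line))
    return dict(reduce_phase(key, values) for key, values in shuffle(pairs).items())


if __name__ == "__main__":
    print(word_count(["big data", "Big Hadoop"]))
```

On a real cluster, many mappers and reducers would run in parallel on different machines, each working on its own part of the input; the sketch only shows the shape of the two functions a programmer has to write.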


Hadoop uses a distributed file system, the Hadoop Distributed File System (HDFS), in which input data is divided into fixed-size blocks (only the last block of a file may be smaller) that are distributed across the machines in the cluster.
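
The idea of cutting the input into chunks and spreading them over the cluster can be sketched in a few lines of Python. This is only a toy illustration under stated assumptions: the helper names (split_into_blocks, assign_blocks) and the node names are hypothetical, the block size is tiny so the result is easy to read, and the round-robin placement is a simplification, not how Hadoop actually decides where each chunk is stored.

```python
def split_into_blocks(data, block_size):
    """Cut the data into chunks of block_size bytes; the last one may be shorter."""
    return [data[i:i + block_size] for i in range(0, len(data), block_size)]


def assign_blocks(num_blocks, nodes):
    """Assign block ids to nodes in round-robin order (a simplifying assumption)."""
    placement = {node: [] for node in nodes}
    for block_id in range(num_blocks):
        placement[nodes[block_id % len(nodes)]].append(block_id)
    return placement


if __name__ == "__main__":
    blocks = split_into_blocks(b"abcdefghij", 4)
    print(blocks)
    print(assign_blocks(len(blocks), ["node1", "node2"]))
```

Because every machine holds only some of the chunks, each one can process its own share of the data at the same time as the others, which is what makes the distributed processing described above possible.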