Tuesday, 11 August 2015

How to customize HADOOP default block size


HADOOP stores files across the cluster by breaking them into fixed-size blocks. The default block size is 64 MB in Hadoop 1 and 128 MB in Hadoop 2. To change the default, update the value of the dfs.blocksize property in hdfs-site.xml; settings in hdfs-site.xml override the shipped defaults in hdfs-default.xml.

<configuration>
        <property>
                <name>dfs.blocksize</name>

                <value>134217728</value>
        </property>
</configuration>
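As an illustration of the splitting above, HDFS stores a file as a sequence of full blocks plus one (usually smaller) tail block. For example, a hypothetical 500 MB file with the 128 MB default would occupy four blocks (three full blocks and a 116 MB tail); a quick shell check of the block count:

```shell
# Blocks needed = ceiling(file size / block size).
# 500 MB file and 128 MB block size are illustrative values, not from a real cluster.
FILE=$((500 * 1024 * 1024))
BLOCK=$((128 * 1024 * 1024))
echo $(( (FILE + BLOCK - 1) / BLOCK ))
# prints 4
```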

dfs.blocksize sets the default block size for new files, in bytes. We can use one of the case-insensitive suffixes k (kilo), m (mega), g (giga), t (tera), p (peta) or e (exa) to specify the size (such as 128k, 512m or 1g), or provide the complete size in bytes (such as 134217728 for 128 MB).
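A suffix like 128m resolves to the same number of bytes as the literal value used in the configuration above; a quick sanity check in the shell:

```shell
# 128m = 128 * 1024 * 1024 bytes, the dfs.blocksize value shown above
echo $((128 * 1024 * 1024))
# prints 134217728
```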

Block size can be defined in hdfs-site.xml on the master (NameNode), on the slaves (DataNodes), or on the client side, or through commands submitted by the client.

The dfs.blocksize definition in the client-side hdfs-site.xml takes precedence over the master- and slave-side settings. We can also set the block size from the client while uploading files to HDFS, as below.

hdfs dfs -D dfs.blocksize=67108864 -put weblog.dat /hdfs/Web-data
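To confirm which block size a file actually received, we can query it after the upload. A sketch, assuming the weblog.dat path from the command above and a running cluster:

```shell
# Print the block size (in bytes) recorded for the uploaded file;
# %o is the block-size format specifier of hadoop fs -stat.
hadoop fs -stat "%o" /hdfs/Web-data/weblog.dat

# Show how the file was actually split into blocks.
hdfs fsck /hdfs/Web-data/weblog.dat -files -blocks
```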

Between the master and slave block size settings, the master takes precedence over the slave. If we want to force the slave-side dfs.blocksize definition to be used, mark the property final by adding a <final>true</final> element, as below.

hdfs-site.xml at the slave node:

<configuration>
        <property>
                <name>dfs.blocksize</name>

                <value>134217728</value>
                <final>true</final>
        </property>
</configuration>