Friday, 19 June 2015

Steps to Set Up an Apache Hadoop 2.0 Cluster with the HDFS High Availability Feature


The HDFS High Availability (HA) feature implements an automatic failover process. As part of HA, we set up two master machines: an active and a standby name node. All namespace edits (metadata) are logged to shared edit logs kept in a common shared location (for example, NFS storage). The active name node writes all metadata to the shared edit logs. Only the active name node has write access to the shared storage; this single-writer restriction is enforced by fencing.

The active name node is responsible for all client operations in the cluster, while the standby simply acts as a slave, maintaining enough state to provide a fast failover if necessary. The standby name node has read-only access to the shared edit logs; it reads the edits and applies them to its own namespace.


ZooKeeper is the coordination system that monitors the active and standby name nodes through heartbeat signals. Whenever the active name node goes down, ZooKeeper automatically promotes the standby name node to active and informs all the slave machines to connect to the new active name node. This way, we can make sure our system is available at all times.

Let us see how we can set up the HA feature as part of an Apache Hadoop 2.0 cluster.

We can set up the HA in two different ways:
  • Using the Quorum Journal Manager (QJM)
  • Using NFS for the shared storage

In this article, let us see the steps to set up HA using the Quorum Journal Manager (QJM) to share edit logs between the active and standby name nodes.
Step-1: First, set up the host name for each node as below.
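For example, the three machines used throughout this article can be mapped in /etc/hosts on every node. The IP addresses below are placeholders; substitute your own:

```
192.168.1.101  nn1.cluster.com  nn1
192.168.1.102  nn2.cluster.com  nn2
192.168.1.103  dn1.cluster.com  dn1
```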


Step-2: Download the Hadoop and ZooKeeper binary tar files and extract them to edit the configurations.

Download the ZooKeeper binary file.
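For example, the 3.4.6 release used in this article can be fetched from the Apache archive (assuming wget is available):

```shell
# Download the ZooKeeper 3.4.6 release from the Apache archive.
wget https://archive.apache.org/dist/zookeeper/zookeeper-3.4.6/zookeeper-3.4.6.tar.gz
```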


Un-tar the downloaded ZooKeeper binary tar file.

>> tar -xvf zookeeper-3.4.6.tar.gz

Download the stable Hadoop binary tar file from the Hadoop site.
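The 2.6.0 release used in this article can likewise be fetched from the Apache archive:

```shell
# Download the Hadoop 2.6.0 release from the Apache archive.
wget https://archive.apache.org/dist/hadoop/common/hadoop-2.6.0/hadoop-2.6.0.tar.gz
```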


Extract the Hadoop binary tar file

>> tar -xvf hadoop-2.6.0.tar.gz

Step-3: Update the .bashrc file to add the Hadoop and ZooKeeper paths.

>> sudo gedit ~/.bashrc

Update/Add the below content.
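The exact entries depend on where Java is installed and where the tars were extracted; a minimal sketch, assuming the archives were extracted under /home/user/HA and OpenJDK 7 is installed:

```shell
# Assumed install locations; adjust to your own environment.
export JAVA_HOME=/usr/lib/jvm/java-7-openjdk-amd64
export HADOOP_HOME=/home/user/HA/hadoop-2.6.0
export ZOOKEEPER_HOME=/home/user/HA/zookeeper-3.4.6
# Put the Hadoop and ZooKeeper commands on the PATH.
export PATH=$PATH:$HADOOP_HOME/bin:$HADOOP_HOME/sbin:$ZOOKEEPER_HOME/bin
```

After editing, run `source ~/.bashrc` so the changes take effect in the current shell.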



Step-4: Enable and generate SSH in all the nodes.

>> ssh-keygen -t rsa

When prompted for the file in which to save the key, press Enter to accept the default, and leave the passphrase empty.
Run the SSH key generation on all the nodes.
Once the SSH key is generated, you will get a public key and a private key.
The .ssh directory should have permission 700, and all the keys inside it should have permission 600.
>> sudo chmod 700 .ssh/
>> cd .ssh/
>> chmod 600 *
We have to copy the name node's SSH public key to all the nodes. On the active name node, append id_rsa.pub to authorized_keys using the cat command as below.
>>cat ~/.ssh/id_rsa.pub >> ~/.ssh/authorized_keys
Copy the name node public key to all the nodes using the ssh-copy-id command.
>> ssh-copy-id -i .ssh/id_rsa.pub user@nn2.cluster.com
>> ssh-copy-id -i .ssh/id_rsa.pub user@dn1.cluster.com
Restart the sshd service in all the nodes.
>> sudo service sshd restart
Now we can log in to any node from the name node without a password.

Step-5 [ Update Hadoop configuration files ]
Open the core-site.xml file on the active name node and add the below properties.
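At a minimum, core-site.xml must point the default file system at the logical nameservice defined later in hdfs-site.xml (ha-cluster in this article); a minimal sketch:

```xml
<property>
  <name>fs.defaultFS</name>
  <value>hdfs://ha-cluster</value>
</property>
```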

Add the below properties in hdfs-site.xml.
<property>
  <name>dfs.namenode.name.dir</name>
  <value>/home/user/HA/data/namenode</value>
</property>
<property>
  <name>dfs.replication</name>
  <value>1</value>
</property>
<property>
  <name>dfs.permissions</name>
  <value>false</value>
</property>
<property>
  <name>dfs.nameservices</name>
  <value>ha-cluster</value>
</property>
<property>
  <name>dfs.ha.namenodes.ha-cluster</name>
  <value>nn1,nn2</value>
</property>
<property>
  <name>dfs.namenode.rpc-address.ha-cluster.nn1</name>
  <value>nn1.cluster.com:9000</value>
</property>
<property>
  <name>dfs.namenode.rpc-address.ha-cluster.nn2</name>
  <value>nn2.cluster.com:9000</value>
</property>
<property>
  <name>dfs.namenode.http-address.ha-cluster.nn1</name>
  <value>nn1.cluster.com:50070</value>
</property>
<property>
  <name>dfs.namenode.http-address.ha-cluster.nn2</name>
  <value>nn2.cluster.com:50070</value>
</property>
<property>
  <name>dfs.namenode.shared.edits.dir</name>
  <value>qjournal://nn1.cluster.com:8485;nn2.cluster.com:8485;dn1.cluster.com:8485/ha-cluster</value>
</property>
<property>
  <name>dfs.client.failover.proxy.provider.ha-cluster</name>
  <value>org.apache.hadoop.hdfs.server.namenode.ha.ConfiguredFailoverProxyProvider</value>
</property>
<property>
  <name>dfs.ha.automatic-failover.enabled</name>
  <value>true</value>
</property>
<property>
  <name>ha.zookeeper.quorum</name>
  <value>nn1.cluster.com:2181,nn2.cluster.com:2181,dn1.cluster.com:2181</value>
</property>
<property>
  <name>dfs.ha.fencing.methods</name>
  <value>sshfence</value>
</property>
<property>
  <name>dfs.ha.fencing.ssh.private-key-files</name>
  <value>/home/user/.ssh/id_rsa</value>
</property>

Now update the configuration for ZooKeeper.
>> cd zookeeper-3.4.6/conf
The conf directory contains a zoo_sample.cfg file; create zoo.cfg from it.
>> cp zoo_sample.cfg zoo.cfg
Create a directory in any location and use it to store the ZooKeeper data.
>> mkdir <path, where you want to store the zookeeper files>
>> mkdir zookeeper
Open the zoo.cfg file.
>> gedit zoo.cfg
In zoo.cfg, set the dataDir property to the directory created in the step above, and add the below details of all three servers.
dataDir=/home/user/HA/data/zookeeper
server.1=nn1.cluster.com:2888:3888
server.2=nn2.cluster.com:2888:3888
server.3=dn1.cluster.com:2888:3888
Now copy the Java, hadoop-2.6.0, and zookeeper-3.4.6 directories and the .bashrc file to all the nodes (standby name node and data node) using the scp command, and adjust the environment variables on each node accordingly.
>> scp -r <path of directory> user@<ip address>:<path where you need to copy>
On the data node (@dn1), create a directory to store the HDFS blocks.
On the data node (@dn1), we have to add the dfs.datanode.data.dir property.
>> mkdir datanode
Change the permission to data node directory.
>> chmod 755 datanode/
Open the hdfs-site.xml file and add this data node directory path in the dfs.datanode.data.dir property.

Keep all the properties that were copied from the active name node, and add dfs.datanode.data.dir as one extra property on the data node.
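A sketch of the extra property, assuming the datanode directory created above resolves to /home/user/HA/data/datanode (adjust the path to where you actually created it):

```xml
<property>
  <name>dfs.datanode.data.dir</name>
  <value>/home/user/HA/data/datanode</value>
</property>
```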
On the active name node, change to the directory where the ZooKeeper data is stored (the dataDir property path).
Create a myid file inside that directory, add the numeral 1 to it, and save the file.
>> pwd
/home/user/HA/data/zookeeper
>> gedit myid

On the standby name node, change to the ZooKeeper dataDir directory, create a myid file containing the numeral 2, and save it.
On the data node, change to the ZooKeeper dataDir directory, create a myid file containing the numeral 3, and save it.
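The myid files above can also be created from the command line; a minimal sketch, assuming the dataDir is $HOME/HA/data/zookeeper (write 2 on nn2 and 3 on dn1):

```shell
# Assumed dataDir; must match the dataDir entry in zoo.cfg.
ZK_DATA_DIR="$HOME/HA/data/zookeeper"
mkdir -p "$ZK_DATA_DIR"
# Each ZooKeeper server identifies itself by the number in its myid file.
echo 1 > "$ZK_DATA_DIR/myid"   # write 2 on nn2, 3 on dn1
cat "$ZK_DATA_DIR/myid"
```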

Step-6 [Start the Journal node in all three nodes]
>> hadoop-daemon.sh start journalnode
>> sudo jps
Step-7 [Format and start the Active name node]
>> hdfs namenode -format
>> hadoop-daemon.sh start namenode

Step-8 [Command from Standby Name node]
Copy the HDFS Meta data from active name node to standby name node.
>> hdfs namenode -bootstrapStandby
Once we run this command, it reports the node and location from which the metadata is being copied and whether the copy succeeds.
Start the name node daemon in Standby name node machine.
>> hadoop-daemon.sh start namenode

Step-9 [Start Zookeeper service in all three nodes]
Run below command in all three nodes.
>> zkServer.sh start
After starting the ZooKeeper server, enter the jps command. On all the nodes we will see the QuorumPeerMain service.
We can check the status of ZooKeeper on all the nodes by running the command below.
>> zkServer.sh status
It will report whether each node is the leader or a follower.

Step-10 [Start Data node daemon in Data node machine]
>> hadoop-daemon.sh start datanode

Step-11 [Format and Start the Zookeeper fail over controller in Active and Standby name node]
Format the zookeeper fail over controller in Active name node.
>> hdfs zkfc -formatZK
Start the ZKFC daemon on both the active and standby name nodes.
>> hadoop-daemon.sh start zkfc
Enter the jps command to check for the DFSZKFailoverController daemon.

Step-12 [Verify the status at Active and Standby name node]
Check whether each name node is active or on standby using the command below.
>> hdfs haadmin -getServiceState nn1
>> hdfs haadmin -getServiceState nn2
Now Check the status of each Name node using the web browser.
Open the Web browser and enter the below URL.
<IP Address of Active Namenode>:50070
It will show whether the name node is Active or on standby.
Finally, make sure the proper daemons are running on each system.
The daemons on the active name node are:
  • ZooKeeper
  • ZooKeeper Failover Controller
  • JournalNode
  • NameNode

The daemons on the standby name node are:
  • ZooKeeper
  • ZooKeeper Failover Controller
  • JournalNode
  • NameNode

The daemons on the data node are:
  • ZooKeeper
  • JournalNode
  • DataNode

In order to verify the failover process, kill the name node daemon on the active name node; ZooKeeper will promote the standby name node to active.
>>  sudo kill -9 <namenode process ID>
Open both name nodes in the web browser and check their status.