Friday, 19 June 2015

Steps to install and configure Apache HIVE


Apache Hive is a data warehouse environment built on top of Hadoop. It was developed by Facebook and released as open source to the community. Hive uses HiveQL (Hive Query Language), an SQL-like language, for Big Data processing.

During execution, HiveQL queries are converted into a series of MapReduce jobs, which are executed on the Hadoop cluster.

Let us go through the steps to install and configure Apache HIVE. Hadoop must be installed and configured before setting up HIVE.

Step:1 [Download and extract the HIVE tar file]

>> wget -c http://archive.apache.org/dist/hive/stable/apache-hive-1.2.0-bin.tar.gz

>> tar -xzvf apache-hive-1.2.0-bin.tar.gz


Step:2 [Edit the .bashrc file for environment variables]

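Add the following lines at the end of the file. HIVE_HOME must point to the directory where the archive was extracted, so either move the extracted directory to /usr/lib or adjust the path accordingly: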
export HIVE_HOME=/usr/lib/apache-hive-1.2.0-bin
export PATH=$PATH:$HIVE_HOME/bin
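
If the setup is repeated, the same export lines can end up in .bashrc several times. A minimal sketch of one way to avoid that is shown below; the helper name append_once and the RC_FILE variable are illustrative and not part of Hive. The demonstration writes to a scratch file, and RC_FILE would be pointed at ~/.bashrc for real use.

```shell
# append_once FILE LINE
# Appends LINE to FILE only if FILE does not already contain that exact line.
append_once() {
    touch "$1"
    # -q: quiet, -x: match the whole line, -F: treat LINE as a fixed string
    grep -qxF -- "$2" "$1" || printf '%s\n' "$2" >> "$1"
}

# Demonstration on a scratch file; point RC_FILE at ~/.bashrc for real use.
RC_FILE="$(mktemp)"
append_once "$RC_FILE" 'export HIVE_HOME=/usr/lib/apache-hive-1.2.0-bin'
append_once "$RC_FILE" 'export PATH=$PATH:$HIVE_HOME/bin'
# Running the same call again leaves the file unchanged.
append_once "$RC_FILE" 'export PATH=$PATH:$HIVE_HOME/bin'
cat "$RC_FILE"
```

The single quotes keep $PATH and $HIVE_HOME unexpanded, so they are written to the file literally and resolved only when the file is sourced.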

Step:3 [Create and configure HIVE directory within HDFS]
>> hadoop fs -mkdir /user/hive/warehouse
The "warehouse" directory is where Hive stores its tables and related data. On Hadoop 2.x and later, add the -p option to -mkdir so that missing parent directories are created as well.
>> hadoop fs -mkdir /temp
The "temp" directory is a temporary location for the intermediate results of processing.
Set read/write permissions for the HIVE directories.
The following commands give write permission to the group:
>> hadoop fs -chmod g+w /user/hive/warehouse
>> hadoop fs -chmod g+w /temp
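
Before moving on, it can help to confirm that both directories really exist in HDFS. The following is a rough sketch, not part of the original procedure; the function name check_hive_dirs is made up for illustration, and it relies only on the "hadoop fs -test -d" command, which returns success when the path is a directory.

```shell
# check_hive_dirs DIR...
# Reports whether each given path exists as a directory in HDFS.
check_hive_dirs() {
    for dir in "$@"; do
        if hadoop fs -test -d "$dir"; then
            echo "OK: $dir"
        else
            echo "MISSING: $dir"
            return 1
        fi
    done
}

# Only run the check when the hadoop command is available.
if command -v hadoop >/dev/null 2>&1; then
    check_hive_dirs /user/hive/warehouse /temp
else
    echo "hadoop not found on PATH; skipping HDFS check"
fi
```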

Step:4 [Update Hadoop path in hive config files]
>> cd $HIVE_HOME/bin
>> sudo gedit hive-config.sh
Add the following lines to the file:
export HIVE_CONF_DIR=$HIVE_CONF_DIR
export HIVE_AUX_JARS_PATH=$HIVE_AUX_JARS_PATH
export HADOOP_HOME=<Your Hadoop Home dir>
>> cd $HIVE_HOME/conf
>> cp hive-env.sh.template hive-env.sh
>> sudo gedit hive-env.sh
#Append the below line.
export HADOOP_HOME=<Your Hadoop Home dir>
The Hive configuration is now complete. If an external database server is required for the metastore, Apache Derby can be used, as described in the following steps.
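
A wrong HADOOP_HOME is a common reason for Hive failing to start, so a quick sanity check may be worthwhile. The sketch below assumes the usual layout in which the hadoop launcher sits at bin/hadoop under the Hadoop home directory; the function name check_hadoop_home is hypothetical.

```shell
# check_hadoop_home DIR
# Verifies that DIR is non-empty and contains an executable bin/hadoop.
check_hadoop_home() {
    if [ -z "$1" ]; then
        echo "HADOOP_HOME is not set"
        return 1
    elif [ ! -x "$1/bin/hadoop" ]; then
        echo "No executable bin/hadoop under $1"
        return 1
    fi
    echo "HADOOP_HOME looks valid: $1"
}

# Check the value currently exported in this shell.
check_hadoop_home "${HADOOP_HOME:-}" || echo "Fix HADOOP_HOME in hive-env.sh before starting Hive"
```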

Step:5 [Install and Configure Apache Derby]
>> tar zxvf db-derby-10.4.2.0-bin.tar.gz
>> mv db-derby-10.4.2.0-bin /usr/lib/derby
Use the "su -" command if superuser privileges are required to copy the files.

Step:6 [Setup environment variable for Derby]
Append the lines below to the ~/.bashrc file.
export DERBY_HOME=/usr/lib/derby
export PATH=$PATH:$DERBY_HOME/bin
export CLASSPATH=$CLASSPATH:$DERBY_HOME/lib/derby.jar:$DERBY_HOME/lib/derbytools.jar
Reload the ~/.bashrc file so that the variables take effect:
>> source ~/.bashrc

Step:7 [Create a directory to store the Meta store]
>> mkdir $DERBY_HOME/data

Step:8 [Configuring Meta store of HIVE]
Tell Hive where the database is stored by editing hive-site.xml, which is in the $HIVE_HOME/conf directory.
First, copy the template file using the following commands.
>> cd $HIVE_HOME/conf
>> cp hive-default.xml.template hive-site.xml
Add the lines below between <configuration> and </configuration> in hive-site.xml:
<property>
   <name>javax.jdo.option.ConnectionURL</name>
   <value>jdbc:derby://localhost:1433/metastore_db;create=true</value>
   <description>JDBC connect string for a JDBC metastore</description>
</property>

Create a file named jpox.properties and include the lines below in it.
javax.jdo.PersistenceManagerFactoryClass = org.jpox.PersistenceManagerFactoryImpl
org.jpox.autoCreateSchema = false
org.jpox.validateTables = false
org.jpox.validateColumns = false
org.jpox.validateConstraints = false
org.jpox.storeManagerType = rdbms
org.jpox.autoCreateSchema = true
org.jpox.autoStartMechanismMode = checked
org.jpox.transactionIsolation = read_committed
javax.jdo.option.DetachAllOnCommit = true
javax.jdo.option.NontransactionalRead = true
javax.jdo.option.ConnectionDriverName = org.apache.derby.jdbc.ClientDriver
javax.jdo.option.ConnectionURL = jdbc:derby://hadoop1:1433/metastore_db;create=true
javax.jdo.option.ConnectionUserName = APP
javax.jdo.option.ConnectionPassword = mine
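
A jdbc:derby://host:port URL refers to a Derby network server, so such a server has to be listening on that host and port before Hive can connect. The steps above do not show how to start it; the sketch below is one possible way, assuming the startNetworkServer script shipped in $DERBY_HOME/bin accepts the -h and -p options of Derby's NetworkServerControl. The helper derby_host_port is an illustrative name, not a Derby or Hive command.

```shell
# derby_host_port URL
# Prints "HOST PORT" parsed from a jdbc:derby://HOST:PORT/... URL.
derby_host_port() {
    hostport="${1#jdbc:derby://}"   # strip the scheme prefix
    hostport="${hostport%%/*}"      # drop everything from the first "/"
    echo "${hostport%%:*} ${hostport##*:}"
}

URL='jdbc:derby://localhost:1433/metastore_db;create=true'
hp="$(derby_host_port "$URL")"
host="${hp% *}"
port="${hp#* }"

if command -v startNetworkServer >/dev/null 2>&1; then
    # Start the Derby network server in the background on that host and port.
    nohup startNetworkServer -h "$host" -p "$port" >/dev/null 2>&1 &
    echo "Derby network server starting on $host:$port"
else
    echo "startNetworkServer not found; check that DERBY_HOME/bin is on PATH"
fi
```

Parsing the host and port from the URL keeps the server settings in step with whatever is written in hive-site.xml.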

Step:9 [Verifying Hive installation]
Use the commands below to enter the Hive CLI prompt and check the available tables and databases.
>> hive
>> show tables;
>> show databases;
>> quit;
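
The same check can also be run without entering the interactive prompt, which is handy in scripts. This is a small sketch using the hive -e option, which executes the quoted HiveQL and then exits; the wrapper name verify_hive is only illustrative.

```shell
# verify_hive
# Runs two simple HiveQL statements non-interactively and reports the result.
verify_hive() {
    # -e runs the quoted HiveQL and exits instead of opening the prompt
    if hive -e 'show databases; show tables;'; then
        echo "Hive CLI responded"
    else
        echo "Hive CLI failed"
        return 1
    fi
}

# Only run the check when the hive command is available.
if command -v hive >/dev/null 2>&1; then
    verify_hive
else
    echo "hive not found on PATH; check HIVE_HOME and PATH"
fi
```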