Steps to install and configure Apache HIVE

Apache Hive is a data warehouse environment built on top of Hadoop. It was developed by Facebook and released as open source to the community. Hive uses HiveQL (Hive Query Language), an SQL-like language, for big data processing.

During execution, HiveQL is converted into a series of MapReduce jobs, which are executed on the Hadoop cluster.

Let us go through the steps to install and configure Apache Hive. Hadoop must be installed and configured before setting up Hive.

Step:1 [Download and extract the HIVE tar file]

>> wget -c http://archive.apache.org/dist/hive/stable/apache-hive-1.2.0-bin.tar.gz

>> tar -xzvf apache-hive-1.2.0-bin.tar.gz
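Before moving the extracted files anywhere, it can help to confirm which directory the archive unpacks into. The snippet below is only a minimal sketch; archive_top_dir is a hypothetical helper name, not a Hive or Hadoop tool.

```shell
# Print the top-level directory contained in a gzip-compressed tar archive.
# archive_top_dir is a hypothetical helper, not a Hive or Hadoop tool.
archive_top_dir() {
    # -t lists the archive, -z handles gzip, -f names the file;
    # the first path component of the first entry is the top-level directory.
    tar -tzf "$1" | head -n 1 | cut -d/ -f1
}

# Example (assumes the tarball from this step is in the current directory):
# archive_top_dir apache-hive-1.2.0-bin.tar.gz
```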

Step:2 [Edit the .bashrc file for environment variables]

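Add the following at the end of the file (this assumes the extracted directory has been moved to /usr/lib; adjust the path if it lives elsewhere):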

export HIVE_HOME=/usr/lib/apache-hive-1.2.0-bin

export PATH=$PATH:$HIVE_HOME/bin
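If the setup is run more than once, the same export lines can pile up in .bashrc. One possible way to avoid that is sketched below; append_once is a hypothetical helper name, and the example paths are simply the ones used in this step.

```shell
# Append a line to a file only if that exact line is not already present.
# append_once is a hypothetical helper name used for illustration.
append_once() {
    file=$1
    line=$2
    # -q: quiet, -x: match the whole line, -F: treat the pattern as a fixed string
    grep -qxF -- "$line" "$file" 2>/dev/null || printf '%s\n' "$line" >> "$file"
}

# Example (single quotes keep $PATH and $HIVE_HOME unexpanded in the file):
# append_once ~/.bashrc 'export HIVE_HOME=/usr/lib/apache-hive-1.2.0-bin'
# append_once ~/.bashrc 'export PATH=$PATH:$HIVE_HOME/bin'
```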

Step:3 [Create and configure HIVE directory within HDFS]

>> hadoop fs -mkdir -p /user/hive/warehouse

The "warehouse" directory is where Hive stores its tables and their data. The -p option (Hadoop 2.x) also creates any missing parent directories.

>> hadoop fs -mkdir -p /tmp

The directory "tmp" is the temporary location that stores the intermediate results of processing.

Set write permission for the Hive directories.

These commands give write permission to the group:

>> hadoop fs -chmod g+w /user/hive/warehouse

>> hadoop fs -chmod g+w /tmp
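The create-then-chmod pattern of this step can be wrapped in a small function. This is a sketch only, assuming Hadoop 2.x (where -mkdir accepts -p); setup_hive_dirs and the HADOOP_CMD override are hypothetical names introduced here so the commands can be previewed without touching HDFS.

```shell
# Create each HDFS directory given as an argument and make it group-writable.
# HADOOP_CMD is an assumed override hook: set it to "echo hadoop" for a dry run.
setup_hive_dirs() {
    for dir in "$@"; do
        ${HADOOP_CMD:-hadoop} fs -mkdir -p "$dir" || return 1
        ${HADOOP_CMD:-hadoop} fs -chmod g+w "$dir" || return 1
    done
}

# Dry run that only prints the commands instead of executing them:
# HADOOP_CMD="echo hadoop" setup_hive_dirs /user/hive/warehouse
```

Pass the directories created in this step as arguments; the function stops at the first command that fails.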

Step:4 [Update Hadoop path in hive config files]

>> sudo gedit $HIVE_HOME/bin/hive-config.sh

Add the HADOOP_HOME line after the existing exports:

export HIVE_CONF_DIR=$HIVE_CONF_DIR

export HIVE_AUX_JARS_PATH=$HIVE_AUX_JARS_PATH

export HADOOP_HOME=<Your Hadoop Home dir>

>> cd $HIVE_HOME/conf

>> cp hive-env.sh.template hive-env.sh

>> sudo gedit hive-env.sh

Append the line below:

export HADOOP_HOME=<Your Hadoop Home dir>
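A wrong HADOOP_HOME usually shows up only when Hive is started. As a rough sanity check, the sketch below tests whether a directory contains an executable bin/hadoop launcher; check_hadoop_home is a hypothetical helper name, and the check is an assumption about what a valid Hadoop home looks like, not something Hive itself runs.

```shell
# Succeed only if the given directory looks like a Hadoop installation,
# i.e. it contains an executable bin/hadoop launcher.
# check_hadoop_home is a hypothetical helper name.
check_hadoop_home() {
    if [ -z "$1" ]; then
        echo "HADOOP_HOME is not set" >&2
        return 1
    fi
    if [ ! -x "$1/bin/hadoop" ]; then
        echo "no executable bin/hadoop under $1" >&2
        return 1
    fi
    echo "HADOOP_HOME looks valid: $1"
}

# Example:
# check_hadoop_home "$HADOOP_HOME"
```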

Hive configuration is now complete. If an external database server is required for the metastore, Apache Derby can be used.

Step:5 [Install and Configure Apache Derby]

>> wget http://archive.apache.org/dist/db/derby/db-derby-10.4.2.0/db-derby-10.4.2.0-bin.tar.gz

>> tar zxvf db-derby-10.4.2.0-bin.tar.gz

>> mv db-derby-10.4.2.0-bin /usr/lib/derby

Use the "su -" command if superuser privileges are required to move the files.

Step:6 [Setup environment variable for Derby]

Append the lines below to the .bashrc file.

export DERBY_HOME=/usr/lib/derby

export PATH=$PATH:$DERBY_HOME/bin

export CLASSPATH=$CLASSPATH:$DERBY_HOME/lib/derby.jar:$DERBY_HOME/lib/derbytools.jar

Reload the ~/.bashrc file so that the changes take effect in the current shell:

>> source ~/.bashrc
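After reloading, a mistyped jar path in CLASSPATH goes unnoticed until a class fails to load. The following sketch lists classpath entries that do not exist on disk; missing_classpath_entries is a hypothetical helper name, and it only checks for existence, not for the contents of the jars.

```shell
# Print every entry of a colon-separated classpath that does not exist on disk.
# missing_classpath_entries is a hypothetical helper name.
missing_classpath_entries() {
    # Turn the colon-separated list into one entry per line, skipping empty entries.
    printf '%s\n' "$1" | tr ':' '\n' | while IFS= read -r entry; do
        [ -n "$entry" ] || continue
        [ -e "$entry" ] || printf '%s\n' "$entry"
    done
}

# Example:
# missing_classpath_entries "$CLASSPATH"
```

No output means every entry was found.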

Step:7 [Create directory to store Meta store]

>> mkdir $DERBY_HOME/data

Step:8 [Configuring Meta store of HIVE]

Tell Hive where the metastore database is located by editing hive-site.xml in the $HIVE_HOME/conf directory.

First, copy the template file using the following commands.

>> cd $HIVE_HOME/conf

>> cp hive-default.xml.template hive-site.xml

Add the lines below between <configuration> and </configuration> in hive-site.xml. The host and port in the URL must match the address on which the Derby network server listens (Derby's default port is 1527).

<property>

<name>javax.jdo.option.ConnectionURL</name>

<value>jdbc:derby://localhost:1433/metastore_db;create=true</value>

<description>JDBC connect string for a JDBC metastore</description>

</property>
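Since the connection only works when the host and port in the URL point at a running Derby server, it can be handy to pull those two values out of the URL for checking. This is a minimal sketch using shell parameter expansion; parse_derby_url is a hypothetical helper name, and it assumes the URL always carries an explicit port.

```shell
# Split a Derby client JDBC URL of the form
#   jdbc:derby://HOST:PORT/DATABASE;attributes
# into "HOST PORT". parse_derby_url is a hypothetical helper name.
# Assumes the URL contains an explicit port.
parse_derby_url() {
    rest=${1#jdbc:derby://}   # drop the scheme prefix
    hostport=${rest%%/*}      # keep everything before the first "/"
    host=${hostport%%:*}      # part before the colon
    port=${hostport#*:}       # part after the colon
    printf '%s %s\n' "$host" "$port"
}

# Example:
# parse_derby_url 'jdbc:derby://localhost:1433/metastore_db;create=true'
```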

Create a file named jpox.properties and include the lines below in it.

javax.jdo.PersistenceManagerFactoryClass = org.jpox.PersistenceManagerFactoryImpl

org.jpox.validateTables = false

org.jpox.validateColumns = false

org.jpox.validateConstraints = false

org.jpox.storeManagerType = rdbms

org.jpox.autoCreateSchema = true

org.jpox.autoStartMechanismMode = checked

org.jpox.transactionIsolation = read_committed

javax.jdo.option.DetachAllOnCommit = true

javax.jdo.option.NontransactionalRead = true

javax.jdo.option.ConnectionDriverName = org.apache.derby.jdbc.ClientDriver

javax.jdo.option.ConnectionURL = jdbc:derby://hadoop1:1433/metastore_db;create=true

javax.jdo.option.ConnectionUserName = APP

javax.jdo.option.ConnectionPassword = mine
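When a key is repeated in a properties file, Java's properties loader silently keeps only the last value, so a quick duplicate check can be worthwhile after editing jpox.properties. The sketch below is one possible way to do it; duplicate_keys is a hypothetical helper name, and it assumes "=" is the only key/value separator in use.

```shell
# Print each key that is defined more than once in a Java-style properties file.
# duplicate_keys is a hypothetical helper name; only "=" separators are handled.
duplicate_keys() {
    awk -F'=' '
        /^[ \t]*(#|!|$)/ { next }              # skip comments and blank lines
        {
            key = $1
            gsub(/^[ \t]+|[ \t]+$/, "", key)   # trim whitespace around the key
            if (++seen[key] == 2) print key    # report on the second sighting only
        }
    ' "$1"
}

# Example:
# duplicate_keys jpox.properties
```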

Step:9 [Verifying Hive installation]

Use the commands below to enter the Hive CLI prompt and list the available databases and tables.

>> hive

>> show tables;

>> show databases;

>> quit;
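The same check can also be run without opening the interactive prompt, using the -e option of the hive command to pass the statements directly. This is a sketch; verify_hive and the HIVE_CMD override are hypothetical names added here so the call can be previewed on a machine without Hive.

```shell
# Run the verification statements through the Hive CLI without opening the prompt.
# HIVE_CMD is an assumed override hook: set it to "echo hive" for a dry run.
verify_hive() {
    ${HIVE_CMD:-hive} -e 'show databases; show tables;'
}

# Example:
# verify_hive
```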

Techno Solutions for BIG Data Analytic and SQL Server

Friday, 19 June 2015
