Friday, 19 June 2015

Steps to Install and Configure Apache PIG



PIG is a data flow platform. It uses the Pig Latin language for Big Data processing. Pig Latin consists of high-level commands and operators that are easy to learn and understand. It is most useful to non-Java developers who need to process Big Data.

When a script runs, the PIG execution engine converts it through the following plans, in order:
  • Logical Plan
  • Physical Plan
  • MapReduce Plan

Let us see how to install and configure PIG on an Apache Hadoop 2.0 cluster.

Step:1 [Download the stable version]
Download the stable release from the Apache Pig releases page. This walkthrough uses pig-0.15.0.tar.gz.


Release notes link


Step:2 [Copy the package to /usr/lib]

Copy the downloaded package to the /usr/lib directory.

Step:3 [Unzip and change the owner]

Run the following commands from the /usr/lib directory.

>> sudo tar xzf pig-0.15.0.tar.gz

>> sudo mv pig-0.15.0 pig

>> sudo chown -R huser:hadoop pig

The chown command changes the owner of the pig directory from root to the Hadoop user "huser" (group "hadoop").
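
Steps 2 and 3 (copy, unpack, rename) can also be sketched as one reusable shell function. This is only an illustration: the name install_pig is hypothetical, and the sketch assumes the archive unpacks into a directory named after the archive (pig-0.15.0.tar.gz into pig-0.15.0), as the commands above do. The chown step is left out because it needs root and a site-specific user.

```shell
# Sketch of Steps 2 and 3 as one function.
# install_pig is a hypothetical helper name, not part of Pig or Hadoop.
install_pig() {
    tarball="$1"   # path to the downloaded archive, e.g. pig-0.15.0.tar.gz
    target="$2"    # directory to install into, e.g. /usr/lib
    # Assumption: the archive's top-level directory is its name minus .tar.gz
    dir=$(basename "$tarball" .tar.gz)

    [ -f "$tarball" ] || { echo "archive not found: $tarball" >&2; return 1; }
    cp "$tarball" "$target"/ || return 1

    # Run in a subshell so the caller's working directory is unchanged.
    (
        cd "$target" &&
        tar xzf "$dir.tar.gz" &&
        mv "$dir" pig
    )
}
```

Writing to /usr/lib requires root, so a call such as install_pig pig-0.15.0.tar.gz /usr/lib would have to run from a root shell or a script started with sudo, followed by the chown command shown above.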


Step:4 [Log in as the Hadoop user "huser" and set the environment variables]

>> su - huser

Add the following two lines to the ~/.bashrc file.

export PIG_HOME="/usr/lib/pig"
export PATH=$PATH:$PIG_HOME/bin
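
If you prefer to add these lines from the command line rather than an editor, a small helper can append each one only when it is not already present, so repeating the step does not duplicate entries. This is a minimal sketch; append_once is a hypothetical name, not a standard command.

```shell
# append_once LINE FILE
# Hypothetical helper: append LINE to FILE unless an identical line exists.
append_once() {
    line="$1"
    file="$2"
    touch "$file"
    # -q: quiet, -x: match the whole line, -F: treat LINE as a fixed string
    grep -qxF -- "$line" "$file" || printf '%s\n' "$line" >> "$file"
}
```

For this step the calls would be append_once 'export PIG_HOME="/usr/lib/pig"' ~/.bashrc and append_once 'export PATH=$PATH:$PIG_HOME/bin' ~/.bashrc. The single quotes keep $PATH and $PIG_HOME unexpanded, so they are written literally and evaluated each time the file is sourced.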

Step:5 [Source the profile file to reflect the changes]

>> . ~/.bashrc

Step:6 [Verify the PIG command]

>> pig -help
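
If the command is not found, the usual cause is that the PATH change from Step 4 has not taken effect in the current shell. The check below is a small sketch of how to confirm that; check_pig_on_path is a hypothetical helper name, and it relies only on the shell's standard command -v lookup.

```shell
# Hypothetical helper: report whether a "pig" executable is reachable via PATH.
check_pig_on_path() {
    if command -v pig >/dev/null 2>&1; then
        echo "pig found at: $(command -v pig)"
    else
        echo "pig not found on PATH; re-check PIG_HOME and ~/.bashrc" >&2
        return 1
    fi
}
```

With the settings from Step 4, calling check_pig_on_path should report a location under /usr/lib/pig/bin.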