Wednesday, April 24, 2013

show git branch on bash prompt

# .bashrc

# Source global definitions
if [ -f /etc/bashrc ]; then
. /etc/bashrc
fi

# User specific aliases and functions
# show git branch
parse_git_branch() {
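  # outside a git repo this prints nothing; otherwise sed keeps only the '* branch' line and rewrites it as '(branch)'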
  git branch 2> /dev/null | sed -e '/^[^*]/d' -e 's/* \(.*\)/(\1)/'
}
export PS1="\[\033[00m\]\u@\h\[\033[01;33m\] \w \[\033[31m\]\$(parse_git_branch)\[\033[00m\]$\[\033[00m\] "
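
Reload .bashrc and, inside a git repository, the current branch shows up at the end of the prompt (the user, host, path and branch below are only placeholders):

$ source ~/.bashrc
user@host ~/projects/myrepo (master)$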

Monday, April 1, 2013

Setup Hadoop Cluster



Hadoop Cluster Setup in CentOS6

Requirements:
1.    Have Java 1.6.x installed.
2.    Have SSH installed.

Installation & Configuration [MUST be done as the root user]

1.    Download the Hadoop RPM file from the official Apache Hadoop website.

2.    Install hadoop:
rpm -i hadoop_version.rpm

3.    Edit the file /etc/hosts on the servers:
192.168.1.40   master
192.168.1.41   slave1
192.168.1.42   slave2

4.    We must configure passwordless login from the name node (master) to all data nodes (slave1 and slave2). On all servers do the following:
·      Command: ssh-keygen -t dsa
·      Keep pressing ENTER until the id_dsa.pub file is generated.
We now have three .pub files: one on the master and one on each of the two slaves.

Copy the contents of all three .pub files into the authorized_keys file, as sketched below.
Every server's authorized_keys file should end up with the same content.
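
A minimal sketch of the whole exchange, run as root and assuming the default ~/.ssh paths and the hostnames from /etc/hosts above:

On every server (master, slave1, slave2):
           # ssh-keygen -t dsa
On the master only, collect all three public keys into one file:
           # cat ~/.ssh/id_dsa.pub >> ~/.ssh/authorized_keys
           # ssh slave1 'cat ~/.ssh/id_dsa.pub' >> ~/.ssh/authorized_keys
           # ssh slave2 'cat ~/.ssh/id_dsa.pub' >> ~/.ssh/authorized_keys
Push the combined file back to the slaves and tighten permissions:
           # scp ~/.ssh/authorized_keys slave1:~/.ssh/
           # scp ~/.ssh/authorized_keys slave2:~/.ssh/
           # chmod 600 ~/.ssh/authorized_keys
Verify that login no longer asks for a password:
           # ssh slave1 hostname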

5.    Open the file /etc/hadoop/hadoop-env.sh and set JAVA_HOME:
export JAVA_HOME=/usr/java/jdk1.6.0_38

6.    Open the file /etc/hadoop/core-site.xml and add the following properties. This file configures how every node finds the NameNode and where Hadoop keeps its working files:
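
A typical minimal core-site.xml for this layout; the hdfs://master:9000 URI and the /var/lib/hadoop directory are assumptions, so adapt them to your own setup:

<configuration>
  <property>
    <!-- URI of the NameNode; "master" matches the /etc/hosts entry above -->
    <name>fs.default.name</name>
    <value>hdfs://master:9000</value>
  </property>
  <property>
    <!-- base directory for HDFS and MapReduce working files (assumed path) -->
    <name>hadoop.tmp.dir</name>
    <value>/var/lib/hadoop</value>
  </property>
</configuration>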




7.    Open the file /etc/hadoop/hdfs-site.xml and add the following properties:
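
Again a hedged example: a dfs.replication of 2 matches the replication factor shown in the HDFS listing later in this post, while the dfs.name.dir and dfs.data.dir paths are assumptions:

<configuration>
  <property>
    <!-- each HDFS block is stored on 2 datanodes -->
    <name>dfs.replication</name>
    <value>2</value>
  </property>
  <property>
    <!-- where the NameNode keeps its metadata (assumed path) -->
    <name>dfs.name.dir</name>
    <value>/var/lib/hadoop/dfs/name</value>
  </property>
  <property>
    <!-- where each DataNode keeps its blocks (assumed path) -->
    <name>dfs.data.dir</name>
    <value>/var/lib/hadoop/dfs/data</value>
  </property>
</configuration>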

8.    Open the file /etc/hadoop/mapred-site.xml and add the following properties. This file configures the host and port of the MapReduce JobTracker, which runs on the name node:
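
A minimal example; the JobTracker conventionally listens on port 9001, but that value is an assumption here:

<configuration>
  <property>
    <!-- host:port that the JobTracker listens on (on the name node) -->
    <name>mapred.job.tracker</name>
    <value>master:9001</value>
  </property>
</configuration>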




9.    Open the file /etc/hadoop/masters and add the namenode's hostname: [NAMENODE SERVER ONLY]
master

10. Open the file /etc/hadoop/slaves and add all the datanode hostnames: [NAMENODE SERVER ONLY]
/* If you want the namenode to also store data (i.e. behave like a datanode as well), list it in the slaves file too. */
master
slave1
slave2

11. Modify file permissions.
Once Hadoop is installed, start-all.sh, stop-all.sh and several other scripts are placed under /usr/sbin/; we must make all of them executable (see the example below):
           # sudo chmod a+x file_name
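
For example, to fix all of the Hadoop control scripts in one go (the wildcard patterns are an assumption; check which scripts your RPM actually installed under /usr/sbin/):
           # sudo chmod a+x /usr/sbin/start-*.sh /usr/sbin/stop-*.sh /usr/sbin/hadoop-daemon*.sh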


Notice: Steps 9, 10 and 11 are for the master server only; the slaves need none of these steps.

Start and Stop Hadoop Cluster (run on the master server)
1.    Formatting the namenode:
           # hadoop namenode -format
2.    Starting the Hadoop Cluster
           # start-all.sh

Run the jps command on the master server:
          # jps
          922 JobTracker
          815 SecondaryNameNode
          1062 TaskTracker
          521 NameNode
          1136 Jps

Run the jps command on the slaves:
          # jps
          7407 DataNode
          7521 TaskTracker
          7583 Jps

3.    Checking the status of the Hadoop Cluster:
(1)  Type the command:
           # hadoop dfsadmin -report
            (2)  Browse the web interfaces for the NameNode (master server) and the JobTracker:
·      NameNode – http://192.168.1.40:50070/
·      JobTracker – http://192.168.1.40:50030/

4.    Run a sample job to test the Hadoop Cluster (wordcount example):
(1)  Create a directory on the master server
     mkdir input

(2)  Create three test files under the 'input' directory and add the following text to them
               echo "Hello haifzhan" >> text1.txt
               echo "Hello hadoop" >> text2.txt
               echo "Hello hadoop again" >> text3.txt

(3)  Copy the three test files from the master server to Hadoop's HDFS.
                  Under the 'input' directory:
                 # hadoop dfs -put ./ input
            (4)  Now you can check the files on Hadoop’s HDFS
                 # hadoop dfs -ls input/*
-rw-r--r--   2 root supergroup         15 2013-04-01 15:03 /user/root/input/text1.txt
-rw-r--r--   2 root supergroup         13 2013-04-01 15:03 /user/root/input/text2.txt
-rw-r--r--   2 root supergroup         19 2013-04-01 15:03 /user/root/input/text3.txt   

(5)  Run the MapReduce job
                 # hadoop jar /usr/share/hadoop/hadoop-examples-1.0.3.jar wordcount input output
            (6)  Check the result
                 # hadoop dfs -cat output/part-r-00000
                 Hello     3
                 again     1
                 hadoop    2
                 haifzhan  1
5.  Stopping the Hadoop Cluster
               # stop-all.sh

Other useful resources:
1. The log files are located in: /var/log/hadoop/root
2. Useful websites:
Error Solving:
1. Datanode: "No route to host" (the datanode starts but then shuts down automatically after a while)
           Disable the firewall on both the master and the slave machines:
           # service iptables stop
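           If the problem comes back after a reboot, the firewall was probably re-enabled at boot; on CentOS 6 it can be kept off permanently (assuming the stock iptables service):
           # chkconfig iptables off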
2. Namenode: How to exit the safemode
           # hadoop dfsadmin -safemode leave
3. How to start a datanode or tasktracker independently
            # hadoop-daemon.sh start datanode
            # hadoop-daemon.sh start tasktracker
4. How to check the current Java version and path on your local machine
            # java -version
            # echo $JAVA_HOME

5.  jps shows "process information unavailable"
           Remove all files under /tmp, reformat the namenode, and restart all servers (a sketch of the commands follows).
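
A minimal sketch of that recovery, run as root on the master (repeat the /tmp cleanup on every node; note that this wipes any existing HDFS data):
           # stop-all.sh
           # rm -rf /tmp/*
           # hadoop namenode -format
           # start-all.sh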