Hadoop Installation on RHEL6/CentOS6

We concatenated the files to bring them close to, and under, 64 MB, and the difference was huge: without changing anything else we went from 214 minutes to 3 minutes!
— Elia Mazzaw

Hadoop Installation on CentOS 6.3

Prerequisite:

Red Hat Enterprise Linux 6 / CentOS 6 (this will work on RHEL5/CentOS5 too)

Download the Hadoop 1.0.4 tar ball specific to your architecture from here.

Hadoop is available for other Linux distributions too; choose as per your requirement. For Red Hat, CentOS and Novell SUSE you will need the RPM, and for Ubuntu the deb is there as well. But we will look at the tar ball installation so that every Linux distribution is treated equally.

Here I am using CentOS 6.3 Linux.

Other than the Hadoop tar ball (hadoop-1.0.4.tar.gz) we require:

Java 1.6 (at least); 1.7 is better.

I am using : jre1.6

[root@hadoopmaster html]# java -version

java version "1.6.0_37"

Java(TM) SE Runtime Environment (build 1.6.0_37-b06)
Java HotSpot(TM) Client VM (build 20.12-b01, mixed mode, sharing)

Here we are discussing this installation for a 2-node Hadoop cluster, so arrange two CentOS 6.3 Linux boxes (VMs are fine).

Then set everything up on these two machines: the hostname for server1, i.e. the master, is hadoopmaster, and for server2, i.e. the slave, it is hadoopslave.

Also update the /etc/hosts file on both nodes; it will look something like this:

[hadoop@hadoopmaster bin]$ cat /etc/hosts
127.0.0.1 localhost localhost.localdomain localhost4 localhost4.localdomain4
::1 localhost localhost.localdomain localhost6 localhost6.localdomain6
192.168.56.103 hadoopmaster
192.168.56.104 hadoopslave
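
A quick sanity check that name resolution works both ways (assuming the addresses above; run it from each node):

[hadoop@hadoopmaster ~]$ ping -c 1 hadoopslave
[hadoop@hadoopslave ~]$ ping -c 1 hadoopmaster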

Apart from this, add an entry for your hostname in the /etc/sysconfig/network file:

[hadoop@hadoopmaster bin]$ cat /etc/sysconfig/network
NETWORKING=yes
HOSTNAME=hadoopmaster

And HOSTNAME=hadoopslave on the other machine.

This sets up the HOSTNAME persistently.
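
If you want the new hostname to take effect without a reboot, something like this should work on RHEL6/CentOS6 (shown for the master; use hadoopslave on the other box):

#hostname hadoopmaster
#hostname          (verify; it should print hadoopmaster)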

Adding a dedicated Hadoop system user

We will use a dedicated Hadoop user account for running Hadoop. While that's not required, it is recommended because it helps separate the Hadoop installation from other software applications and user accounts running on the same machine (think security, permissions, backups, etc.).

#adduser hadoop

#passwd hadoop          (set your password here)

After creating the hadoop user, go to the Hadoop bin directory (it will exist once you have extracted the tar ball, as shown below) and switch to the hadoop user:

#cd /hadoop/hadoop-1.0.4/bin

#su hadoop

[hadoop@hadoopmaster bin]$

Then set up SSH keys for the hadoop user on both nodes.

The commands are:

$ ssh-keygen

and

$ ssh-copy-id hadoop@hadoopslave

Cross-check by ssh-ing into both nodes; you should be logged in without being asked for a password.
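
A minimal end-to-end sketch of the key exchange, assuming the hostnames hadoopmaster and hadoopslave from the /etc/hosts file above (run the mirror-image commands on the slave as well):

[hadoop@hadoopmaster ~]$ ssh-keygen -t rsa                  (accept the defaults, empty passphrase)
[hadoop@hadoopmaster ~]$ ssh-copy-id hadoop@hadoopmaster    (the master also runs a datanode, so it must ssh to itself)
[hadoop@hadoopmaster ~]$ ssh-copy-id hadoop@hadoopslave
[hadoop@hadoopmaster ~]$ ssh hadoopslave hostname           (should print hadoopslave without a password prompt)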

Now create a directory:

#mkdir /hadoop

Put your downloaded Hadoop tar ball here and extract it:

#tar -xvf hadoop-1.0.4.tar.gz
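
One extra step that is not in the original commands, but which I would suggest (an assumption on my part, so that the hadoop user can write its logs and PID files under the install directory): give the hadoop user ownership of /hadoop on both nodes.

#chown -R hadoop:hadoop /hadoop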

Do all the same stuff on the other machine too, which is going to be our Hadoop slave.

Extract the Hadoop tar ball and go to:

#cd /hadoop/hadoop-1.0.4/conf/

Here we need to change the following files on the master node:

core-site.xml
hdfs-site.xml
mapred-site.xml
masters
slaves
hadoop-env.sh

And on the slave node we only need to touch the masters and slaves files.

So, let's start with the core-site.xml file; the <property> block shown below needs to be added.

[hadoop@hadoopmaster conf]$ cat core-site.xml
<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>

<!-- Put site-specific property overrides in this file. -->

<configuration>
    <property>
        <name>fs.default.name</name>
        <value>hdfs://hadoopmaster:54310</value>
    </property>
</configuration>

Then go to the hdfs-site.xml file:

[hadoop@hadoopmaster conf]$ cat hdfs-site.xml
<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>

<!-- Put site-specific property overrides in this file. -->

<configuration>
    <property>
        <name>dfs.replication</name>
        <value>2</value>
    </property>
    <property>
        <name>dfs.permissions</name>
        <value>false</value>
    </property>
</configuration>

3. Next is the mapred-site.xml file:

[hadoop@hadoopmaster conf]$ cat mapred-site.xml
<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>

<!-- Put site-specific property overrides in this file. -->

<configuration>
    <property>
        <name>mapred.job.tracker</name>
        <value>hadoopmaster:54311</value>
    </property>
</configuration>

4. And the important one, hadoop-env.sh:

[hadoop@hadoopmaster conf]$ cat hadoop-env.sh
# Set Hadoop-specific environment variables here.

# The only required environment variable is JAVA_HOME. All others are
# optional. When running a distributed configuration it is best to
# set JAVA_HOME in this file, so that it is correctly defined on
# remote nodes.

# The java implementation to use. Required.
export JAVA_HOME=/usr          # <<<< only this line needs to be edited: point it at your JAVA_HOME path

# Extra Java CLASSPATH elements. Optional.
<<snip >>
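
If you are not sure what to put in JAVA_HOME, one way to find it (assuming java is on your PATH) is to resolve the real binary and drop the trailing /bin/java:

[hadoop@hadoopmaster conf]$ readlink -f $(which java)

If this prints, for example, /usr/java/jdk1.6.0_37/bin/java, then JAVA_HOME should be /usr/java/jdk1.6.0_37 (with no trailing /bin).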

5. Go to the masters file and replace the localhost entry with your master's hostname.

[hadoop@hadoopmaster conf]$ cat masters
hadoopmaster

6. And finally the slaves file:

[hadoop@hadoopmaster conf]$ cat slaves
hadoopmaster
hadoopslave

Add entries for both nodes here.

On the slave node:

Update only the slaves file.

So the configuration part is done. Now we need to run some commands to get Hadoop up and running :)

Start with the following:

[hadoop@hadoopmaster bin]$ ./hadoop version
Hadoop 1.0.4
Subversion https://svn.apache.org/repos/asf/hadoop/common/branches/branch-1.0 -r 1393290
Compiled by hortonfo on Wed Oct 3 05:13:58 UTC 2012
From source with checksum xxxxxxxxxxxxxxxxxxxxxxxx

It might show a java error; if it does, please check the JAVA_HOME defined in conf/hadoop-env.sh.
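
A quick way to double-check which JAVA_HOME Hadoop will pick up, without opening the file:

[hadoop@hadoopmaster bin]$ grep JAVA_HOME ../conf/hadoop-env.sh          (the uncommented export line should point at your JDK/JRE directory)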

After that

[hadoop@hadoopmaster bin]$ ./hadoop namenode -format
12/12/13 07:47:37 INFO namenode.NameNode: STARTUP_MSG:
/************************************************************
STARTUP_MSG: Starting NameNode
STARTUP_MSG: host = hadoopmaster/192.168.56.103
STARTUP_MSG: args = [-format]
STARTUP_MSG: version = 1.0.4
STARTUP_MSG: build = https://svn.apache.org/repos/asf/hadoop/common/branches/branch-1.0 -r 1393290; compiled by ‘hortonfo’ on Wed Oct 3 05:13:58 UTC 2012
************************************************************/
12/12/13 07:47:37 INFO util.GSet: VM type = 32-bit
12/12/13 07:47:37 INFO util.GSet: 2% max memory = 19.33375 MB
12/12/13 07:47:37 INFO util.GSet: capacity = 2^22 = 4194304 entries
12/12/13 07:47:37 INFO util.GSet: recommended=4194304, actual=4194304
12/12/13 07:47:37 INFO namenode.FSNamesystem: fsOwner=hadoop
12/12/13 07:47:37 INFO namenode.FSNamesystem: supergroup=supergroup
12/12/13 07:47:37 INFO namenode.FSNamesystem: isPermissionEnabled=true
12/12/13 07:47:37 INFO namenode.FSNamesystem: dfs.block.invalidate.limit=100
12/12/13 07:47:37 INFO namenode.FSNamesystem: isAccessTokenEnabled=false accessKeyUpdateInterval=0 min(s), accessTokenLifetime=0 min(s)
12/12/13 07:47:37 INFO namenode.NameNode: Caching file names occuring more than 10 times
12/12/13 07:47:37 INFO common.Storage: Image file of size 112 saved in 0 seconds.
12/12/13 07:47:37 INFO common.Storage: Storage directory /tmp/hadoop-hadoop/dfs/name has been successfully formatted.
12/12/13 07:47:37 INFO namenode.NameNode: SHUTDOWN_MSG:
/************************************************************
SHUTDOWN_MSG: Shutting down NameNode at hadoopmaster/192.168.56.103
************************************************************/

And Then finally:

[hadoop@hadoopmaster bin]$ ./start-dfs.sh
starting namenode, logging to /hadoop/hadoop-1.0.4/libexec/../logs/hadoop-hadoop-namenode-hadoopmaster.out
hadoopsalve: ssh: Could not resolve hostname hadoopsalve: Temporary failure in name resolution
hadoopmaster: starting datanode, logging to /hadoop/hadoop-1.0.4/libexec/../logs/hadoop-hadoop-datanode-hadoopmaster.out
hadoopmaster: starting secondarynamenode, logging to /hadoop/hadoop-1.0.4/libexec/../logs/hadoop-hadoop-secondarynamenode-hadoopmaster.out
[hadoop@hadoopmaster bin]$ ./start-mapred.sh
starting jobtracker, logging to /hadoop/hadoop-1.0.4/libexec/../logs/hadoop-hadoop-jobtracker-hadoopmaster.out
hadoopsalve: ssh: Could not resolve hostname hadoopsalve: Temporary failure in name resolution
hadoopmaster: starting tasktracker, logging to /hadoop/hadoop-1.0.4/libexec/../logs/hadoop-hadoop-tasktracker-hadoopmaster.out

Notice the hadoopsalve lines in the output above: that is what a hostname typo in the slaves file looks like (the daemons for that node never start). Fix the entry and re-run the start scripts. If everything is fine, go to your browser and check the running Hadoop status there.

You will have a web GUI to see your hard work of the last 30 minutes.

The URL would be: http://<masternodename or IP address>:50070
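
To confirm from the shell that all the daemons actually came up, jps on each node and a dfsadmin report from the master are handy (jps ships with the JDK rather than the JRE; these are standard Hadoop 1.x commands, not specific to this setup):

[hadoop@hadoopmaster bin]$ jps                           (master should list NameNode, SecondaryNameNode, JobTracker, DataNode, TaskTracker)
[hadoop@hadoopslave ~]$ jps                              (slave should list DataNode and TaskTracker)
[hadoop@hadoopmaster bin]$ ./hadoop dfsadmin -report     (should report 2 live datanodes)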

Maybe I have forgotten a few things; please mention them in the comments below… Hope this is useful 🙂

!! Enjoy your day !!

6 thoughts on “Hadoop Installation on RHEL6/CentOS6”

  1. hi Yogesh. Thanks for this. I am stuck at one point in the process.

    [hadoop@hadoopmaster bin]$ ./hadoop version

    For me it is not working. It is showing the following:

    [hadoopM@hmaster bin]$ ./hadoop version
    ./hadoop: line 320: /usr/java/jdk1.7.0_21/bin/bin/java: No such file or directory
    ./hadoop: line 390: /usr/java/jdk1.7.0_21/bin/bin/java: No such file or directory
    ./hadoop: line 390: exec: /usr/java/jdk1.7.0_21/bin/bin/java: cannot execute: No such file or directory

    My Java version appears to be fine:

    [hadoopM@hmaster bin]$ java -version
    java version "1.7.0_21"
    Java(TM) SE Runtime Environment (build 1.7.0_21-b11)
    Java HotSpot(TM) Server VM (build 23.21-b01, mixed mode)

    I have updated .bashrc and also tried /etc/profile with the JAVA_HOME path, but for some reason I see the following:

    [hadoopM@hmaster bin]$ whereis java
    java: /usr/bin/java

    OR

    [hadoopM@hmaster bin]$ which java
    /usr/bin/java

    Is this related ?

    I’m close but not able to complete. Your help is appreciated.

    Thanks,
    Shashi R

    • Hi Shashi,
      It's clearly visible in the errors: ./hadoop: line 320: /usr/java/jdk1.7.0_21/bin/bin/java: No such file or directory. That path comes from the JAVA_HOME defined in the Hadoop configuration file (conf/hadoop-env.sh). Fix that path so it points at the JDK directory itself rather than its bin directory, and the rest of the path to java is taken automatically.
      Hope this helps.
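
      In other words, the doubled /bin in the error means JAVA_HOME was set to .../jdk1.7.0_21/bin. Assuming the JDK really is installed at the path shown in the error, the line in conf/hadoop-env.sh should read:

      export JAVA_HOME=/usr/java/jdk1.7.0_21      # not /usr/java/jdk1.7.0_21/bin; Hadoop appends /bin/java itself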

  2. Hi,
    I am new to Hadoop and I am following your post step by step.

    Everything is okay, but now I am stuck at the last step: when I run ./start-dfs.sh it has been stuck in progress since yesterday night.

    I am able to browse http://hadoopmaster:50070 but I am not able to see the file system.

    Any help is appreciated…

    Please reply to me at vineet@apextsi.com

    Once again, thanks in advance…

    • Hello Vineet,

      You should be able to see the HDFS file system you have set up and the files you have put on it. If files that you have put are not visible, then there is an issue; cross-check the steps above, as you have probably missed something.
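
      A quick way to check from the command line whether HDFS is actually serving files (run as the hadoop user on the master; the file name here is just an example):

      [hadoop@hadoopmaster bin]$ ./hadoop fs -put /etc/hosts /hosts-test
      [hadoop@hadoopmaster bin]$ ./hadoop fs -ls /

      If the -ls shows the file you just put, HDFS itself is fine and the browser issue lies elsewhere.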
