Purpose
This document describes how to set up and configure a single-node Hadoop installation so that you can quickly perform simple operations using Hadoop MapReduce and the Hadoop Distributed File System (HDFS).
Prerequisites
- GNU/Linux
- Java must be installed. Recommended Java versions are described at HadoopJavaVersions.
- ssh must be installed and sshd must be running to use the Hadoop scripts that manage remote Hadoop daemons.
- A Hadoop distribution. Download a recent stable release from one of the Apache Download Mirrors.
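The prerequisites above can be checked from a shell before going further. The following is a minimal sketch and not part of the original setup: `have_cmd` and `check_cmds` are hypothetical helper names, and the check only looks for each command on the PATH (it assumes sshd is installed in a directory that is on the PATH, which may not hold for every user account).

```shell
# Succeed if the named command can be found on the PATH.
have_cmd() {
    command -v "$1" >/dev/null 2>&1
}

# Print "ok NAME" or "missing NAME" for each command given as an argument.
# Returns non-zero if at least one command is missing.
check_cmds() {
    rc=0
    for cmd in "$@"; do
        if have_cmd "$cmd"; then
            echo "ok $cmd"
        else
            echo "missing $cmd"
            rc=1
        fi
    done
    return "$rc"
}
```

For this guide it could be called as `check_cmds java ssh sshd`. Whether sshd is actually running has to be checked separately, for example with `pgrep -x sshd`.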
Installing Software
The following steps are for Ubuntu 14.04 LTS and Hadoop-2.6.0.
TIP: To check whether the OS is 32-bit or 64-bit, run: getconf LONG_BIT
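The word size matters because the JDK archive has to match it. As a small sketch, the hypothetical helper `jdk_arch_suffix` below maps the value printed by `getconf LONG_BIT` to the architecture suffix used in Oracle JDK archive names (`x64` for 64-bit, `i586` for 32-bit).

```shell
# Map the word size reported by `getconf LONG_BIT` to the architecture
# suffix used in Oracle JDK archive names.
jdk_arch_suffix() {
    case "$1" in
        64) echo "x64" ;;
        32) echo "i586" ;;
        *)  echo "unknown word size: $1" >&2; return 1 ;;
    esac
}
```

A typical call would be `jdk_arch_suffix "$(getconf LONG_BIT)"`, which prints `x64` on the 64-bit system assumed in the rest of this guide.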
# Install the Oracle JDK under /usr/local/java
sudo mkdir -p /usr/local/java
sudo cp jdk-7u45-linux-x64.gz /usr/local/java/
cd /usr/local/java/ && sudo tar -xzvf jdk-7u45-linux-x64.gz
sudo vim /etc/profile
# Append the following lines to /etc/profile
JAVA_HOME=/usr/local/java/jdk1.7.0_45
JRE_HOME=$JAVA_HOME/jre
PATH=$PATH:$JAVA_HOME/bin:$JRE_HOME/bin
export JAVA_HOME
export JRE_HOME
export PATH
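After editing /etc/profile it can be worth confirming that the values point at a real JDK. The sketch below is one possible check, with hypothetical helper names: `is_jdk_home` tests that a directory contains executable `bin/java` and `bin/javac`, and `path_contains` tests whether a directory appears in a colon-separated path list (the current PATH by default).

```shell
# Succeed if the directory looks like a JDK home: it must contain
# executable bin/java and bin/javac files.
is_jdk_home() {
    [ -x "$1/bin/java" ] && [ -x "$1/bin/javac" ]
}

# Succeed if directory $1 is an entry of the colon-separated list $2.
# When $2 is omitted, the current PATH is used.
path_contains() {
    path_list=${2:-$PATH}
    case ":$path_list:" in
        *":$1:"*) return 0 ;;
        *)        return 1 ;;
    esac
}
```

In a new login shell, `is_jdk_home "$JAVA_HOME" && path_contains "$JAVA_HOME/bin"` should succeed if the profile changes took effect.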
# Inform your Ubuntu Linux system where your Oracle Java JDK/JRE is located.
sudo update-alternatives --install "/usr/bin/java" "java" "/usr/local/java/jdk1.7.0_45/jre/bin/java" 1
sudo update-alternatives --install "/usr/bin/javac" "javac" "/usr/local/java/jdk1.7.0_45/bin/javac" 1
sudo update-alternatives --install "/usr/bin/javaws" "javaws" "/usr/local/java/jdk1.7.0_45/bin/javaws" 1
# Inform your Ubuntu Linux system that Oracle Java JDK/JRE must be the default Java.
# Each path must match the one registered with --install above.
sudo update-alternatives --set java /usr/local/java/jdk1.7.0_45/jre/bin/java
sudo update-alternatives --set javac /usr/local/java/jdk1.7.0_45/bin/javac
sudo update-alternatives --set javaws /usr/local/java/jdk1.7.0_45/bin/javaws
# Reload your system-wide PATH from /etc/profile
. /etc/profile
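To confirm that the alternatives now resolve to the intended JDK, the version reported by `java -version` can be compared with the expected one. The helper below is a hypothetical sketch that assumes the first line of the output has the usual form `java version "1.7.0_45"`. It reads that output on standard input and prints only the version string.

```shell
# Extract the quoted version string from the first line of `java -version`
# output supplied on standard input. Prints nothing if the first line does
# not have the expected form.
parse_java_version() {
    sed -n '1s/.* version "\([^"]*\)".*/\1/p'
}
```

Because `java -version` writes to standard error, a call would look like `java -version 2>&1 | parse_java_version`.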
- Hadoop-2.6.0
- Unpack the downloaded Hadoop distribution.
# vim etc/hadoop/hadoop-env.sh
# The java implementation to use.
export JAVA_HOME=/usr/local/java/jdk1.7.0_45
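The edit to hadoop-env.sh can also be scripted instead of done in an editor. This is a sketch under assumptions: `set_java_home` is a hypothetical helper, and it expects the file to already contain a line starting with `export JAVA_HOME=` (as the stock hadoop-env.sh does). Unpacking itself is typically a single `tar -xzf` on the downloaded archive, whose exact file name depends on the release you fetched.

```shell
# Rewrite the `export JAVA_HOME=...` line of a hadoop-env.sh style file so
# that it points at the given JDK directory. All other lines are kept.
# Usage: set_java_home ENV_FILE JAVA_HOME_DIR
set_java_home() {
    env_file=$1
    java_home=$2
    tmp_file=$(mktemp) || return 1
    # '|' is used as the sed delimiter, so java_home must not contain
    # '|' or '&' characters.
    sed "s|^export JAVA_HOME=.*|export JAVA_HOME=$java_home|" "$env_file" > "$tmp_file" &&
        cat "$tmp_file" > "$env_file"
    rc=$?
    rm -f "$tmp_file"
    return "$rc"
}
```

From the unpacked Hadoop directory, a call would be `set_java_home etc/hadoop/hadoop-env.sh /usr/local/java/jdk1.7.0_45`.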
Configuration
Now you are ready to start your Hadoop cluster in one of the three supported modes:
- Local (Standalone) Mode
- Pseudo-Distributed Mode
- Fully-Distributed Mode
OS
# sudo vim /etc/hosts
# add new host
127.0.1.1 YARN001
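Since the Hadoop configuration below refers to the host name YARN001, it helps to verify that the hosts entry is really in place. The following sketch uses a hypothetical helper, `hosts_has_name`, that scans a hosts-format file for an exact host name while ignoring comments.

```shell
# Succeed if a hosts-format file maps some address to the given host name.
# Whole-line comments and trailing "# ..." comments are ignored; names are
# compared exactly.
# Usage: hosts_has_name HOSTS_FILE HOST_NAME
hosts_has_name() {
    awk -v name="$2" '
        /^[[:space:]]*#/ { next }
        {
            for (i = 2; i <= NF; i++) {
                if ($i ~ /^#/) break
                if ($i == name) found = 1
            }
        }
        END { exit !found }
    ' "$1"
}
```

For this guide the call would be `hosts_has_name /etc/hosts YARN001`. On a glibc-based system, `getent hosts YARN001` is another way to check what the resolver actually returns.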
Hadoop (Pseudo-Distributed Mode with YARN)
# vim etc/hadoop/mapred-site.xml
# (if the file does not exist, copy it from etc/hadoop/mapred-site.xml.template)
<configuration>
  <property>
    <name>mapreduce.framework.name</name>
    <value>yarn</value>
  </property>
</configuration>
# vim etc/hadoop/core-site.xml
# (fs.defaultFS replaces the deprecated name fs.default.name)
<configuration>
  <property>
    <name>fs.defaultFS</name>
    <value>hdfs://YARN001:8020</value>
  </property>
</configuration>
# vim etc/hadoop/hdfs-site.xml
# (the NameNode and DataNode directories must be different)
<configuration>
  <property>
    <name>dfs.replication</name>
    <value>1</value>
  </property>
  <property>
    <name>dfs.namenode.name.dir</name>
    <value>/home/anggao/hadoop/dfs/name</value>
  </property>
  <property>
    <name>dfs.datanode.data.dir</name>
    <value>/home/anggao/hadoop/dfs/data</value>
  </property>
</configuration>
# vim etc/hadoop/yarn-site.xml
<configuration>
  <!-- Site specific YARN configuration properties -->
  <property>
    <name>yarn.nodemanager.aux-services</name>
    <value>mapreduce_shuffle</value>
  </property>
</configuration>
# vim etc/hadoop/slaves
localhost
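With several files edited by hand, a quick way to read a value back can catch typos. The sketch below is a hypothetical helper, not a Hadoop tool. It assumes the layout used above, where each `<name>` and `<value>` element sits on a single line, and it is not a general XML parser.

```shell
# Print the <value> of the first property whose <name> equals the given key.
# Assumes each <name>...</name> and <value>...</value> is on one line.
# Usage: get_property CONFIG_FILE KEY
get_property() {
    awk -v key="$2" '
        /<name>/ {
            n = $0
            sub(/.*<name>/, "", n)
            sub(/<\/name>.*/, "", n)
        }
        /<value>/ {
            if (n == key) {
                v = $0
                sub(/.*<value>/, "", v)
                sub(/<\/value>.*/, "", v)
                print v
                exit
            }
        }
    ' "$1"
}
```

For example, `get_property etc/hadoop/hdfs-site.xml dfs.replication` should print `1` for the configuration in this guide. Hadoop also ships its own lookup, `bin/hdfs getconf -confKey dfs.replication`, which reports the value the daemons will actually see.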
# Shortcuts (use them only after HDFS has been formatted, see Step 1).
# start-all.sh starts every daemon at once, but it is better to start the processes one by one.
sbin/start-all.sh
# Alternatively, start HDFS and YARN with one command each
sbin/start-dfs.sh
sbin/start-yarn.sh
# Step 1: format HDFS (first run only)
bin/hdfs namenode -format
# Step 2: start the NameNode
sbin/hadoop-daemon.sh start namenode
# Step 3: check the running processes
jps
# Step 4: start the DataNode
sbin/hadoop-daemon.sh start datanode
# Step 5: check HDFS in the web UI
# http://YARN001:50070/
# Step 6: create directories in HDFS and add a file
bin/hadoop fs -mkdir /home
bin/hadoop fs -mkdir /home/anggao
bin/hadoop fs -put README.txt /home/anggao
# Step 7: start YARN, then check the ResourceManager web UI
sbin/start-yarn.sh
# http://YARN001:8088
# Step 8: check MapReduce by running the pi example
bin/hadoop jar share/hadoop/mapreduce/hadoop-mapreduce-examples-2.6.0.jar pi 2 100
# Step 9: stop YARN and HDFS
sbin/stop-yarn.sh
sbin/stop-dfs.sh
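The `jps` check in Step 3 can be made less manual. As a sketch, the hypothetical helper below reads `jps` output on standard input and prints the name of every expected daemon that is not listed. The expected set (NameNode, DataNode, ResourceManager, NodeManager) is an assumption based on the pseudo-distributed YARN setup described here.

```shell
# Read `jps` output ("PID ClassName" per line) on standard input and print
# the name of each expected daemon that does not appear in it.
# The second field is compared exactly, so SecondaryNameNode does not
# count as NameNode.
missing_daemons() {
    jps_output=$(cat)
    for daemon in NameNode DataNode ResourceManager NodeManager; do
        printf '%s\n' "$jps_output" |
            awk -v d="$daemon" '$2 == d { found = 1 } END { exit !found }' ||
            printf '%s\n' "$daemon"
    done
}
```

After Step 7, running `jps | missing_daemons` should print nothing if all four daemons are up. After Step 4 alone it would still list ResourceManager and NodeManager, because YARN has not been started yet.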