Hadoop Installation and Configuration
1. Create the hadoop user (do this on every machine)
# useradd -m hadoop
# passwd hadoop
2. Enable passwordless SSH login from the master to the slaves
On both the master and the slaves:
# vi /etc/ssh/sshd_config
RSAAuthentication yes
PubkeyAuthentication yes
# /etc/init.d/sshd restart
# su - hadoop
$ ssh-keygen -t rsa
Copy the master's public key to the master itself and to each slave, and copy each slave's public key to the master.
On the master:
$ ssh-copy-id -i ~/.ssh/id_rsa.pub aca712e4.ipt.aol.com
$ ssh-copy-id -i ~/.ssh/id_rsa.pub aca712e5.ipt.aol.com
$ ssh-copy-id -i ~/.ssh/id_rsa.pub aca712e6.ipt.aol.com
$ ssh-copy-id -i ~/.ssh/id_rsa.pub aca712e7.ipt.aol.com
On each slave:
$ ssh-copy-id -i ~/.ssh/id_rsa.pub aca712e4.ipt.aol.com
$ ssh-copy-id -i ~/.ssh/id_rsa.pub aca712e5.ipt.aol.com
$ ssh-copy-id -i ~/.ssh/id_rsa.pub aca712e6.ipt.aol.com
$ ssh-copy-id -i ~/.ssh/id_rsa.pub aca712e7.ipt.aol.com
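To confirm the passwordless setup works, a quick loop run from the master (using the slave hostnames above) should print each slave's name without any password prompt:
$ for h in aca712e5 aca712e6 aca712e7; do ssh $h.ipt.aol.com hostname; done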
3. Download and install the JDK
$ wget http://download.oracle.com/otn-pub/java/jdk/7u3-b04/jdk-7u3-linux-i586.tar.gz
$ tar -xvf jdk-7u3-linux-i586.tar.gz
Sync the jdk1.7.0_03 directory: copy it to the same location on each slave node
$ scp -r jdk1.7.0_03 aca712e5.ipt.aol.com:~
$ scp -r jdk1.7.0_03 aca712e6.ipt.aol.com:~
$ scp -r jdk1.7.0_03 aca712e7.ipt.aol.com:~
4. Download and install Hadoop (only needed on the master; the directory can be synced to the slaves afterwards)
$ wget http://labs.renren.com/apache-mirror/hadoop/core/hadoop-0.20.2/hadoop-0.20.2.tar.gz
$ tar -xzf hadoop-0.20.2.tar.gz
$ mv hadoop-0.20.2 hadoop
5. Add the Java environment variables
$ vi .bashrc
# .bashrc
# Source global definitions
if [ -f /etc/bashrc ]; then
    . /etc/bashrc
fi
# User specific aliases and functions
export JAVA_HOME=/home/hadoop/jdk1.7.0_03
export HADOOP_HOME=/home/hadoop/hadoop/
export CLASSPATH=$CLASSPATH:$JAVA_HOME/lib:$JAVA_HOME/jre/lib:$HADOOP_HOME/lib
export PATH=$PATH:$JAVA_HOME/bin:$HADOOP_HOME/bin
$ source ~/.bashrc
$ java -version    # verify the installation
Sync the .bashrc file: copy it to the same location on each slave node
$ scp .bashrc aca712e5.ipt.aol.com:~
$ scp .bashrc aca712e6.ipt.aol.com:~
$ scp .bashrc aca712e7.ipt.aol.com:~
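As a sanity check (using the hostnames above and the JDK path from step 3), the JDK can be invoked directly over SSH on each slave:
$ for h in aca712e5 aca712e6 aca712e7; do ssh $h.ipt.aol.com '~/jdk1.7.0_03/bin/java -version'; done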
6. Edit the files under conf/
In conf/hadoop-env.sh, set the HADOOP_PID_DIR variable:
export HADOOP_PID_DIR=${HADOOP_HOME}/pids    (the pids directory must be created by hand)
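A minimal way to do that on the master, assuming HADOOP_HOME=/home/hadoop/hadoop as set above; the empty directory is then carried to the slaves by the scp sync at the end of this step:
$ mkdir -p ~/hadoop/pids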
Configure the namenode's IP and port
Add fs.default.name to conf/core-site.xml
-bash-3.2$ cat core-site.xml
<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<!-- Put site-specific property overrides in this file. -->
<configuration>
  <property>
    <name>fs.default.name</name>
    <value>hdfs://172.167.18.228:9000</value>  <!-- the namenode's public IP address -->
  </property>
</configuration>
Configure the JobTracker's IP and port
Add mapred.job.tracker to conf/mapred-site.xml
-bash-3.2$ cat mapred-site.xml
<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<!-- Put site-specific property overrides in this file. -->
<configuration>
  <property>
    <name>mapred.job.tracker</name>
    <value>172.167.18.228:9001</value>  <!-- the JobTracker's public IP address -->
  </property>
</configuration>
Configure the replication factor
Site-specific HDFS settings: hdfs-site.xml
<configuration>
  <property>
    <name>dfs.replication</name>
    <value>2</value>
  </property>
  <property>
    <name>dfs.name.dir</name>
    <value>/home/hadoop/name</value>
  </property>
  <property>
    <name>dfs.data.dir</name>
    <value>/home/hadoop/data</value>
  </property>
  <property>
    <name>hadoop.tmp.dir</name>
    <value>/home/hadoop/tmp</value>
  </property>
</configuration>
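If xmllint happens to be installed, it is worth verifying that the edited files are still well-formed XML before syncing, since a malformed config file will make the daemons fail at startup:
$ xmllint --noout conf/core-site.xml conf/mapred-site.xml conf/hdfs-site.xml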
Add the slave hostnames to conf/slaves
-bash-3.2$ cat slaves
aca712e5.ipt.aol.com
aca712e6.ipt.aol.com
aca712e7.ipt.aol.com
Add the secondary namenode host to conf/masters
-bash-3.2$ cat masters
aca712e4.ipt.aol.com    (prefer the real hostname over localhost, so the file can be synced to the other machines unchanged)
Sync the hadoop directory: copy it to the same location on each slave node
$ scp -r hadoop aca712e5.ipt.aol.com:~
$ scp -r hadoop aca712e6.ipt.aol.com:~
$ scp -r hadoop aca712e7.ipt.aol.com:~
7. Format HDFS on the master
-bash-3.2$ bin/hadoop namenode -format
Note: formatting a second time will fail. To reformat, first delete the namenode's name directory and everything under it, and delete the data directory and its contents on every datanode.
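That cleanup might look like this (a sketch based on the dfs.name.dir and dfs.data.dir paths configured above; run it only when you really intend to wipe HDFS):
$ rm -rf ~/name    # on the namenode
$ for h in aca712e5 aca712e6 aca712e7; do ssh $h.ipt.aol.com 'rm -rf ~/data'; done    # clear the datanodes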
8. Start HDFS and MapReduce on the master (the log files are generated automatically)
-bash-3.2$ bin/start-all.sh
Note: when restarting, stop everything first, and delete the files under the logs directory on the namenode and the datanodes.
Check the cluster status:
[hadoop@aca712e4 bin]$ ./hadoop dfsadmin -report
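jps (shipped with the JDK) gives another quick check: on the master it should list NameNode, SecondaryNameNode and JobTracker, and on each slave DataNode and TaskTracker:
$ jps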
9. Access HDFS and the JobTracker through the web UIs
http://172.167.18.228:50070/dfshealth.jsp
http://172.167.18.228:50030/jobtracker.jsp
10. Place WordCount.java in the hadoop user's home directory on the master
package org.myorg;

import java.io.IOException;
import java.util.*;

import org.apache.hadoop.fs.Path;
import org.apache.hadoop.conf.*;
import org.apache.hadoop.io.*;
import org.apache.hadoop.mapred.*;
import org.apache.hadoop.util.*;

public class WordCount {

    // Mapper: emit (word, 1) for every token in each input line.
    public static class Map extends MapReduceBase implements Mapper<LongWritable, Text, Text, IntWritable> {
        private final static IntWritable one = new IntWritable(1);
        private Text word = new Text();

        public void map(LongWritable key, Text value, OutputCollector<Text, IntWritable> output, Reporter reporter) throws IOException {
            String line = value.toString();
            StringTokenizer tokenizer = new StringTokenizer(line);
            while (tokenizer.hasMoreTokens()) {
                word.set(tokenizer.nextToken());
                output.collect(word, one);
            }
        }
    }

    // Reducer (also used as combiner): sum the counts for each word.
    public static class Reduce extends MapReduceBase implements Reducer<Text, IntWritable, Text, IntWritable> {
        public void reduce(Text key, Iterator<IntWritable> values, OutputCollector<Text, IntWritable> output, Reporter reporter) throws IOException {
            int sum = 0;
            while (values.hasNext()) {
                sum += values.next().get();
            }
            output.collect(key, new IntWritable(sum));
        }
    }

    public static void main(String[] args) throws Exception {
        JobConf conf = new JobConf(WordCount.class);
        conf.setJobName("wordcount");
        conf.setOutputKeyClass(Text.class);
        conf.setOutputValueClass(IntWritable.class);
        conf.setMapperClass(Map.class);
        conf.setCombinerClass(Reduce.class);
        conf.setReducerClass(Reduce.class);
        conf.setInputFormat(TextInputFormat.class);
        conf.setOutputFormat(TextOutputFormat.class);
        FileInputFormat.setInputPaths(conf, new Path(args[0]));
        FileOutputFormat.setOutputPath(conf, new Path(args[1]));
        JobClient.runJob(conf);
    }
}
11. Compile WordCount.java and package the class files into wordcount.jar
$ mkdir wordcount_classes
$ javac -classpath hadoop/hadoop-0.20.2-core.jar -d wordcount_classes WordCount.java
$ jar -cvf wordcount.jar -C wordcount_classes/ .
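Listing the jar's contents confirms the classes landed under the expected org/myorg package path:
$ jar -tf wordcount.jar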
12. Prepare the input directory
$ ./hadoop fs -mkdir input
$ echo "Hello World Bye World" >> file01
$ echo "Hello Hadoop Bye Hadoop" >> file02
$ ./hadoop fs -put file01 input/
$ ./hadoop fs -put file02 input/
$ ./hadoop fs -ls input/
Found 2 items
-rw-r--r--   2 hadoop supergroup         22 2011-11-28 19:10 /user/hadoop/input/file01
-rw-r--r--   2 hadoop supergroup         24 2011-11-28 19:11 /user/hadoop/input/file02
13. Run WordCount
$ ./hadoop jar /home/hadoop/wordcount.jar org.myorg.WordCount input output
14. View the results
$ ./hadoop fs -ls output
Found 2 items
drwxr-xr-x   - hadoop supergroup          0 2011-11-28 19:16 /user/hadoop/output/_logs
-rw-r--r--   2 hadoop supergroup         31 2011-11-28 19:16 /user/hadoop/output/part-00000
$ ./hadoop fs -cat output/part-00000
Bye 2
Hadoop 2
Hello 2
World 2
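If the results are needed locally, they can be copied out of HDFS (output_local here is an arbitrary destination name):
$ ./hadoop fs -get output/part-00000 output_local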