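Running Hadoop on Windows by emulating a Linux environment with Cygwin means configuring ssh and authentication, which is quite tedious, although walking through that whole process does teach you a lot.
IBM's MapReduce Tools for Eclipse plugin greatly simplifies this configuration: you can develop, debug, and deploy as easily as you would run an ordinary Java class.
Download IBM's MapReduce Tools for Eclipse plugin from http://www.alphaworks.ibm.com/tech/mapreducetools. Once the download finishes, unzip it and copy the folders under its plugins directory into the plugins directory of your Eclipse installation. Start Eclipse, and after a little configuration you can develop, debug, and deploy Hadoop programs.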
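Configuration steps:
Start Eclipse and choose Window -> Preferences; the dialog shown in the figure appears:
Set Hadoop Main Directory to the directory where you unpacked the Hadoop distribution you downloaded, then click "OK" to finish.
Create a new Project and choose MapReduce Project, as shown in the figure:
Continue through the wizard and enter a project name to finish creating the MapReduce Project. You can now start developing Hadoop programs.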
For example, I copied the WordCount class that ships with Hadoop into the project unchanged, modifying only the package name.
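For readers who have not looked at WordCount before, the following is a minimal plain-Java sketch of the logic it performs: a map step that emits a (word, 1) pair for every whitespace-separated token, and a reduce step that sums the values per word. It deliberately does not use the Hadoop API, and the class and method names (`WordCountSketch`, `map`, `reduce`, `count`) are made up for illustration; it is not the class bundled with Hadoop.

```java
import java.util.AbstractMap;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.StringTokenizer;
import java.util.TreeMap;

// Hypothetical illustration only: mimics the word-count data flow without Hadoop.
final class WordCountSketch {

    // "Map" step: split one line on whitespace and emit a (word, 1) pair per token.
    static List<Map.Entry<String, Integer>> map(String line) {
        List<Map.Entry<String, Integer>> out = new ArrayList<>();
        StringTokenizer tokens = new StringTokenizer(line);
        while (tokens.hasMoreTokens()) {
            out.add(new AbstractMap.SimpleEntry<>(tokens.nextToken(), 1));
        }
        return out;
    }

    // "Reduce" step: sum the values of all pairs that share a key.
    // A TreeMap keeps the words in sorted order.
    static Map<String, Integer> reduce(List<Map.Entry<String, Integer>> pairs) {
        Map<String, Integer> totals = new TreeMap<>();
        for (Map.Entry<String, Integer> pair : pairs) {
            totals.merge(pair.getKey(), pair.getValue(), Integer::sum);
        }
        return totals;
    }

    // Run the map step over every line, then reduce all emitted pairs together.
    static Map<String, Integer> count(List<String> lines) {
        List<Map.Entry<String, Integer>> pairs = new ArrayList<>();
        for (String line : lines) {
            pairs.addAll(map(line));
        }
        return reduce(pairs);
    }

    public static void main(String[] args) {
        // prints {a=2, b=2, c=1}
        System.out.println(count(Arrays.asList("a b a", "b c")));
    }
}
```

In a real job the framework does the grouping between the two steps and may run many map tasks in parallel; the sketch collapses all of that into a single in-memory list.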
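Next, set up the run configuration: choose Run As -> Open Debug Dialog and, on the Arguments tab, set the program arguments.
Enter two directories, the data input directory followed by the output directory, separated by a space: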
G:\hadoop-0.16.4\in G:\hadoop-0.16.4\myout
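The two values reach the program as the `args` array of its `main` method. As a rough sketch of how a driver might pick them up, assuming it expects exactly an input directory followed by an output directory (the class `JobArgsSketch` and its usage message are hypothetical, not taken from Hadoop's WordCount):

```java
// Hypothetical helper: validates and holds the two directory arguments.
final class JobArgsSketch {
    final String inputDir;
    final String outputDir;

    private JobArgsSketch(String inputDir, String outputDir) {
        this.inputDir = inputDir;
        this.outputDir = outputDir;
    }

    // Expects exactly two arguments: <input dir> <output dir>.
    static JobArgsSketch parse(String[] args) {
        if (args.length != 2) {
            throw new IllegalArgumentException("usage: <input dir> <output dir>");
        }
        return new JobArgsSketch(args[0], args[1]);
    }

    public static void main(String[] args) {
        // The same values as typed into the Arguments tab;
        // backslashes are doubled only because this is a Java string literal.
        JobArgsSketch parsed = parse(new String[] {
            "G:\\hadoop-0.16.4\\in", "G:\\hadoop-0.16.4\\myout"
        });
        System.out.println("input  = " + parsed.inputDir);
        System.out.println("output = " + parsed.outputDir);
    }
}
```

Because the arguments are split on whitespace, a directory whose path contains a space would normally need to be wrapped in double quotes in the Arguments tab.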
Now you can run it just like an ordinary Java program. The console prints the job's progress, as shown below:
08/09/21 22:35:47 INFO jvm.JvmMetrics: Initializing JVM Metrics with processName=JobTracker, sessionId=
08/09/21 22:35:47 WARN mapred.JobClient: No job jar file set. User classes may not be found. See JobConf(Class) or JobConf#setJar(String).
08/09/21 22:35:47 INFO mapred.FileInputFormat: Total input paths to process : 7
08/09/21 22:35:48 INFO mapred.JobClient: Running job: job_local_1
08/09/21 22:35:48 INFO mapred.MapTask: numReduceTasks: 1
08/09/21 22:35:49 INFO mapred.JobClient: map 0% reduce 0%
08/09/21 22:35:50 INFO mapred.LocalJobRunner: file:/G:/hadoop-0.16.4/in/a.txt:0+1957
08/09/21 22:35:50 INFO mapred.TaskRunner: Task 'job_local_1_map_0000' done.
08/09/21 22:35:50 INFO mapred.TaskRunner: Saved output of task 'job_local_1_map_0000' to file:/G:/hadoop-0.16.4/myout
08/09/21 22:35:50 INFO mapred.MapTask: numReduceTasks: 1
08/09/21 22:35:50 INFO mapred.JobClient: map 100% reduce 0%
08/09/21 22:35:51 INFO mapred.LocalJobRunner: file:/G:/hadoop-0.16.4/in/b.txt:0+10109
08/09/21 22:35:51 INFO mapred.TaskRunner: Task 'job_local_1_map_0001' done.
08/09/21 22:35:51 INFO mapred.TaskRunner: Saved output of task 'job_local_1_map_0001' to file:/G:/hadoop-0.16.4/myout
08/09/21 22:35:51 INFO mapred.MapTask: numReduceTasks: 1
08/09/21 22:35:51 INFO mapred.LocalJobRunner: file:/G:/hadoop-0.16.4/in/c.txt:0+1957
08/09/21 22:35:51 INFO mapred.TaskRunner: Task 'job_local_1_map_0002' done.
08/09/21 22:35:51 INFO mapred.TaskRunner: Saved output of task 'job_local_1_map_0002' to file:/G:/hadoop-0.16.4/myout
08/09/21 22:35:51 INFO mapred.MapTask: numReduceTasks: 1
08/09/21 22:35:51 INFO mapred.LocalJobRunner: file:/G:/hadoop-0.16.4/in/d.txt:0+1987
08/09/21 22:35:51 INFO mapred.TaskRunner: Task 'job_local_1_map_0003' done.
08/09/21 22:35:51 INFO mapred.TaskRunner: Saved output of task 'job_local_1_map_0003' to file:/G:/hadoop-0.16.4/myout
08/09/21 22:35:52 INFO mapred.MapTask: numReduceTasks: 1
08/09/21 22:35:52 INFO mapred.LocalJobRunner: file:/G:/hadoop-0.16.4/in/e.txt:0+1957
08/09/21 22:35:52 INFO mapred.TaskRunner: Task 'job_local_1_map_0004' done.
08/09/21 22:35:52 INFO mapred.TaskRunner: Saved output of task 'job_local_1_map_0004' to file:/G:/hadoop-0.16.4/myout
08/09/21 22:35:52 INFO mapred.MapTask: numReduceTasks: 1
08/09/21 22:35:52 INFO mapred.LocalJobRunner: file:/G:/hadoop-0.16.4/in/f.txt:0+1985
08/09/21 22:35:52 INFO mapred.TaskRunner: Task 'job_local_1_map_0005' done.
08/09/21 22:35:52 INFO mapred.TaskRunner: Saved output of task 'job_local_1_map_0005' to file:/G:/hadoop-0.16.4/myout
08/09/21 22:35:52 INFO mapred.MapTask: numReduceTasks: 1
08/09/21 22:35:53 INFO mapred.LocalJobRunner: file:/G:/hadoop-0.16.4/in/g.txt:0+1957
08/09/21 22:35:53 INFO mapred.TaskRunner: Task 'job_local_1_map_0006' done.
08/09/21 22:35:53 INFO mapred.TaskRunner: Saved output of task 'job_local_1_map_0006' to file:/G:/hadoop-0.16.4/myout
08/09/21 22:35:53 INFO mapred.LocalJobRunner: file:/G:/hadoop-0.16.4/in/b.txt:0+10109
08/09/21 22:35:54 INFO mapred.JobClient: map 28% reduce 0%
08/09/21 22:35:54 INFO mapred.LocalJobRunner: file:/G:/hadoop-0.16.4/in/c.txt:0+1957
08/09/21 22:35:54 INFO mapred.LocalJobRunner: reduce > reduce
08/09/21 22:35:54 INFO mapred.TaskRunner: Task 'reduce_xk6d4v' done.
08/09/21 22:35:54 INFO mapred.TaskRunner: Saved output of task 'reduce_xk6d4v' to file:/G:/hadoop-0.16.4/myout
08/09/21 22:35:55 INFO mapred.JobClient: Job complete: job_local_1
08/09/21 22:35:55 INFO mapred.JobClient: Counters: 9
08/09/21 22:35:55 INFO mapred.JobClient: Map-Reduce Framework
08/09/21 22:35:55 INFO mapred.JobClient: Map input records=7
08/09/21 22:35:55 INFO mapred.JobClient: Map output records=3649
08/09/21 22:35:55 INFO mapred.JobClient: Map input bytes=21909
08/09/21 22:35:55 INFO mapred.JobClient: Map output bytes=36511
08/09/21 22:35:55 INFO mapred.JobClient: Combine input records=3649
08/09/21 22:35:55 INFO mapred.JobClient: Combine output records=21
08/09/21 22:35:55 INFO mapred.JobClient: Reduce input groups=7
08/09/21 22:35:55 INFO mapred.JobClient: Reduce input records=21
08/09/21 22:35:55 INFO mapred.JobClient: Reduce output records=7
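This matches the output produced when running under the Cygwin-emulated environment.
With the MapReduce Tools plugin, the whole workflow really is much more convenient.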
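The counters at the end of the log are related to each other: Combine input records equals Map output records (3649), Reduce input records equals Combine output records (21), and Reduce output records equals Reduce input groups (7). The plain-Java sketch below reproduces that bookkeeping on a tiny input. It assumes the combiner runs exactly once per input split, which is a simplification, and the class `CounterSketch` and its field names are invented for illustration rather than taken from Hadoop.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.StringTokenizer;
import java.util.TreeMap;

// Hypothetical illustration of how the word-count counters relate to each other.
final class CounterSketch {
    int mapOutputRecords;      // one per token emitted by the map step
    int combineOutputRecords;  // one per distinct word within each split
    int reduceInputGroups;     // one per distinct word across all splits

    // Each string in `splits` stands in for the content of one input file.
    Map<String, Integer> run(List<String> splits) {
        Map<String, Integer> totals = new TreeMap<>();
        for (String split : splits) {
            // Map + combine for a single split: local counts per word.
            Map<String, Integer> local = new TreeMap<>();
            StringTokenizer tokens = new StringTokenizer(split);
            while (tokens.hasMoreTokens()) {
                local.merge(tokens.nextToken(), 1, Integer::sum);
                mapOutputRecords++;
            }
            combineOutputRecords += local.size();
            // Reduce: merge the combined records from every split.
            for (Map.Entry<String, Integer> e : local.entrySet()) {
                totals.merge(e.getKey(), e.getValue(), Integer::sum);
            }
        }
        reduceInputGroups = totals.size();
        return totals;
    }

    public static void main(String[] args) {
        CounterSketch job = new CounterSketch();
        Map<String, Integer> result = job.run(Arrays.asList("a a b", "b c c c"));
        System.out.println("Map output records=" + job.mapOutputRecords);         // 7
        System.out.println("Combine output records=" + job.combineOutputRecords); // 4
        System.out.println("Reduce input groups=" + job.reduceInputGroups);       // 3
        System.out.println(result);                                               // {a=2, b=2, c=3}
    }
}
```

The point of the combiner is visible in the numbers: the reducer receives 4 records instead of 7 here, just as it receives 21 instead of 3649 in the run logged above.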