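How to Configure Spring and Verify MapReduce

The long-rumored Spring integration with Hadoop has finally arrived as Spring Hadoop.

When you start trying out Spring Hadoop you will run into all sorts of odd problems, and people have already begun reporting them.

If you just want a quick trial without having to sort out these issues yourself, the steps below should get you up and running with Spring Hadoop quickly.

Requirement: Hadoop 0.20.2 or later

Once Hadoop is installed, let's get started...

Step1. Download Spring Hadoop. We use git here; if you are not familiar with git, you can also download the archive from the official site and unpack it.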

The examples below use my home directory; remember to replace it with your own directory name.

/home/evanshsu mkdir springhadoop

/home/evanshsu cd springhadoop

/home/evanshsu/springhadoop git init

/home/evanshsu/springhadoop git pull "git://github.com/SpringSource/spring-hadoop.git"

Step2. Build spring-hadoop.jar

After the build, copy all the jar files into /home/evanshsu/springhadoop/lib so that they can later be bundled into a single jar.

/home/evanshsu/springhadoop ./gradlew jar

/home/evanshsu/springhadoop mkdir lib

/home/evanshsu/springhadoop cp build/libs/spring-data-hadoop-1.0.0.BUILD-SNAPSHOT.jar lib/

Step3. Get spring-framework

Spring Hadoop depends on spring-framework, so the spring-framework jar files also need to go into lib.

/home/evanshsu/spring wget "/dist.springframework.org/release/SPR/spring-framework-3.1.1.RELEASE.zip"

/home/evanshsu/spring unzip spring-framework-3.1.1.RELEASE.zip

/home/evanshsu/spring cp spring-framework-3.1.1.RELEASE/dist/*.jar /home/evanshsu/springhadoop/lib/

Step4. Modify the build file so that all the jar files are packaged into a single jar

/home/evanshsu/spring/samples/wordcount vim build.gradle

description = 'Spring Hadoop Samples - WordCount'

apply plugin: 'base'
apply plugin: 'java'
apply plugin: 'idea'
apply plugin: 'eclipse'

repositories {
    flatDir(dirs: '/home/evanshsu/springhadoop/lib/')
    // Public Spring artefacts
    maven { url "..." }
}

dependencies {
    compile fileTree('/home/evanshsu/springhadoop/lib/')
    compile "org.apache.hadoop:hadoop-examples:$hadoopVersion"
    // see HADOOP-7461
    runtime "org.codehaus.jackson:jackson-mapper-asl:$jacksonVersion"
    testCompile "junit:junit:$junitVersion"
    testCompile "org.springframework:spring-test:$springVersion"
}

// Merge every compile dependency into this jar, skipping each dependency's
// own META-INF/spring.schemas and META-INF/spring.handlers.
jar {
    from configurations.compile.collect {
        it.isDirectory() ? it : zipTree(it).matching {
            exclude 'META-INF/spring.schemas'
            exclude 'META-INF/spring.handlers'
        }
    }
}
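The jar block above merges the dependencies into one archive and leaves out each dependency's own META-INF/spring.schemas and META-INF/spring.handlers. One way to check the result is to look inside the built jar for the entries you expect. The following is a minimal sketch using only the JDK; the class name FatJarCheck and the method missingEntries are hypothetical and not part of Spring Hadoop.

```java
import java.io.IOException;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.jar.JarFile;

// Hypothetical helper, not part of Spring Hadoop: reports which expected
// entries are absent from a jar file.
class FatJarCheck {

    // Returns the subset of `expected` entry names that the jar does not contain.
    static List<String> missingEntries(String jarPath, List<String> expected) throws IOException {
        List<String> missing = new ArrayList<>();
        try (JarFile jar = new JarFile(jarPath)) {
            for (String name : expected) {
                if (jar.getJarEntry(name) == null) {
                    missing.add(name);
                }
            }
        }
        return missing;
    }

    public static void main(String[] args) throws IOException {
        // The jar path is passed as the first argument.
        List<String> expected = Arrays.asList(
                "META-INF/spring.schemas",
                "META-INF/spring.handlers",
                "org/springframework/data/hadoop/samples/wordcount/WordCountMapper.class");
        System.out.println("missing: " + missingEntries(args[0], expected));
    }
}
```

Run it against the jar produced at the end of this walkthrough; an empty list means the mapper class and the two META-INF files from the project's own resources all made it into the archive.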

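Step5. There is a dedicated hadoop.properties file that holds the Hadoop-related settings.

Change wordcount.input.path and wordcount.output.path to the paths that wordcount should use when it runs, and remember to put a few text files under wordcount.input.path.

Also change hd.fs to match your own HDFS setup.

If you are using the NCHC (National Center for High-performance Computing) Hadoop cluster, set hd.fs=hdfs://gm2.nchc.org.tw:8020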

/home/evanshsu/spring/samples/wordcount vim src/main/resources/hadoop.properties

wordcount.input.path=/user/evanshsu/input.txt

wordcount.output.path=/user/evanshsu/output

hive.host=localhost

hive.port=12345

hive.url=jdbc:hive://${hive.host}:${hive.port}

hd.fs=hdfs://localhost:9000

mapred.job.tracker=localhost:9001

path.cat=bin${file.separator}stream-bin${file.separator}cat

path.wc=bin${file.separator}stream-bin${file.separator}wc

input.directory=logs

log.input=/logs/input/

log.output=/logs/output/

distcp.src=${hd.fs}/distcp/source.txt

distcp.dst=${hd.fs}/distcp/dst
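Several of the values above refer to other keys with ${...}, for example distcp.src=${hd.fs}/distcp/source.txt. To make that substitution concrete, here is a minimal sketch of how such a reference expands. It is not Spring's implementation; the class PlaceholderSketch and its resolve method are hypothetical, and the sketch assumes there are no circular references.

```java
import java.util.Properties;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical illustration of ${key} expansion; not Spring's own resolver.
class PlaceholderSketch {

    private static final Pattern REF = Pattern.compile("\\$\\{([^}]+)\\}");

    // Expands ${key} references in `value`, looking keys up first in `props`
    // and then in the JVM system properties. Unknown keys are left untouched.
    static String resolve(String value, Properties props) {
        Matcher m = REF.matcher(value);
        StringBuffer out = new StringBuffer();
        while (m.find()) {
            String key = m.group(1);
            String found = props.getProperty(key, System.getProperty(key));
            String replacement = (found == null) ? m.group() : resolve(found, props);
            m.appendReplacement(out, Matcher.quoteReplacement(replacement));
        }
        m.appendTail(out);
        return out.toString();
    }

    public static void main(String[] args) {
        Properties props = new Properties();
        props.setProperty("hd.fs", "hdfs://localhost:9000");
        props.setProperty("distcp.src", "${hd.fs}/distcp/source.txt");
        // prints hdfs://localhost:9000/distcp/source.txt
        System.out.println(resolve(props.getProperty("distcp.src"), props));
    }
}
```

The fallback to system properties is there because keys such as file.separator in path.cat are JVM system properties rather than entries in this file.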

Step6. This is the most important configuration file; anyone who has used Spring knows that this file is the soul of a Spring application.

/home/evanshsu/spring/samples/wordcount vim src/main/resources/META-INF/spring/context.xml

<?xml version="1.0" encoding="UTF-8"?>
<beans xmlns="http://www.springframework.org/schema/beans"
    xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
    xmlns:context="http://www.springframework.org/schema/context"
    xmlns:hdp="http://www.springframework.org/schema/hadoop"
    xmlns:p="http://www.springframework.org/schema/p"
    xsi:schemaLocation="http://www.springframework.org/schema/beans http://www.springframework.org/schema/beans/spring-beans.xsd
        http://www.springframework.org/schema/context http://www.springframework.org/schema/context/spring-context.xsd
        http://www.springframework.org/schema/hadoop http://www.springframework.org/schema/hadoop/spring-hadoop.xsd">

    <context:property-placeholder location="hadoop.properties"/>

    <hdp:configuration>
        fs.default.name=${hd.fs}
    </hdp:configuration>

    <hdp:job id="wordcount-job" validate-paths="false"
        input-path="${wordcount.input.path}" output-path="${wordcount.output.path}"
        mapper="org.springframework.data.hadoop.samples.wordcount.WordCountMapper"
        reducer="org.springframework.data.hadoop.samples.wordcount.WordCountReducer"
        jar-by-class="org.springframework.data.hadoop.samples.wordcount.WordCountMapper" />

</beans>

Step7. Add your own mapper and reducer

/home/evanshsu/spring/samples/wordcount vim src/main/java/org/springframework/data/hadoop/samples/wordcount/WordCountMapper.java

package org.springframework.data.hadoop.samples.wordcount;

import java.io.IOException;
import java.util.StringTokenizer;

import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;

// Emits (word, 1) for every whitespace-separated token in the input line.
public class WordCountMapper extends Mapper<Object, Text, Text, IntWritable> {

    private final static IntWritable one = new IntWritable(1);
    private Text word = new Text();

    @Override
    public void map(Object key, Text value, Context context)
            throws IOException, InterruptedException {
        StringTokenizer itr = new StringTokenizer(value.toString());
        while (itr.hasMoreTokens()) {
            word.set(itr.nextToken());
            context.write(word, one);
        }
    }
}

/home/evanshsu/spring/samples/wordcount vim src/main/java/org/springframework/data/hadoop/samples/wordcount/WordCountReducer.java

package org.springframework.data.hadoop.samples.wordcount;

import java.io.IOException;

import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Reducer;

// Sums the counts emitted by the mapper for each word.
public class WordCountReducer extends
        Reducer<Text, IntWritable, Text, IntWritable> {

    private IntWritable result = new IntWritable();

    @Override
    public void reduce(Text key, Iterable<IntWritable> values, Context context)
            throws IOException, InterruptedException {
        int sum = 0;
        for (IntWritable val : values) {
            sum += val.get();
        }
        result.set(sum);
        context.write(key, result);
    }
}
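To see what the mapper and reducer compute without a cluster, the same logic can be simulated in plain Java. This is only a sketch of the data flow (tokenize each line, emit a 1 per token, sum the 1s per word); it does not use the Hadoop API, and the class LocalWordCount is hypothetical.

```java
import java.util.Arrays;
import java.util.Map;
import java.util.StringTokenizer;
import java.util.TreeMap;

// Hypothetical in-memory stand-in for the WordCount job; no Hadoop involved.
class LocalWordCount {

    // Mirrors the map step (split on whitespace, emit 1 per token) and the
    // reduce step (sum the 1s for each word) in a single in-memory pass.
    static Map<String, Integer> count(Iterable<String> lines) {
        Map<String, Integer> totals = new TreeMap<>();
        for (String line : lines) {
            StringTokenizer itr = new StringTokenizer(line);
            while (itr.hasMoreTokens()) {
                totals.merge(itr.nextToken(), 1, Integer::sum);
            }
        }
        return totals;
    }

    public static void main(String[] args) {
        // prints {hadoop=1, hello=2, spring=1}
        System.out.println(count(Arrays.asList("hello hadoop", "hello spring")));
    }
}
```

Running this on a small sample of your input gives you counts to compare against the job's output later.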

Step8. Add spring.schemas and spring.handlers

/home/evanshsu/spring/samples/wordcount vim src/main/resources/META-INF/spring.schemas

http\://www.springframework.org/schema/context/spring-context.xsd=org/springframework/context/config/spring-context-3.1.xsd

http\://www.springframework.org/schema/hadoop/spring-hadoop.xsd=/org/springframework/data/hadoop/config/spring-hadoop-1.0.xsd

/home/evanshsu/spring/samples/wordcount vim src/main/resources/META-INF/spring.handlers

http\://www.springframework.org/schema/p=org.springframework.beans.factory.xml.SimplePropertyNamespaceHandler

http\://www.springframework.org/schema/context=org.springframework.context.config.ContextNamespaceHandler

http\://www.springframework.org/schema/hadoop=org.springframework.data.hadoop.config.HadoopNamespaceHandler
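Both files use Java properties syntax, which is why the colon in each namespace URL is escaped as \: (an unescaped colon would end the key). The sketch below loads one of the lines above with java.util.Properties to show the key that results. The class HandlerMappingSketch is hypothetical.

```java
import java.io.IOException;
import java.io.StringReader;
import java.util.Properties;

// Hypothetical illustration of how a spring.handlers line parses as a
// Java properties entry.
class HandlerMappingSketch {

    // Parses properties-format text into a Properties object.
    static Properties parse(String text) throws IOException {
        Properties props = new Properties();
        props.load(new StringReader(text));
        return props;
    }

    public static void main(String[] args) throws IOException {
        // In Java source the backslash itself must be doubled.
        String line = "http\\://www.springframework.org/schema/hadoop="
                + "org.springframework.data.hadoop.config.HadoopNamespaceHandler";
        Properties props = parse(line);
        // prints org.springframework.data.hadoop.config.HadoopNamespaceHandler
        System.out.println(props.getProperty("http://www.springframework.org/schema/hadoop"));
    }
}
```

If the backslash is dropped, the key becomes just http and the namespace lookup will not find its handler.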

Step9. Finally, the last step: package all the jar files together and submit the result to Hadoop to run.

/home/evanshsu/spring/samples/wordcount ../../gradlew jar

/home/evanshsu/spring/samples/wordcount hadoop jar build/libs/wordcount-1.0.0.M1.jar org.springframework.data.hadoop.samples.wordcount.Main
