I won't belabor how to use Maven itself; there are plenty of tutorials online and it has changed little over the years. This article only covers how to set up a Hadoop development environment.
1. First, create the project

mvn archetype:generate -DgroupId=my.hadoopstudy -DartifactId=hadoopstudy -DarchetypeArtifactId=maven-archetype-quickstart -DinteractiveMode=false
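The quickstart archetype generates the standard Maven layout, roughly like this (the generated App/AppTest classes are just placeholders and can be removed):

hadoopstudy/pom.xml
hadoopstudy/src/main/java/my/hadoopstudy/App.java
hadoopstudy/src/test/java/my/hadoopstudy/AppTest.java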
2. Then add the Hadoop dependencies hadoop-common, hadoop-client, and hadoop-hdfs to the pom.xml. After the additions the pom.xml looks like this:
<project xmlns="http://maven.apache.org/POM/4.0.0" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
         xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 http://maven.apache.org/xsd/maven-4.0.0.xsd">
    <modelVersion>4.0.0</modelVersion>

    <groupId>my.hadoopstudy</groupId>
    <artifactId>hadoopstudy</artifactId>
    <packaging>jar</packaging>
    <version>1.0-SNAPSHOT</version>
    <name>hadoopstudy</name>

    <dependencies>
        <dependency>
            <groupId>org.apache.hadoop</groupId>
            <artifactId>hadoop-common</artifactId>
            <version>2.5.1</version>
        </dependency>
        <dependency>
            <groupId>org.apache.hadoop</groupId>
            <artifactId>hadoop-hdfs</artifactId>
            <version>2.5.1</version>
        </dependency>
        <dependency>
            <groupId>org.apache.hadoop</groupId>
            <artifactId>hadoop-client</artifactId>
            <version>2.5.1</version>
        </dependency>
        <dependency>
            <groupId>junit</groupId>
            <artifactId>junit</artifactId>
            <version>3.8.1</version>
            <scope>test</scope>
        </dependency>
    </dependencies>
</project>
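Before going further, it's worth checking that the dependencies resolve and the project compiles (the hadoop-* versions should match the cluster you will run against; 2.5.1 is used here):

$ mvn compile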
3. Testing
3.1 First we can test HDFS development. This assumes the Hadoop cluster from the previous Hadoop article. The class code is as follows:
package my.hadoopstudy.dfs;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FSDataOutputStream;
import org.apache.hadoop.fs.FileStatus;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IOUtils;

import java.io.InputStream;
import java.net.URI;

public class Test {
    public static void main(String[] args) throws Exception {
        String uri = "hdfs://9.111.254.189:9000/";
        Configuration config = new Configuration();
        FileSystem fs = FileSystem.get(URI.create(uri), config);

        // List all files and directories under /user/fkong on HDFS
        FileStatus[] statuses = fs.listStatus(new Path("/user/fkong"));
        for (FileStatus status : statuses) {
            System.out.println(status);
        }

        // Create a file under /user/fkong on HDFS and write one line of text into it
        FSDataOutputStream os = fs.create(new Path("/user/fkong/test.log"));
        os.write("Hello World!".getBytes());
        os.flush();
        os.close();

        // Print the content of the file just written under /user/fkong
        InputStream is = fs.open(new Path("/user/fkong/test.log"));
        IOUtils.copyBytes(is, System.out, 1024, true);
    }
}
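If you want to tidy up after the test, you can delete the file again. A minimal sketch, assuming the same cluster address as above (the Cleanup class name is my own, not part of the project):

package my.hadoopstudy.dfs;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

import java.net.URI;

public class Cleanup {
    public static void main(String[] args) throws Exception {
        FileSystem fs = FileSystem.get(URI.create("hdfs://9.111.254.189:9000/"), new Configuration());
        // delete(path, recursive): the recursive flag only needs to be true
        // for non-empty directories; for a single file, false is enough
        boolean deleted = fs.delete(new Path("/user/fkong/test.log"), false);
        System.out.println("deleted: " + deleted);
        fs.close();
    }
}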
3.2 Test a MapReduce job
The test code is fairly simple:
package my.hadoopstudy.mapreduce;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;
import org.apache.hadoop.util.GenericOptionsParser;

import java.io.IOException;

public class EventCount {

    public static class MyMapper extends Mapper<Object, Text, Text, IntWritable> {
        private final static IntWritable one = new IntWritable(1);
        private Text event = new Text();

        public void map(Object key, Text value, Context context) throws IOException, InterruptedException {
            // The event name is everything before the first space of the line
            int idx = value.toString().indexOf(" ");
            if (idx > 0) {
                String e = value.toString().substring(0, idx);
                event.set(e);
                context.write(event, one);
            }
        }
    }

    public static class MyReducer extends Reducer<Text, IntWritable, Text, IntWritable> {
        private IntWritable result = new IntWritable();

        public void reduce(Text key, Iterable<IntWritable> values, Context context) throws IOException, InterruptedException {
            // Sum up the counts for each event name
            int sum = 0;
            for (IntWritable val : values) {
                sum += val.get();
            }
            result.set(sum);
            context.write(key, result);
        }
    }

    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        String[] otherArgs = new GenericOptionsParser(conf, args).getRemainingArgs();
        if (otherArgs.length < 2) {
            System.err.println("Usage: EventCount <in> <out>");
            System.exit(2);
        }
        Job job = Job.getInstance(conf, "event count");
        job.setJarByClass(EventCount.class);
        job.setMapperClass(MyMapper.class);
        // Summing is associative and commutative, so the reducer can double as the combiner
        job.setCombinerClass(MyReducer.class);
        job.setReducerClass(MyReducer.class);
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(IntWritable.class);
        FileInputFormat.addInputPath(job, new Path(otherArgs[0]));
        FileOutputFormat.setOutputPath(job, new Path(otherArgs[1]));
        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}
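Before packaging for the cluster, you can smoke-test the job with Hadoop's local job runner against the local filesystem. This is a sketch under the assumption of Hadoop 2.x defaults; the EventCountLocalTest class and the /tmp paths are my own illustration, not part of the original project:

package my.hadoopstudy.mapreduce;

public class EventCountLocalTest {
    public static void main(String[] args) throws Exception {
        // EventCount uses GenericOptionsParser, which understands -D options,
        // so we can force the local filesystem and the local job runner
        // without touching a cluster. Note the output directory must not
        // already exist, or the job will fail at submission.
        EventCount.main(new String[] {
                "-D", "fs.defaultFS=file:///",
                "-D", "mapreduce.framework.name=local",
                "/tmp/input", "/tmp/output"
        });
    }
}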
Run "mvn package" to produce the jar hadoopstudy-1.0-SNAPSHOT.jar, then copy the jar to the Hadoop installation directory.
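For example (HADOOP_HOME here is an assumed placeholder for your Hadoop installation directory; Maven always writes the jar to target/):

$ mvn package
$ cp target/hadoopstudy-1.0-SNAPSHOT.jar $HADOOP_HOME/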
Suppose we need to analyze the Event entries in a few log files and count how many times each kind of Event occurs. Create the following directories and files:
/tmp/input/event.log.1
/tmp/input/event.log.2
/tmp/input/event.log.3
Since this is only an example, all three files can have identical content; suppose each one looks like the following (a quick way to create them is sketched after the sample content):
JOB_NEW ...
JOB_NEW ...
JOB_FINISH ...
JOB_NEW ...
JOB_FINISH ...
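One quick way to create the three identical files, assuming the five sample lines above have first been saved to /tmp/event.log (a path of my own choosing):

$ mkdir -p /tmp/input
$ for i in 1 2 3; do cp /tmp/event.log /tmp/input/event.log.$i; done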
Then copy these files onto HDFS:

$ bin/hdfs dfs -put /tmp/input /user/fkong/input
Run the MapReduce job:

$ bin/hadoop jar hadoopstudy-1.0-SNAPSHOT.jar my.hadoopstudy.mapreduce.EventCount /user/fkong/input /user/fkong/output
Check the result:

$ bin/hdfs dfs -cat /user/fkong/output/part-r-00000
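Since each of the three identical files contains three JOB_NEW lines and two JOB_FINISH lines, the counts should come out as 3 × 3 = 9 and 3 × 2 = 6, so the output should look roughly like:

JOB_FINISH	6
JOB_NEW	9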