
How to add Hadoop dependencies with Maven

I won't rehash Maven basics here; there are plenty of tutorials online and little has changed over the years. This article only covers how to set up a Hadoop development environment.

1. Create the project


mvn archetype:generate -DgroupId=my.hadoopstudy -DartifactId=hadoopstudy -DarchetypeArtifactId=maven-archetype-quickstart -DinteractiveMode=false
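If the command succeeds, the quickstart archetype generates a minimal project skeleton roughly like the following (the sample App/AppTest classes come from the archetype and can be kept or deleted):

hadoopstudy/pom.xml
hadoopstudy/src/main/java/my/hadoopstudy/App.java
hadoopstudy/src/test/java/my/hadoopstudy/AppTest.java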

2. Add the Hadoop dependency packages hadoop-common, hadoop-hdfs, and hadoop-client to the pom.xml file. The pom.xml after the additions looks like this

<project xmlns="http://maven.apache.org/POM/4.0.0"
         xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
         xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 http://maven.apache.org/xsd/maven-4.0.0.xsd">
    <modelVersion>4.0.0</modelVersion>

    <groupId>my.hadoopstudy</groupId>
    <artifactId>hadoopstudy</artifactId>
    <packaging>jar</packaging>
    <version>1.0-SNAPSHOT</version>
    <name>hadoopstudy</name>

    <dependencies>
        <dependency>
            <groupId>org.apache.hadoop</groupId>
            <artifactId>hadoop-common</artifactId>
            <version>2.5.1</version>
        </dependency>
        <dependency>
            <groupId>org.apache.hadoop</groupId>
            <artifactId>hadoop-hdfs</artifactId>
            <version>2.5.1</version>
        </dependency>
        <dependency>
            <groupId>org.apache.hadoop</groupId>
            <artifactId>hadoop-client</artifactId>
            <version>2.5.1</version>
        </dependency>
        <dependency>
            <groupId>junit</groupId>
            <artifactId>junit</artifactId>
            <version>3.8.1</version>
            <scope>test</scope>
        </dependency>
    </dependencies>
</project>
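Before writing any code, it can be worth confirming that the new dependencies actually resolve. One quick, optional check is:

$ mvn dependency:tree -Dincludes=org.apache.hadoop

This should list hadoop-common, hadoop-hdfs, and hadoop-client at version 2.5.1 along with their transitive dependencies.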

3. Testing

3.1 First we can test HDFS access. This assumes the Hadoop cluster set up in the previous Hadoop article. The class code is as follows


package my.hadoopstudy.dfs;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FSDataOutputStream;
import org.apache.hadoop.fs.FileStatus;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IOUtils;

import java.io.InputStream;
import java.net.URI;

public class Test {
    public static void main(String[] args) throws Exception {
        String uri = "hdfs://9.111.254.189:9000/";
        Configuration config = new Configuration();
        FileSystem fs = FileSystem.get(URI.create(uri), config);

        // List all files and directories under /user/fkong/ on HDFS
        FileStatus[] statuses = fs.listStatus(new Path("/user/fkong"));
        for (FileStatus status : statuses) {
            System.out.println(status);
        }

        // Create a file under /user/fkong on HDFS and write one line of text to it
        FSDataOutputStream os = fs.create(new Path("/user/fkong/test.log"));
        os.write("Hello World!".getBytes());
        os.flush();
        os.close();

        // Print the contents of the file just written under /user/fkong
        InputStream is = fs.open(new Path("/user/fkong/test.log"));
        IOUtils.copyBytes(is, System.out, 1024, true);
    }
}
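For a quick test straight from the project (assuming the cluster at 9.111.254.189 is reachable), one option is the exec-maven-plugin:

$ mvn compile exec:java -Dexec.mainClass=my.hadoopstudy.dfs.Test

This should print the FileStatus of each entry under /user/fkong, then echo back the "Hello World!" line from the newly created test.log.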

3.2 Testing a MapReduce job

The test code is fairly simple:


package my.hadoopstudy.mapreduce;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;
import org.apache.hadoop.util.GenericOptionsParser;

import java.io.IOException;

public class EventCount {

    public static class MyMapper extends Mapper<Object, Text, Text, IntWritable> {
        private final static IntWritable one = new IntWritable(1);
        private Text event = new Text();

        @Override
        public void map(Object key, Text value, Context context) throws IOException, InterruptedException {
            // The event name is the first space-delimited token on the line
            int idx = value.toString().indexOf(" ");
            if (idx > 0) {
                String e = value.toString().substring(0, idx);
                event.set(e);
                context.write(event, one);
            }
        }
    }

    public static class MyReducer extends Reducer<Text, IntWritable, Text, IntWritable> {
        private IntWritable result = new IntWritable();

        @Override
        public void reduce(Text key, Iterable<IntWritable> values, Context context) throws IOException, InterruptedException {
            // Sum up the occurrences of each event name
            int sum = 0;
            for (IntWritable val : values) {
                sum += val.get();
            }
            result.set(sum);
            context.write(key, result);
        }
    }

    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        String[] otherArgs = new GenericOptionsParser(conf, args).getRemainingArgs();
        if (otherArgs.length < 2) {
            System.err.println("Usage: EventCount <in> <out>");
            System.exit(2);
        }
        Job job = Job.getInstance(conf, "event count");
        job.setJarByClass(EventCount.class);
        job.setMapperClass(MyMapper.class);
        // Summing is associative, so the reducer can double as the combiner
        job.setCombinerClass(MyReducer.class);
        job.setReducerClass(MyReducer.class);
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(IntWritable.class);
        FileInputFormat.addInputPath(job, new Path(otherArgs[0]));
        FileOutputFormat.setOutputPath(job, new Path(otherArgs[1]));
        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}

Run the "mvn package" command to produce the jar hadoopstudy-1.0-SNAPSHOT.jar, then copy the jar file to the Hadoop installation directory.

Suppose we need to analyze the Event information in several log files and count how many times each kind of event occurs. Create the following directories and files


/tmp/input/event.log.1
/tmp/input/event.log.2
/tmp/input/event.log.3

Since this is only an example, the files can all have the same content. Suppose it is the following


JOB_NEW ...
JOB_NEW ...
JOB_FINISH ...
JOB_NEW ...
JOB_FINISH ...
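One quick way to create the three files, for example (only the first token on each line matters to the job; the "..." stands in for whatever else a real log line would carry):

$ mkdir -p /tmp/input
$ for i in 1 2 3; do printf 'JOB_NEW ...\nJOB_NEW ...\nJOB_FINISH ...\nJOB_NEW ...\nJOB_FINISH ...\n' > /tmp/input/event.log.$i; done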

Then copy these files onto HDFS


$ bin/hdfs dfs -put /tmp/input /user/fkong/input

Run the MapReduce job


$ bin/hadoop jar hadoopstudy-1.0-SNAPSHOT.jar my.hadoopstudy.mapreduce.EventCount /user/fkong/input /user/fkong/output

Check the results


$ bin/hdfs dfs -cat /user/fkong/output/part-r-00000
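With the three identical sample files above (nine JOB_NEW lines and six JOB_FINISH lines in total) and the default single reducer, the output should look something like:

JOB_FINISH	6
JOB_NEW	9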
