How do you use MapReduce to create files in HDFS and write data to them?
Summary: MapReduce is a programming model for processing and generating large data sets. Creating a file in the Hadoop Distributed File System (HDFS) and writing content to it is part of running a MapReduce job. Using Hadoop's FileSystem API, you can create files on HDFS and write data into them.
MapReduce is a programming model for processing and generating large data sets, and HDFS (Hadoop Distributed File System) is a distributed file system for storing big data. Below is an example of using MapReduce to write data into HDFS.
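Before wiring up a full MapReduce job, the file-creation part by itself can be done directly through Hadoop's FileSystem API, as the summary above notes. A minimal sketch follows; the NameNode URI `hdfs://localhost:9000` and the path `/tmp/hello.txt` are placeholders for illustration, and the code needs a reachable HDFS cluster to actually run:

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FSDataOutputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import java.nio.charset.StandardCharsets;

public class HdfsWriteExample {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        // Placeholder: point fs.defaultFS at your own NameNode.
        conf.set("fs.defaultFS", "hdfs://localhost:9000");
        // create() makes the file (and parent directories) and
        // returns an output stream to write its contents.
        try (FileSystem fs = FileSystem.get(conf);
             FSDataOutputStream out = fs.create(new Path("/tmp/hello.txt"))) {
            out.write("hello hdfs\n".getBytes(StandardCharsets.UTF_8));
        }
    }
}
```

This is the mechanism a MapReduce job's output format uses under the hood; the steps below let the framework handle it instead.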
1. First, create a Java project and add the Hadoop dependencies. In the project's pom.xml, add the following:
```xml
<dependencies>
    <dependency>
        <groupId>org.apache.hadoop</groupId>
        <artifactId>hadoop-client</artifactId>
        <version>2.7.3</version>
    </dependency>
</dependencies>
```
2. Write a MapReduce program whose output is written into HDFS. The following is a simple word-count example:
```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;
import java.io.IOException;

public class WriteToHDFS {

    public static class TokenizerMapper extends Mapper<Object, Text, Text, IntWritable> {
        private final static IntWritable one = new IntWritable(1);
        private Text word = new Text();

        public void map(Object key, Text value, Context context)
                throws IOException, InterruptedException {
            // Split each input line on runs of whitespace and emit (token, 1).
            String[] tokens = value.toString().split("\\s+");
            for (String token : tokens) {
                word.set(token);
                context.write(word, one);
            }
        }
    }

    public static class IntSumReducer extends Reducer<Text, IntWritable, Text, IntWritable> {
        private IntWritable result = new IntWritable();

        public void reduce(Text key, Iterable<IntWritable> values, Context context)
                throws IOException, InterruptedException {
            // Sum the counts for each token and write the total to HDFS.
            int sum = 0;
            for (IntWritable val : values) {
                sum += val.get();
            }
            result.set(sum);
            context.write(key, result);
        }
    }

    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        Job job = Job.getInstance(conf, "write to HDFS");
        job.setJarByClass(WriteToHDFS.class);
        job.setMapperClass(TokenizerMapper.class);
        job.setCombinerClass(IntSumReducer.class);
        job.setReducerClass(IntSumReducer.class);
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(IntWritable.class);
        FileInputFormat.addInputPath(job, new Path(args[0]));
        FileOutputFormat.setOutputPath(job, new Path(args[1]));
        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}
```

3. Compile and package the Java project into a jar file.
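The one subtle point in the mapper above is the split pattern: in Java source, the whitespace regex must be written `"\\s+"` (a single backslash would not compile). The tokenization rule can be checked locally with no Hadoop dependency at all:

```java
import java.util.Arrays;

public class SplitCheck {
    public static void main(String[] args) {
        // Same tokenization rule as TokenizerMapper: split on runs of whitespace.
        String line = "hello  hadoop\thello world";
        String[] tokens = line.split("\\s+");
        System.out.println(Arrays.toString(tokens)); // [hello, hadoop, hello, world]
    }
}
```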
4. Use the Hadoop command-line tools to run the MapReduce job, which writes its results into HDFS. Assuming the input data has already been uploaded to the /input directory in HDFS, the output directory is /output, and the jar file is named writetohdfs.jar, run the following command:
hadoop jar writetohdfs.jar WriteToHDFS /input /output
5. Wait for the job to finish, then use the hadoop fs -ls /output command to inspect the contents of the output directory.
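With the default single reducer, the job writes its counts to a part file named part-r-00000 inside the output directory. These commands (which assume a running cluster and the /output path from the steps above) list and print the results:

```shell
# List the files the job produced, including _SUCCESS and the part files.
hadoop fs -ls /output
# Print the word counts written by the reducer.
hadoop fs -cat /output/part-r-00000
```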
