Experiment 3: Elementary MapReduce Programming Practice

1. Objectives
1. Master basic MapReduce programming methods through hands-on practice.
2. Learn to use MapReduce to solve common data-processing problems, including data deduplication, data sorting, and data mining.

2. Platform
A Hadoop pseudo-distributed environment that has already been configured.

3. Content and Requirements

1. Implement file merging and deduplication
Given two input files, file A and file B, write a MapReduce program that merges the two files and removes the duplicate records, producing a new output file. A sample of the input and output files is provided for reference.

Final result (the merged file):
[Screenshot: the merged output viewed in HDFS at hdfs://localhost:9000/user/..., listing the deduplicated date records from 20150101 through 20150105]

The code is as follows:

package com.Merge;

import java.io.IOException;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class Merge {

    // Mapper: emit each input line as the key, so that identical lines
    // from both files are grouped together by the shuffle
    public static class Map extends Mapper<Object, Text, Text, Text> {
        private static Text text = new Text();

        public void map(Object key, Text value, Context context)
                throws IOException, InterruptedException {
            text = value;
            context.write(text, new Text(""));
        }
    }

    // Reducer: write each distinct key exactly once, discarding duplicates
    public static class Reduce extends Reducer<Text, Text, Text, Text> {
        public void reduce(Text key, Iterable<Text> values, Context context)
                throws IOException, InterruptedException {
            context.write(key, new Text(""));
        }
    }

    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        conf.set("fs.defaultFS", "hdfs://localhost:9000");
        String[] otherArgs = new String[] { "input", "output" };
        if (otherArgs.length != 2) {
            System.err.println("Usage: Merge and duplicate removal <in> <out>");
            System.exit(2);
        }
        Job job = Job.getInstance(conf, "Merge and duplicate removal");
        job.setJarByClass(Merge.class);
        job.setMapperClass(Map.class);
        job.setReducerClass(Reduce.class);
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(Text.class);
        FileInputFormat.addInputPath(job, new Path(otherArgs[0]));
        FileOutputFormat.setOutputPath(job, new Path(otherArgs[1]));
        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}
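A note on an optional optimization, not part of the original lab code: because the Reduce class's input and output types are identical and its logic is idempotent, the same class can also be registered as a combiner, so duplicate lines are already collapsed on the map side before the shuffle. A minimal sketch, added next to the other job.set* calls in main:

// optional: run the dedup logic map-side as well, reducing shuffle volume;
// safe here because Reduce is type-compatible and idempotent
job.setCombinerClass(Reduce.class);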

2. Write a program that sorts the input files
There are multiple input files, and every line of each file contains one integer. The program must read the integers from all the files, sort them in ascending order, and write them to a single new file. Each output line holds two integers: the first is the rank of the second integer in the sorted order, and the second is the original integer. A sample of the input and output files is provided for reference.

Experiment result screenshot:
[Screenshot: the sorted output in HDFS, one rank-value pair per line, e.g. "2 4", "3 5", "4 12", "5 16", "6 25", "7 33", "8 37", "9 39", "10 40", "11 45"]

The code is as follows:

package com.MergeSort;

import java.io.IOException;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class MergeSort {

    // Mapper: parse each line as an integer and emit it as the key;
    // the shuffle then delivers the keys to the reducer in ascending order
    public static class Map extends
            Mapper<Object, Text, IntWritable, IntWritable> {
        private static IntWritable data = new IntWritable();

        public void map(Object key, Text value, Context context)
                throws IOException, InterruptedException {
            String line = value.toString();
            data.set(Integer.parseInt(line));
            context.write(data, new IntWritable(1));
        }
    }

    // Reducer: keys arrive already sorted, so assign consecutive ranks,
    // writing a key once per occurrence
    public static class Reduce extends
            Reducer<IntWritable, IntWritable, IntWritable, IntWritable> {
        private static IntWritable linenum = new IntWritable(1);

        public void reduce(IntWritable key, Iterable<IntWritable> values,
                Context context) throws IOException, InterruptedException {
            for (IntWritable val : values) {
                context.write(linenum, key);
                linenum = new IntWritable(linenum.get() + 1);
            }
        }
    }

    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        conf.set("fs.defaultFS", "hdfs://localhost:9000");
        String[] otherArgs = new String[] { "input2", "output2" }; /* input and output paths set directly */
        if (otherArgs.length != 2) {
            System.err.println("Usage: mergesort <in> <out>");
            System.exit(2);
        }
        Job job = Job.getInstance(conf, "mergesort");
        job.setJarByClass(MergeSort.class);
        job.setMapperClass(Map.class);
        job.setReducerClass(Reduce.class);
        job.setOutputKeyClass(IntWritable.class);
        job.setOutputValueClass(IntWritable.class);
        FileInputFormat.addInputPath(job, new Path(otherArgs[0]));
        FileOutputFormat.setOutputPath(job, new Path(otherArgs[1]));
        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}
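One caveat worth noting: the shuffle sorts keys only within each reduce task, so the output above is globally ordered because the job runs with Hadoop's default single reducer. If several reduce tasks were configured, a range partitioner could preserve a total order across the output files. Below is a rough sketch of such a partitioner; it is not part of the original lab, and it assumes the integers are known to lie in [0, 100), an invented bound used purely for illustration. It would be registered with job.setPartitionerClass(Partition.class).

// Hypothetical addition: route key ranges to reducers in ascending order so
// that concatenating part-r-00000, part-r-00001, ... stays globally sorted.
public static class Partition extends
        org.apache.hadoop.mapreduce.Partitioner<IntWritable, IntWritable> {
    @Override
    public int getPartition(IntWritable key, IntWritable value, int numPartitions) {
        int bound = 100;                                // assumed exclusive upper bound on the keys
        int width = Math.max(1, bound / numPartitions); // size of each key range
        return Math.min(key.get() / width, numPartitions - 1);
    }
}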

3. Mine information from a given table
Given a child-parent table, mine the parent-child relationships in it and produce a table of grandchild-grandparent relationships. The final result is shown below:

[Screenshot: the job output in HDFS at hdfs://localhost:9000/..., shown alongside STjoin.java]

grand_child    grand_parent
Mark           Jesse
Mark           Alice
Philip         Jesse
Philip         Alice
Jone           Jesse
Jone           Alice
Steven         Jesse
Steven         Alice
Steven         Frank
Steven         Mary
Jone           Frank
Jone           Mary

The code is as follows:

package com.join;

import java.io.IOException;
import java.util.*;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class STjoin {

    public static int time = 0;

    // Mapper: emit every (child, parent) pair twice, so the reducer for each
    // person sees both that person's children (tag "1") and parents (tag "2")
    public static class Map extends Mapper<Object, Text, Text, Text> {
        public void map(Object key, Text value, Context context)
                throws IOException, InterruptedException {
            String child_name = new String();
            String parent_name = new String();
            String relation_type = new String();
            String line = value.toString();
            int i = 0;
            while (line.charAt(i) != ' ') {
                i++;
            }
            String[] values = { line.substring(0, i), line.substring(i + 1) };
            if (values[0].compareTo("child") != 0) { // skip the header row
                child_name = values[0];
                parent_name = values[1];
                // keyed by the parent, tag "1": the key has child_name as a child
                relation_type = "1";
                context.write(new Text(values[1]), new Text(relation_type + "+"
                        + child_name + "+" + parent_name));
                // keyed by the child, tag "2": the key has parent_name as a parent
                relation_type = "2";
                context.write(new Text(values[0]), new Text(relation_type + "+"
                        + child_name + "+" + parent_name));
            }
        }
    }

    // Reducer: cross the children of the key person with the parents of the
    // key person to obtain grandchild-grandparent pairs
    public static class Reduce extends Reducer<Text, Text, Text, Text> {
        public void reduce(Text key, Iterable<Text> values, Context context)
                throws IOException, InterruptedException {
            if (time == 0) { // write the table header exactly once
                context.write(new Text("grand_child"), new Text("grand_parent"));
                time++;
            }
            int grand_child_num = 0;
            String[] grand_child = new String[10];
            int grand_parent_num = 0;
            String[] grand_parent = new String[10];
            Iterator<Text> ite = values.iterator();
            while (ite.hasNext()) {
                String record = ite.next().toString();
                int len = record.length();
                int i = 2;
                if (len == 0)
                    continue;
                char relation_type = record.charAt(0);
                String child_name = new String();
                String parent_name = new String();
                while (record.charAt(i) != '+') { // read up to the separator
                    child_name = child_name + record.charAt(i);
                    i++;
                }
                i = i + 1;
                while (i < len) {
                    parent_name = parent_name + record.charAt(i);
                    i++;
                }
                if (relation_type == '1') {
                    grand_child[grand_child_num] = child_name;
                    grand_child_num++;
                } else {
                    grand_parent[grand_parent_num] = parent_name;
                    grand_parent_num++;
                }
            }
            if (grand_parent_num != 0 && grand_child_num != 0) {
                for (int m = 0; m < grand_child_num; m++) {
                    for (int n = 0; n < grand_parent_num; n++) {
                        context.write(new Text(grand_child[m]),
                                new Text(grand_parent[n]));
                    }
                }
            }
        }
    }

    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        conf.set("fs.defaultFS", "hdfs://localhost:9000");
        String[] otherArgs = new String[] { "input3", "output3" };
        if (otherArgs.length != 2) {
            // the source document is truncated at this point; the remainder of
            // main is reconstructed to mirror the job setup of the two programs above
            System.err.println("Usage: STjoin <in> <out>");
            System.exit(2);
        }
        Job job = Job.getInstance(conf, "STjoin");
        job.setJarByClass(STjoin.class);
        job.setMapperClass(Map.class);
        job.setReducerClass(Reduce.class);
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(Text.class);
        FileInputFormat.addInputPath(job, new Path(otherArgs[0]));
        FileOutputFormat.setOutputPath(job, new Path(otherArgs[1]));
        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}
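To see how the reduce-side self-join produces a grandchild-grandparent pair, consider two hypothetical input rows, "Jone Lucy" and "Lucy Mary". The original input table is not recoverable from the document, but these rows are consistent with the "Jone Mary" pair in the result above. An illustrative trace:

map("Jone Lucy")  ->  (Lucy, "1+Jone+Lucy")    tag 1: the key (Lucy) is a parent; her child is Jone
                      (Jone, "2+Jone+Lucy")    tag 2: the key (Jone) is a child; his parent is Lucy
map("Lucy Mary")  ->  (Mary, "1+Lucy+Mary")
                      (Lucy, "2+Lucy+Mary")

reduce("Lucy", ["1+Jone+Lucy", "2+Lucy+Mary"])
    grand_child  = [Jone]    collected from the tag-1 record
    grand_parent = [Mary]    collected from the tag-2 record
    -> emits (Jone, Mary)    Jone is Mary's grandchild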
