Experiment 3: Elementary MapReduce Programming Practice

1. Objectives
1. Master basic MapReduce programming methods through hands-on practice.
2. Learn to use MapReduce to solve common data-processing problems, including data deduplication, data sorting, and data mining.

2. Platform
A Hadoop pseudo-distributed environment that has already been configured.

3. Content and Requirements

1. Implement file merging and deduplication
Given two input files, file A and file B, write a MapReduce program that merges the two files and removes the duplicate records, producing a new output file. A sample of the input and output files is provided for reference.

Final result (the merged file):
[Screenshot: the merged output viewed in HDFS at hdfs://localhost:9000/user/..., listing the deduplicated date records from 20150101 through 20150105]

The code is as follows:

package com.Merge;

import java.io.IOException;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class Merge {

    // Mapper: emit each input line as the key, so that identical lines
    // from both files are grouped together by the shuffle
    public static class Map extends Mapper<Object, Text, Text, Text> {
        private static Text text = new Text();

        public void map(Object key, Text value, Context context)
                throws IOException, InterruptedException {
            text = value;
            context.write(text, new Text(""));
        }
    }

    // Reducer: write each distinct key exactly once, discarding duplicates
    public static class Reduce extends Reducer<Text, Text, Text, Text> {
        public void reduce(Text key, Iterable<Text> values, Context context)
                throws IOException, InterruptedException {
            context.write(key, new Text(""));
        }
    }

    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        conf.set("fs.defaultFS", "hdfs://localhost:9000");
        String[] otherArgs = new String[] { "input", "output" };
        if (otherArgs.length != 2) {
            System.err.println("Usage: Merge and duplicate removal <in> <out>");
            System.exit(2);
        }
        Job job = Job.getInstance(conf, "Merge and duplicate removal");
        job.setJarByClass(Merge.class);
        job.setMapperClass(Map.class);
        job.setReducerClass(Reduce.class);
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(Text.class);
        FileInputFormat.addInputPath(job, new Path(otherArgs[0]));
        FileOutputFormat.setOutputPath(job, new Path(otherArgs[1]));
        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}
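A note on an optional optimization, not part of the original lab code: because the Reduce class's input and output types are identical and its logic is idempotent, the same class can also be registered as a combiner, so duplicate lines are already collapsed on the map side before the shuffle. A minimal sketch, added next to the other job.set* calls in main:

// optional: run the dedup logic map-side as well, reducing shuffle volume;
// safe here because Reduce is type-compatible and idempotent
job.setCombinerClass(Reduce.class);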

2. Write a program that sorts the input files
There are multiple input files, and every line of each file contains one integer. The program must read the integers from all the files, sort them in ascending order, and write them to a single new file. Each output line holds two integers: the first is the rank of the second integer in the sorted order, and the second is the original integer. A sample of the input and output files is provided for reference.

Experiment result screenshot:
[Screenshot: the sorted output in HDFS, one rank-value pair per line, e.g. "2 4", "3 5", "4 12", "5 16", "6 25", "7 33", "8 37", "9 39", "10 40", "11 45"]

The code is as follows:

package com.MergeSort;

import java.io.IOException;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class MergeSort {

    // Mapper: parse each line as an integer and emit it as the key;
    // the shuffle then delivers the keys to the reducer in ascending order
    public static class Map extends
            Mapper<Object, Text, IntWritable, IntWritable> {
        private static IntWritable data = new IntWritable();

        public void map(Object key, Text value, Context context)
                throws IOException, InterruptedException {
            String line = value.toString();
            data.set(Integer.parseInt(line));
            context.write(data, new IntWritable(1));
        }
    }

    // Reducer: keys arrive already sorted, so assign consecutive ranks,
    // writing a key once per occurrence
    public static class Reduce extends
            Reducer<IntWritable, IntWritable, IntWritable, IntWritable> {
        private static IntWritable linenum = new IntWritable(1);

        public void reduce(IntWritable key, Iterable<IntWritable> values,
                Context context) throws IOException, InterruptedException {
            for (IntWritable val : values) {
                context.write(linenum, key);
                linenum = new IntWritable(linenum.get() + 1);
            }
        }
    }

    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        conf.set("fs.defaultFS", "hdfs://localhost:9000");
        String[] otherArgs = new String[] { "input2", "output2" }; /* input and output paths set directly */
        if (otherArgs.length != 2) {
            System.err.println("Usage: mergesort <in> <out>");
            System.exit(2);
        }
        Job job = Job.getInstance(conf, "mergesort");
        job.setJarByClass(MergeSort.class);
        job.setMapperClass(Map.class);
        job.setReducerClass(Reduce.class);
        job.setOutputKeyClass(IntWritable.class);
        job.setOutputValueClass(IntWritable.class);
        FileInputFormat.addInputPath(job, new Path(otherArgs[0]));
        FileOutputFormat.setOutputPath(job, new Path(otherArgs[1]));
        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}
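One caveat worth noting: the shuffle sorts keys only within each reduce task, so the output above is globally ordered because the job runs with Hadoop's default single reducer. If several reduce tasks were configured, a range partitioner could preserve a total order across the output files. Below is a rough sketch of such a partitioner; it is not part of the original lab, and it assumes the integers are known to lie in [0, 100), an invented bound used purely for illustration. It would be registered with job.setPartitionerClass(Partition.class).

// Hypothetical addition: route key ranges to reducers in ascending order so
// that concatenating part-r-00000, part-r-00001, ... stays globally sorted.
public static class Partition extends
        org.apache.hadoop.mapreduce.Partitioner<IntWritable, IntWritable> {
    @Override
    public int getPartition(IntWritable key, IntWritable value, int numPartitions) {
        int bound = 100;                                // assumed exclusive upper bound on the keys
        int width = Math.max(1, bound / numPartitions); // size of each key range
        return Math.min(key.get() / width, numPartitions - 1);
    }
}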

3. Mine information from a given table
Given a child-parent table, mine the parent-child relationships in it and produce a table of grandchild-grandparent relationships. The final result is shown below:

[Screenshot: the job output in HDFS at hdfs://localhost:9000/..., shown alongside STjoin.java]

grand_child    grand_parent
Mark           Jesse
Mark           Alice
Philip         Jesse
Philip         Alice
Jone           Jesse
Jone           Alice
Steven         Jesse
Steven         Alice
Steven         Frank
Steven         Mary
Jone           Frank
Jone           Mary

The code is as follows:

package com.join;

import java.io.IOException;
import java.util.*;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class STjoin {

    public static int time = 0;

    // Mapper: emit every (child, parent) pair twice, so the reducer for each
    // person sees both that person's children (tag "1") and parents (tag "2")
    public static class Map extends Mapper<Object, Text, Text, Text> {
        public void map(Object key, Text value, Context context)
                throws IOException, InterruptedException {
            String child_name = new String();
            String parent_name = new String();
            String relation_type = new String();
            String line = value.toString();
            int i = 0;
            while (line.charAt(i) != ' ') {
                i++;
            }
            String[] values = { line.substring(0, i), line.substring(i + 1) };
            if (values[0].compareTo("child") != 0) { // skip the header row
                child_name = values[0];
                parent_name = values[1];
                // keyed by the parent, tag "1": the key has child_name as a child
                relation_type = "1";
                context.write(new Text(values[1]), new Text(relation_type + "+"
                        + child_name + "+" + parent_name));
                // keyed by the child, tag "2": the key has parent_name as a parent
                relation_type = "2";
                context.write(new Text(values[0]), new Text(relation_type + "+"
                        + child_name + "+" + parent_name));
            }
        }
    }

    // Reducer: cross the children of the key person with the parents of the
    // key person to obtain grandchild-grandparent pairs
    public static class Reduce extends Reducer<Text, Text, Text, Text> {
        public void reduce(Text key, Iterable<Text> values, Context context)
                throws IOException, InterruptedException {
            if (time == 0) { // write the table header exactly once
                context.write(new Text("grand_child"), new Text("grand_parent"));
                time++;
            }
            int grand_child_num = 0;
            String[] grand_child = new String[10];
            int grand_parent_num = 0;
            String[] grand_parent = new String[10];
            Iterator<Text> ite = values.iterator();
            while (ite.hasNext()) {
                String record = ite.next().toString();
                int len = record.length();
                int i = 2;
                if (len == 0)
                    continue;
                char relation_type = record.charAt(0);
                String child_name = new String();
                String parent_name = new String();
                while (record.charAt(i) != '+') { // read up to the separator
                    child_name = child_name + record.charAt(i);
                    i++;
                }
                i = i + 1;
                while (i < len) {
                    parent_name = parent_name + record.charAt(i);
                    i++;
                }
                if (relation_type == '1') {
                    grand_child[grand_child_num] = child_name;
                    grand_child_num++;
                } else {
                    grand_parent[grand_parent_num] = parent_name;
                    grand_parent_num++;
                }
            }
            if (grand_parent_num != 0 && grand_child_num != 0) {
                for (int m = 0; m < grand_child_num; m++) {
                    for (int n = 0; n < grand_parent_num; n++) {
                        context.write(new Text(grand_child[m]),
                                new Text(grand_parent[n]));
                    }
                }
            }
        }
    }

    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        conf.set("fs.defaultFS", "hdfs://localhost:9000");
        String[] otherArgs = new String[] { "input3", "output3" };
        if (otherArgs.length != 2) {
            // the source document is truncated at this point; the remainder of
            // main is reconstructed to mirror the job setup of the two programs above
            System.err.println("Usage: STjoin <in> <out>");
            System.exit(2);
        }
        Job job = Job.getInstance(conf, "STjoin");
        job.setJarByClass(STjoin.class);
        job.setMapperClass(Map.class);
        job.setReducerClass(Reduce.class);
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(Text.class);
        FileInputFormat.addInputPath(job, new Path(otherArgs[0]));
        FileOutputFormat.setOutputPath(job, new Path(otherArgs[1]));
        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}
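To see how the reduce-side self-join produces a grandchild-grandparent pair, consider two hypothetical input rows, "Jone Lucy" and "Lucy Mary". The original input table is not recoverable from the document, but these rows are consistent with the "Jone Mary" pair in the result above. An illustrative trace:

map("Jone Lucy")  ->  (Lucy, "1+Jone+Lucy")    tag 1: the key (Lucy) is a parent; her child is Jone
                      (Jone, "2+Jone+Lucy")    tag 2: the key (Jone) is a child; his parent is Lucy
map("Lucy Mary")  ->  (Mary, "1+Lucy+Mary")
                      (Lucy, "2+Lucy+Mary")

reduce("Lucy", ["1+Jone+Lucy", "2+Lucy+Mary"])
    grand_child  = [Jone]    collected from the tag-1 record
    grand_parent = [Mary]    collected from the tag-2 record
    -> emits (Jone, Mary)    Jone is Mary's grandchild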
