Using MapReduce in Hadoop to store text content in separate partitions

Problem description

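I am using a partitioner to store strings of different lengths from one text file in separate partitions. The input text is as follows:

    Kaka 1 28
    hua 0 26
    chao 1
    tao 1 22
    mao 0 29 22

I want the strings of different lengths to be stored in three separate files. The code is as follows:

    import java.io.IOException;
    import java.util.Iterator;
    import java.util.StringTokenizer;

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.Path;
    import org.apache.hadoop.io.LongWritable;
    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.mapreduce.Job;
    import org.apache.hadoop.mapreduce.Mapper;
    import org.apache.hadoop.mapreduce.Partitioner;
    import org.apache.hadoop.mapreduce.Reducer;
    import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
    import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;
    import org.apache.hadoop.util.GenericOptionsParser;

    public class TestPartionar {

        // Mapper: tags each input line according to its number of tokens
        public static class Map extends Mapper<LongWritable, Text, Text, Text> {
            protected void map(LongWritable key, Text value, Context context)
                    throws IOException, InterruptedException {
                // Count the tokens on the line and emit the line under a matching tag
                int token_length = 0;
                StringTokenizer readline = new StringTokenizer(value.toString());
                token_length = readline.countTokens();
                if (token_length == 2)
                    context.write(new Text("Short"), value);
                else if (token_length == 3)
                    context.write(new Text("Right"), value);
                else if (token_length == 4)
                    context.write(new Text("Long"), value);
            }
        }

        // Partitioner: picks a partition from the hash code of the tag
        public static class getPartionar extends Partitioner<Text, Text> {
            public int getPartition(Text key, Text value, int partionNum) {
                return ((key.hashCode() & Integer.MAX_VALUE) % partionNum);
            }
        }

        // Reducer: writes every line it receives with an empty key
        public static class Reduce extends Reducer<Text, Text, Text, Text> {
            protected void reduce(Text key, Iterable<Text> value, Context context)
                    throws IOException, InterruptedException {
                Iterator<Text> itr = value.iterator();
                while (itr.hasNext()) {
                    context.write(new Text(""), itr.next());
                }
            }
        }

        public static void main(String[] args)
                throws IOException, ClassNotFoundException, InterruptedException {
            Configuration conf = new Configuration();
            conf.set("mapred.job.tracker", "192.168.108.101:9001");
            conf.set("fs.default.name", "hdfs://192.168.108.101:9000");
            conf.set("mapred.jar", "TestPartionar.jar");
            Job job = new Job(conf, "TestPartionar");

            String[] ioArgs = new String[] {"test_in", "test_out"};
            String[] otherArgs = new GenericOptionsParser(conf, ioArgs).getRemainingArgs();
            if (otherArgs.length != 2) {
                System.err.println("Usage: TestPartionar <in> <out>");
            }

            job.setJarByClass(TestPartionar.class);
            job.setMapperClass(Map.class);
            job.setReducerClass(Reduce.class);
            job.setPartitionerClass(getPartionar.class);
            job.setNumReduceTasks(3);
            job.setOutputKeyClass(Text.class);
            job.setOutputValueClass(Text.class);

            FileInputFormat.addInputPath(job, new Path(otherArgs[0]));
            FileOutputFormat.setOutputPath(job, new Path(otherArgs[1]));
            System.exit(job.waitForCompletion(true) ? 0 : 1);
        }
    }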

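However, in the output, the strings of length 2 and the strings of length 3 end up mixed together in the same file. Which part of my code is wrong?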

Solution

Time: 2024-09-22 08:38:01
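A likely cause is the partitioner itself: `(key.hashCode() & Integer.MAX_VALUE) % partionNum` only spreads keys by hash, and nothing guarantees that three different keys land in three different partitions. The sketch below is plain Java with no Hadoop dependency. It assumes that `Text.hashCode()` is the byte-wise hash (start at 1, multiply by 31, add each byte) used by Hadoop's `WritableComparator.hashBytes`; the class `PartitionSketch` and its methods `textLikeHash`, `hashPartition` and `explicitPartition` are hypothetical names introduced only for this illustration.

```java
import java.nio.charset.StandardCharsets;

public class PartitionSketch {

    // Assumption: mimics the byte-wise hash that Hadoop's Text is believed to use.
    static int textLikeHash(String s) {
        int hash = 1;
        for (byte b : s.getBytes(StandardCharsets.UTF_8)) {
            hash = 31 * hash + b;
        }
        return hash;
    }

    // Same formula as getPartition in the question, applied to the emulated hash.
    static int hashPartition(String key, int numPartitions) {
        return (textLikeHash(key) & Integer.MAX_VALUE) % numPartitions;
    }

    // Alternative: map each tag to a fixed partition instead of relying on the hash.
    static int explicitPartition(String key, int numPartitions) {
        int index;
        switch (key) {
            case "Short": index = 0; break;
            case "Right": index = 1; break;
            default:      index = 2; break; // "Long" and anything unexpected
        }
        return index % numPartitions;
    }

    public static void main(String[] args) {
        for (String key : new String[] {"Short", "Right", "Long"}) {
            System.out.println(key
                    + ": hash partition = " + hashPartition(key, 3)
                    + ", explicit partition = " + explicitPartition(key, 3));
        }
    }
}
```

Under that assumption, "Short" and "Right" both map to partition 1, "Long" maps to partition 2, and partition 0 stays empty, which matches the symptom described in the question. Using the body of `explicitPartition` inside `getPartition` (applied to `key.toString()`) would send each tag to its own reducer, provided the job still runs with three reduce tasks.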
