Problem description
Solution
Solution 2:
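Quoting the original poster Yt_Sports's reply: reduceByKey only works on data in key-value form, and what you have here is a three-element tuple. You can process each line first (reshape it into a key and a value) and then call reduceByKey.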
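To make the "process each line first" step concrete, here is a minimal sketch in plain Java collections rather than Spark, so it runs without a cluster. The class name `PairReduceSketch` and the helper `sumByKey` are hypothetical names chosen for illustration, and the space-separated "key a b" line format is an assumption based on the sample data posted later in this thread. `Map.merge` stands in for the reduce function that reduceByKey would apply.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

class PairReduceSketch {

    // Hypothetical helper: reshape each "key a b" line into (key, [a, b]),
    // then merge values that share a key by adding them element-wise.
    static Map<String, int[]> sumByKey(List<String> lines) {
        Map<String, int[]> result = new LinkedHashMap<>();
        for (String line : lines) {
            String[] parts = line.trim().split("\\s+");
            String key = parts[0];
            int[] value = { Integer.parseInt(parts[1]), Integer.parseInt(parts[2]) };
            // Map.merge plays the role of the reduce function here.
            result.merge(key, value, (x, y) -> new int[] { x[0] + y[0], x[1] + y[1] });
        }
        return result;
    }

    public static void main(String[] args) {
        List<String> lines = List.of("num 10 20", "num 11 22", "name 22 33", "cmj 33 221");
        sumByKey(lines).forEach((k, v) -> System.out.println(k + " " + v[0] + " " + v[1]));
    }
}
```

The point is only the shape of the data: once a record is a (key, value) pair, where the value may itself hold two numbers, merging by key becomes straightforward.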
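Solution 3:
    import org.apache.spark.api.java.JavaPairRDD;
    import org.apache.spark.api.java.JavaRDD;
    import org.apache.spark.api.java.function.Function;
    import org.apache.spark.api.java.function.Function2;
    import org.apache.spark.api.java.function.PairFunction;
    import scala.Tuple2;

    // Turn each "key a b" line into (key, "a-b") so the data is in key-value form.
    JavaPairRDD<String, String> rdd1 = lines.mapToPair(new PairFunction<String, String, String>() {
        @Override
        public Tuple2<String, String> call(String arg0) throws Exception {
            String temp = arg0.split(" ")[0];
            String temp2 = arg0.split(" ")[1];
            String temp3 = arg0.split(" ")[2];
            return new Tuple2<String, String>(temp, temp2 + "-" + temp3);
        }
    });

    // Sum both numbers for each key, keeping the "a-b" string encoding.
    JavaPairRDD<String, String> rdd2 = rdd1.reduceByKey(new Function2<String, String, String>() {
        @Override
        public String call(String arg0, String arg1) throws Exception {
            int a = Integer.parseInt(arg0.split("-")[0]);
            int a2 = Integer.parseInt(arg1.split("-")[0]);
            String aa = String.valueOf(a + a2);
            int b = Integer.parseInt(arg0.split("-")[1]);
            int b2 = Integer.parseInt(arg1.split("-")[1]);
            String bb = String.valueOf(b + b2);
            return aa + "-" + bb;
        }
    });

    // Convert (key, "a-b") back into a "key a b" line.
    JavaRDD<String> rdd3 = rdd2.map(new Function<Tuple2<String, String>, String>() {
        @Override
        public String call(Tuple2<String, String> arg0) throws Exception {
            String lines = arg0._1() + " " + arg0._2.split("-")[0] + " " + arg0._2.split("-")[1];
            return lines;
        }
    });

    System.out.println(rdd3.collect());

Input:
    num 10 20
    num 11 22
    name 22 33
    cmj 33 221

Output:
    [cmj 33 221, num 21 42, name 22 33]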
Solution 4:
There is no need for string concatenation; just use map to convert the value type first.

    scala> val cm = c.map(e => (e._1, (e._2, 0)))
    cm: org.apache.spark.rdd.RDD[(String, (Int, Int))] = MapPartitionsRDD[25] at map at <console>:23

    scala> val cr = cm.reduceByKey((e1, e2) => (e1._1 + e2._1, e1._1/2 + e2._1/2))
    cr: org.apache.spark.rdd.RDD[(String, (Int, Int))] = ShuffledRDD[26] at reduceByKey at <console>:25

    scala> val cz = cr.map(e => (e._1, e._2._1, e._2._2))
    cz: org.apache.spark.rdd.RDD[(String, Int, Int)] = MapPartitionsRDD[27] at map at <console>:27

    scala> cz.collect
    res15: Array[(String, Int, Int)] = Array((b,3,1), (a,6,2), (c,1,0))

The input RDD c is defined as:

    scala> val c = sc.parallelize(List(("a", 1), ("a", 2), ("a", 3), ("b", 1), ("b", 2), ("c", 1)))
    c: org.apache.spark.rdd.RDD[(String, Int)] = ParallelCollectionRDD[28] at parallelize at <console>:21
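One thing that may be worth checking before reusing the reduce function from the session above: reduceByKey expects an associative and commutative function, and the second component there (e1._1/2 + e2._1/2, with integer division) appears to depend on the order in which values are merged. The sketch below is plain Java, not Spark; `ReduceOrderSketch`, `combine`, and `foldInOrder` are hypothetical names, and folding left to right is only an assumption used to imitate two possible merge orders.

```java
import java.util.List;

class ReduceOrderSketch {

    // Same shape as the reduce function in the session above:
    // (e1, e2) => (e1._1 + e2._1, e1._1 / 2 + e2._1 / 2), using int division.
    static int[] combine(int[] e1, int[] e2) {
        return new int[] { e1[0] + e2[0], e1[0] / 2 + e2[0] / 2 };
    }

    // Hypothetical helper: map each value v to (v, 0) and fold left to right.
    static int[] foldInOrder(List<Integer> values) {
        int[] acc = { values.get(0), 0 };
        for (int i = 1; i < values.size(); i++) {
            acc = combine(acc, new int[] { values.get(i), 0 });
        }
        return acc;
    }

    public static void main(String[] args) {
        int[] a = foldInOrder(List.of(1, 2, 3));
        int[] b = foldInOrder(List.of(1, 3, 2));
        System.out.println(a[0] + "," + a[1]); // prints "6,2"
        System.out.println(b[0] + "," + b[1]); // prints "6,3"
    }
}
```

The first component (the sum) is the same in both orders, while the second differs, so the (a,6,2) shown in the session is one possible result for that component rather than a guaranteed one. The pattern of mapping to a typed tuple value first is unaffected by this.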