Problem description
My input is a CSV file; each line looks like this:

```
HX332780,14/7/5,OTHEROFFENSE,PROBATIONVIOLATION,PARKINGLOT/GARAGE(NON.RESID.),Y,N,1113
HX332854,14/7/5,OTHEROFFENSE,HARASSMENTBYTELEPHONE,APARTMENT,N,N,1533
HX332743,14/7/5,CRIMINALDAMAGE,TOVEHICLE,STREET,N,N,1021
HX332735,14/7/5,THEFT,$500ANDUNDER,RESTAURANT,N,N,1014
...
```

Here is my simple processing code:

```scala
import org.apache.spark.{SparkConf, SparkContext}

object SparkPi {
  def main(args: Array[String]) {
    val conf = new SparkConf()
      .setAppName("SparkPi")
      .setMaster("spark://Master:7077")
      .setJars(List("/home/hadoop/Downloads/JetBrains.IntelliJ.xdowns/idea-IU-139.1117.1/spark-examples-1.5.2-hadoop2.6.0.jar"))
    val sc = new SparkContext(conf)
    val rawData = sc.textFile("/home/hadoop/123.csv")
    // Split each line on commas and keep the fourth-from-last field.
    val secondData = rawData.map(_.split(",").takeRight(4).head)
    // Count how many times each distinct value occurs.
    val thirdData = secondData.map(n => (n, 1)).reduceByKey(_ + _).collect()
    sc.stop()
  }
}
```

When I run it on the cluster, I get the following error:

```
15/12/09 22:11:09 WARN TaskSetManager: Lost task 1.0 in stage 0.0 (TID 1, 219.216.65.129): java.lang.ClassCastException: cannot assign instance of org.apache.spark.examples.SparkPi$$anonfun$2 to field org.apache.spark.rdd.RDD$$anonfun$flatMap$1$$anonfun$apply$4.cleanF$2 of type scala.Function1 in instance of org.apache.spark.rdd.RDD$$anonfun$flatMap$1$$anonfun$apply$4
...
```

Could anyone tell me where this goes wrong? If I remove the `collect`, no error is reported. All I want to do is count how often each distinct value appears in the fourth-from-last column of every line.
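Since the intended logic (extract the fourth-from-last column, then count occurrences) can be checked without a cluster, here is a minimal plain-Scala sketch of the same extraction and counting on a local collection. The `ColumnCount` object, its helper names, and the inlined sample rows are illustrative additions, not part of the original job:

```scala
object ColumnCount {
  // Sample rows copied from the CSV layout shown in the question.
  val rows = Seq(
    "HX332780,14/7/5,OTHEROFFENSE,PROBATIONVIOLATION,PARKINGLOT/GARAGE(NON.RESID.),Y,N,1113",
    "HX332854,14/7/5,OTHEROFFENSE,HARASSMENTBYTELEPHONE,APARTMENT,N,N,1533",
    "HX332743,14/7/5,CRIMINALDAMAGE,TOVEHICLE,STREET,N,N,1021",
    "HX332735,14/7/5,THEFT,$500ANDUNDER,RESTAURANT,N,N,1014"
  )

  // Same extraction as the Spark job: split on commas and keep the
  // fourth-from-last field (the location column in this layout).
  def fourthFromLast(line: String): String = line.split(",").takeRight(4).head

  // Local equivalent of map(n => (n, 1)).reduceByKey(_ + _).
  def counts(lines: Seq[String]): Map[String, Int] =
    lines.map(fourthFromLast).groupBy(identity).map { case (k, v) => (k, v.size) }
}
```

Running `ColumnCount.counts` on these four rows yields each location with a count of 1, which confirms the `takeRight(4).head` indexing picks the intended column.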