Problem description
from pyspark import SparkContext, SparkConf
from pyspark.sql import SQLContext

sc = SparkContext("local", "SimpleApp")
sqlContext = SQLContext(sc)
url = "jdbc:mysql://localhost:3306/stock_data?user=root&password=test"
df = sqlContext.read.format("jdbc").option("url", url).option("dbtable", "stock_detail_collect").load()
df.printSchema()
counts = df.groupBy("stock_id").count()
counts.show()

===========

The table has only 1,153,194 records, so why does running the code above report a memory leak?

16/02/05 23:30:28 WARN TaskMemoryManager: leak 8.3 MB memory from org.apache.spark.unsafe.map.BytesToBytesMap@431395b1
16/02/05 23:30:28 ERROR Executor: Managed memory leak detected; size = 8650752 bytes, TID = 1

Environment:
spark-1.6.0-bin-hadoop2.6
Ubuntu 14.04.3 LTS
jdk1.8.0_66

Where is the problem, and how can it be fixed? Many thanks.
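One thing worth noting: a JDBC read without partitioning options pulls the entire table through a single partition, so the whole aggregation lands on one task. A minimal sketch of splitting the read across partitions is below. It assumes stock_id is a numeric column, and the lowerBound/upperBound values are placeholders; substitute the real min/max of stock_id in the table.

from pyspark import SparkContext
from pyspark.sql import SQLContext

sc = SparkContext("local[4]", "SimpleApp")  # several local cores so partitions run in parallel
sqlContext = SQLContext(sc)
url = "jdbc:mysql://localhost:3306/stock_data?user=root&password=test"

# Split the JDBC read into several partitions instead of one big task.
# partitionColumn must be numeric; the bounds below are assumed
# placeholders -- use the actual min/max of stock_id.
df = (sqlContext.read.format("jdbc")
      .option("url", url)
      .option("dbtable", "stock_detail_collect")
      .option("partitionColumn", "stock_id")
      .option("lowerBound", "1")
      .option("upperBound", "1000000")
      .option("numPartitions", "8")
      .load())

counts = df.groupBy("stock_id").count()
counts.show()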
Solution
Solution 2:
Change

counts = df.groupBy("stock_id").count()
counts.show()

to writing the result to a file instead:

df.registerTempTable("people")
count = sqlContext.sql("select stock_id, count(*) as c from people group by stock_id order by stock_id")
file_output = open("counts.txt", "w")  # the original answer assumes an already-open file handle; opened here (with a placeholder filename) so the snippet runs
for name in count.collect():
    file_output.write(str(name))
file_output.flush()
file_output.close()
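Note that collect() still pulls every group row into the driver, so for larger results it is safer to let Spark write the output itself. A sketch of that alternative, assuming the df from the question and a hypothetical output path /tmp/stock_counts:

counts = df.groupBy("stock_id").count().orderBy("stock_id")

# Write directly from the executors instead of collect()ing to the driver;
# coalesce(1) merges the output into a single part-file, which is fine for
# a result this small. Spark creates the output directory itself.
counts.coalesce(1).write.format("json").save("/tmp/stock_counts")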