A technical approach to improving hivereader read speed in wormhole

Background:

Recently, DW users reported that wormhole transfers have been very slow; some jobs take 3-4 hours to complete, which delays the daily push of online reports. Looking into it, the slow jobs are almost all transfers from Hive to other data destinations, i.e. jobs using hivereader, and the logs also show a very low real-time transfer rate on the hivereader side, so the problem is most likely in hivereader.

First, a quick introduction to wormhole: it is a high-speed data transfer tool we developed (https://github.com/lalaguozhe/wormhole) that supports many heterogeneous data sources. Its architecture design diagram is as follows:

Problem description:

Each wormhole run is a single-machine job. The user fills in a wormhole job XML description file that defines the data source, the data destination, and a series of other configuration parameters, then submits the job. When wormhole receives the job XML it creates a job, runs pre-processing (Periphery) and job splitting (Splitter) separately for the reader and writer sides, then starts a reader thread pool and a writer thread pool that read and write concurrently, with a storage component in between acting as a buffer queue. A hypothetical job description is sketched below.
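For illustration only, here is roughly what such a job XML might look like; the element and parameter names below are hypothetical, not wormhole's actual schema:

<!-- Hypothetical wormhole job description; element and parameter names
     are illustrative, not the real wormhole job xml schema. -->
<job id="hive_to_hdfs_example">
  <reader plugin="hivereader" concurrency="10">
    <param key="hql">select * from bi.dpdm_device_permanent_city</param>
  </reader>
  <writer plugin="hdfswriter" concurrency="2">
    <param key="dir">file:/data/home/yukang.chen/wormhole_hive</param>
    <param key="file_prefix">prefix</param>
  </writer>
</job>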

Back to the problem at hand. In hivereader, I submit the user's HQL through JDBC to the Hive Server, which executes it and returns the result set. This approach has a few drawbacks (a minimal sketch of this read path follows the list):

1. The HQL cannot be split, so only a single reader thread can be started, and the benefit of parallel reads is lost.

2. We deploy only two Hive Server instances, and other products and queries also need to access them, so a large-scale data pull is limited by the network throughput of the Hive Server and service nodes.


3. After the HQL is submitted, the MapReduce job first writes the result data to a temporary directory, and a fetch task then pulls it to the Hive Server, which in turn streams it out to the wormhole client. The data flows datanode -> hive server -> wormhole client, so the bottleneck remains the Hive Server.
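A minimal sketch of this original Hive Server read path, assuming the HiveServer1-era JDBC driver; the driver class, URL, and host below are illustrative and may differ by Hive version:

import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.Statement;

public class HiveServerReadSketch {
    public static void main(String[] args) throws Exception {
        // HiveServer1 driver of that era; HiveServer2 would use
        // org.apache.hive.jdbc.HiveDriver and a jdbc:hive2:// URL.
        Class.forName("org.apache.hadoop.hive.jdbc.HiveDriver");
        Connection conn = DriverManager.getConnection(
                "jdbc:hive://hive-server:10000/default", "", "");
        Statement stmt = conn.createStatement();
        // The entire result set streams back through this one connection,
        // so only a single reader thread can consume it.
        ResultSet rs = stmt.executeQuery(
                "select * from bi.dpdm_device_permanent_city");
        while (rs.next()) {
            String deviceId = rs.getString(1); // hand each row to the storage buffer
        }
        rs.close();
        stmt.close();
        conn.close();
    }
}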

Solution:

Provide an alternative execution mode for hivereader. Since reading data through the Hive Server is the bottleneck, we can bypass it and read from the DataNodes directly in parallel, using the Hive Server only to submit the HQL. For example, if the user's query is "select * from bi.dpdm_device_permanent_city", it can be automatically rewritten as "INSERT OVERWRITE DIRECTORY 'hdfs://10.2.6.102/user/hadoop/wormhole_test_temp/83a6cc1ab94fe4d8d848bef133ff9d67' select * from bi.dpdm_device_permanent_city", which inserts the data into a temporary directory we designate. Two points to note (a sketch of the rewrite follows the list):

1. Enable set hive.exec.compress.output=true to compress the result files, further cutting network IO when the data is transferred to the wormhole client.

2. Let the user set the reduce count with set mapred.reduce.tasks=N. Each reducer produces one output file, and hivereader splits work by file count, so the user can estimate the output volume and choose the reduce count accordingly.
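A minimal sketch of the rewrite and the two session settings, submitted over the same JDBC connection as above; the helper names, temp-directory scheme, and hard-coded reduce count are illustrative:

import java.sql.Connection;
import java.sql.Statement;
import java.util.UUID;

public class HqlRewriteSketch {
    // Rewrite the user's query so the results land in an HDFS temp
    // directory instead of streaming back through the Hive Server.
    static String rewrite(String userHql, String tempDir) {
        return "INSERT OVERWRITE DIRECTORY '" + tempDir + "' " + userHql;
    }

    static void submit(Connection conn, String userHql) throws Exception {
        String tempDir = "hdfs://10.2.6.102/user/hadoop/wormhole_test_temp/"
                + UUID.randomUUID().toString().replace("-", "");
        Statement stmt = conn.createStatement();
        // 1. compress the result files to cut network IO to the client
        stmt.execute("set hive.exec.compress.output=true");
        // 2. one output file per reducer; more reducers -> more splits ->
        //    more parallel readers (N would come from the user's job config)
        stmt.execute("set mapred.reduce.tasks=10");
        stmt.execute(rewrite(userHql, tempDir));
        stmt.close();
    }
}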

In the Periphery phase the HQL is submitted to the Hive Server; once it finishes, the data has already landed on different DataNodes. The Splitter then generates one split per result file, and a Reader Thread Pool of the configured concurrency is started, with the threads fetching from different DataNodes in parallel (each thread holds its own DFSClient, which first talks to the NameNode over ClientProtocol and then reads block data from the DataNodes directly). Finally the temporary directory is deleted. A sketch of the per-file read is shown below.
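A minimal sketch of the split and per-file read using the standard Hadoop FileSystem API (FileSystem.open goes through DFSClient, which asks the NameNode for block locations and then streams block data from the DataNodes directly). The class name and the sequential loop are illustrative; wormhole hands each file to a thread in the pool instead:

import java.io.BufferedReader;
import java.io.InputStream;
import java.io.InputStreamReader;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileStatus;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.compress.CompressionCodec;
import org.apache.hadoop.io.compress.CompressionCodecFactory;

public class DirectHdfsReadSketch {
    public static void readTempDir(String tempDir) throws Exception {
        Configuration conf = new Configuration();
        Path dir = new Path(tempDir);
        FileSystem fs = FileSystem.get(dir.toUri(), conf);
        // one split per result file; each split goes to a reader thread
        for (FileStatus file : fs.listStatus(dir)) {
            readFile(fs, conf, file.getPath());
        }
        fs.delete(dir, true); // drop the temp directory at the doPost stage
    }

    static void readFile(FileSystem fs, Configuration conf, Path path) throws Exception {
        // detect the codec from the file suffix and fall back to plain text,
        // matching the "codec not found, using text file reader" log lines
        CompressionCodec codec = new CompressionCodecFactory(conf).getCodec(path);
        InputStream in = (codec == null)
                ? fs.open(path)
                : codec.createInputStream(fs.open(path));
        BufferedReader reader = new BufferedReader(new InputStreamReader(in));
        String line;
        while ((line = reader.readLine()) != null) {
            // split on Hive's default field delimiter (\001) and push
            // the record into the storage buffer
        }
        reader.close();
    }
}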

Performance comparison:

Test table: dpdm_device_permanent_city

108,593,390 records in total, HDFS_BYTES_READ: 10,149,072,324

Reading through the Hive Server:

2013-07-12 12:00:30,806 [main] com.dp.nebula.wormhole.engine.core.Engine.run(Engine.java:145) INFO  core.Engine -
writer-id-0-hdfswriter stat:  Read 107373504 | Write 107372736 | speed 2.89MB/s 34163L/s|  

2013-07-12 12:00:40,809 [main] com.dp.nebula.wormhole.engine.core.Engine.run(Engine.java:145) INFO  core.Engine -
writer-id-0-hdfswriter stat:  Read 107695040 | Write 107694912 | speed 2.84MB/s 32192L/s|  

2013-07-12 12:00:50,812 [main] com.dp.nebula.wormhole.engine.core.Engine.run(Engine.java:145) INFO  core.Engine -
writer-id-0-hdfswriter stat:  Read 108027968 | Write 108027392 | speed 2.83MB/s 33254L/s|  

2013-07-12 12:01:00,815 [main] com.dp.nebula.wormhole.engine.core.Engine.run(Engine.java:145) INFO  core.Engine -
writer-id-0-hdfswriter stat:  Read 108386624 | Write 108386560 | speed 2.93MB/s 35904L/s|  

2013-07-12 12:01:09,234 [main] com.dp.nebula.wormhole.plugins.writer.hdfswriter.HdfsWriterPeriphery.renameFiles(HdfsWriterPeriphery.java:124) INFO  hdfswriter.HdfsWriterPeriphery - successfully rename file from file:/data/home/yukang.chen/wormhole_hive/_prefix-0 to file:/data/home/yukang.chen/wormhole_hive/prefix-0
2013-07-12 12:01:09,235 [main] com.dp.nebula.wormhole.plugins.writer.hdfswriter.HdfsWriterPeriphery.renameFiles(HdfsWriterPeriphery.java:124) INFO  hdfswriter.HdfsWriterPeriphery - successfully rename file from file:/data/home/yukang.chen/wormhole_hive/_prefix-1 to file:/data/home/yukang.chen/wormhole_hive/prefix-1
2013-07-12 12:01:09,245 [main] com.dp.nebula.wormhole.engine.core.Engine.run(Engine.java:139) INFO  core.Engine - Nebula wormhole Job is Completed successfully!
2013-07-12 12:01:09,592 [main] com.dp.nebula.wormhole.engine.core.Engine.run(Engine.java:206) INFO  core.Engine -
writer-id-0-hdfswriter:
Wormhole starts work at   : 2013-07-12 11:01:19
Wormhole ends work at     : 2013-07-12 12:01:09
Total time costs          :            3590.01s
Average byte speed        :            2.58MB/s
Average line speed        :            30248L/s
Total transferred records :           108593326

Reading directly from the DataNodes:

2013-07-12 10:21:47,431 [main] com.dp.nebula.wormhole.engine.core.Engine.run(Engine.java:66) INFO  core.Engine - Nebula wormhole Start
2013-07-12 10:21:47,458 [main] com.dp.nebula.wormhole.engine.core.Engine.run(Engine.java:100) INFO  core.Engine - Start Reader Threads
2013-07-12 10:21:47,550 [main] com.dp.nebula.wormhole.plugins.common.DFSUtils.getConf(DFSUtils.java:112) INFO  common.DFSUtils - fs.default.name=hdfs://10.2.6.102:-1
2013-07-12 10:21:49,246 [main] com.dp.nebula.wormhole.plugins.reader.hivereader.HiveReaderPeriphery.createTempDir(HiveReaderPeriphery.java:86) INFO  hivereader.HiveReaderPeriphery - create data temp directory successfully hdfs://10.2.6.102/user/hadoop/wormhole_test_temp/83a6cc1ab94fe4d8d848bef133ff9d67
2013-07-12 10:21:50,685 [main] com.dp.nebula.wormhole.plugins.reader.hivereader.HiveJdbcClient.processInsertQuery(HiveJdbcClient.java:65) INFO  hivereader.HiveJdbcClient - hive execute insert sql:INSERT OVERWRITE DIRECTORY 'hdfs://10.2.6.102/user/hadoop/wormhole_test_temp/83a6cc1ab94fe4d8d848bef133ff9d67' select * from bi.dpdm_device_permanent_city
2013-07-12 10:24:10,943 [main] com.dp.nebula.wormhole.plugins.reader.hivereader.HiveJdbcClient.printMetaDataInfoAndGetColumnCount(HiveJdbcClient.java:104) INFO  hivereader.HiveJdbcClient - selected column names:
string deviceid, int trainid, int cityid, string first_day, string last_day, double confidence_lower_bound, double confidence_upper_bound, bigint month_state
2013-07-12 10:24:11,127 [main] com.dp.nebula.wormhole.plugins.reader.hivereader.HiveReaderSplitter.split(HiveReaderSplitter.java:69) INFO  hivereader.HiveReaderSplitter - splitted files num:44
2013-07-12 10:24:11,151 [pool-1-thread-2] com.dp.nebula.wormhole.plugins.reader.hivereader.HiveReader.read(HiveReader.java:67) INFO  hivereader.HiveReader - start to read hdfs://10.2.6.102/user/hadoop/wormhole_test_temp/83a6cc1ab94fe4d8d848bef133ff9d67/000000_0
2013-07-12 10:24:11,154 [pool-1-thread-3] com.dp.nebula.wormhole.plugins.reader.hivereader.HiveReader.read(HiveReader.java:67) INFO  hivereader.HiveReader - start to read hdfs://10.2.6.102/user/hadoop/wormhole_test_temp/83a6cc1ab94fe4d8d848bef133ff9d67/000001_0
2013-07-12 10:24:11,157 [pool-1-thread-4] com.dp.nebula.wormhole.plugins.reader.hivereader.HiveReader.read(HiveReader.java:67) INFO  hivereader.HiveReader - start to read hdfs://10.2.6.102/user/hadoop/wormhole_test_temp/83a6cc1ab94fe4d8d848bef133ff9d67/000002_0
2013-07-12 10:24:11,161 [pool-1-thread-5] com.dp.nebula.wormhole.plugins.reader.hivereader.HiveReader.read(HiveReader.java:67) INFO  hivereader.HiveReader - start to read hdfs://10.2.6.102/user/hadoop/wormhole_test_temp/83a6cc1ab94fe4d8d848bef133ff9d67/000003_0
2013-07-12 10:24:11,164 [pool-1-thread-6] com.dp.nebula.wormhole.plugins.reader.hivereader.HiveReader.read(HiveReader.java:67) INFO  hivereader.HiveReader - start to read hdfs://10.2.6.102/user/hadoop/wormhole_test_temp/83a6cc1ab94fe4d8d848bef133ff9d67/000004_0
2013-07-12 10:24:11,169 [pool-1-thread-7] com.dp.nebula.wormhole.plugins.reader.hivereader.HiveReader.read(HiveReader.java:67) INFO  hivereader.HiveReader - start to read hdfs://10.2.6.102/user/hadoop/wormhole_test_temp/83a6cc1ab94fe4d8d848bef133ff9d67/000005_0
2013-07-12 10:24:11,172 [pool-1-thread-8] com.dp.nebula.wormhole.plugins.reader.hivereader.HiveReader.read(HiveReader.java:67) INFO  hivereader.HiveReader - start to read hdfs://10.2.6.102/user/hadoop/wormhole_test_temp/83a6cc1ab94fe4d8d848bef133ff9d67/000006_0
2013-07-12 10:24:11,177 [pool-1-thread-9] com.dp.nebula.wormhole.plugins.reader.hivereader.HiveReader.read(HiveReader.java:67) INFO  hivereader.HiveReader - start to read hdfs://10.2.6.102/user/hadoop/wormhole_test_temp/83a6cc1ab94fe4d8d848bef133ff9d67/000007_0
2013-07-12 10:24:11,181 [pool-1-thread-10] com.dp.nebula.wormhole.plugins.reader.hivereader.HiveReader.read(HiveReader.java:67) INFO  hivereader.HiveReader - start to read hdfs://10.2.6.102/user/hadoop/wormhole_test_temp/83a6cc1ab94fe4d8d848bef133ff9d67/000008_0
2013-07-12 10:24:11,185 [pool-1-thread-1] com.dp.nebula.wormhole.plugins.reader.hivereader.HiveReader.read(HiveReader.java:67) INFO  hivereader.HiveReader - start to read hdfs://10.2.6.102/user/hadoop/wormhole_test_temp/83a6cc1ab94fe4d8d848bef133ff9d67/000009_0
log4j:WARN No appenders could be found for logger (com.hadoop.compression.lzo.GPLNativeCodeLoader).
log4j:WARN Please initialize the log4j system properly.
log4j:WARN See http://logging.apache.org/log4j/1.2/faq.html#noconfig for more info.
2013-07-12 10:24:11,296 [main] com.dp.nebula.wormhole.engine.core.ReaderManager.run(ReaderManager.java:125) INFO  core.ReaderManager - Nebula WormHole start to read data
2013-07-12 10:24:11,297 [main] com.dp.nebula.wormhole.engine.core.Engine.run(Engine.java:105) INFO  core.Engine - Start Writer Threads
2013-07-12 10:24:11,313 [main] com.dp.nebula.wormhole.plugins.common.DFSUtils.getConf(DFSUtils.java:112) INFO  common.DFSUtils - fs.default.name=file://null:-1
2013-07-12 10:24:11,450 [main] com.dp.nebula.wormhole.plugins.writer.hdfswriter.HdfsDirSplitter.split(HdfsDirSplitter.java:73) INFO  hdfswriter.HdfsDirSplitter - HdfsWriter splits file to 2 sub-files .
2013-07-12 10:24:11,457 [main] com.dp.nebula.wormhole.engine.core.WriterManager.run(WriterManager.java:147) INFO  core.WriterManager - Writer: writer-id-0-hdfswriter start to write data
2013-07-12 10:24:20,481 [main] com.dp.nebula.wormhole.engine.core.Engine.run(Engine.java:145) INFO  core.Engine -
writer-id-0-hdfswriter stat:  Read 5116352 | Write 5115776 | speed 43.79MB/s 512748L/s|  

2013-07-12 10:24:30,501 [main] com.dp.nebula.wormhole.engine.core.Engine.run(Engine.java:145) INFO  core.Engine -
writer-id-0-hdfswriter stat:  Read 10688896 | Write 10688320 | speed 47.99MB/s 556083L/s|  

2013-07-12 10:24:40,510 [main] com.dp.nebula.wormhole.engine.core.Engine.run(Engine.java:145) INFO  core.Engine -
writer-id-0-hdfswriter stat:  Read 17341248 | Write 17340672 | speed 55.84MB/s 665222L/s|  

2013-07-12 10:24:50,584 [main] com.dp.nebula.wormhole.engine.core.Engine.run(Engine.java:145) INFO  core.Engine -
writer-id-0-hdfswriter stat:  Read 22791040 | Write 22789824 | speed 46.90MB/s 544902L/s|  

2013-07-12 10:24:53,507 [pool-1-thread-9] com.dp.nebula.wormhole.plugins.reader.hivereader.HiveReader.read(HiveReader.java:67) INFO  hivereader.HiveReader - start to read hdfs://10.2.6.102/user/hadoop/wormhole_test_temp/83a6cc1ab94fe4d8d848bef133ff9d67/000010_0
2013-07-12 10:24:53,599 [pool-1-thread-9] com.dp.nebula.wormhole.plugins.reader.hivereader.HiveReader.readFromHdfs(HiveReader.java:85) INFO  hivereader.HiveReader - codec not found, using text file reader
2013-07-12 10:25:00,597 [main] com.dp.nebula.wormhole.engine.core.Engine.run(Engine.java:145) INFO  core.Engine -
writer-id-0-hdfswriter stat:  Read 30125696 | Write 30124608 | speed 63.22MB/s 733472L/s|  

2013-07-12 10:25:08,345 [pool-1-thread-8] com.dp.nebula.wormhole.plugins.reader.hivereader.HiveReader.read(HiveReader.java:67) INFO  hivereader.HiveReader - start to read hdfs://10.2.6.102/user/hadoop/wormhole_test_temp/83a6cc1ab94fe4d8d848bef133ff9d67/000011_0
2013-07-12 10:25:08,582 [pool-1-thread-8] com.dp.nebula.wormhole.plugins.reader.hivereader.HiveReader.readFromHdfs(HiveReader.java:85) INFO  hivereader.HiveReader - codec not found, using text file reader
2013-07-12 10:25:09,263 [pool-1-thread-3] com.dp.nebula.wormhole.plugins.reader.hivereader.HiveReader.read(HiveReader.java:67) INFO  hivereader.HiveReader - start to read hdfs://10.2.6.102/user/hadoop/wormhole_test_temp/83a6cc1ab94fe4d8d848bef133ff9d67/000012_0
2013-07-12 10:25:09,291 [pool-1-thread-3] com.dp.nebula.wormhole.plugins.reader.hivereader.HiveReader.readFromHdfs(HiveReader.java:85) INFO  hivereader.HiveReader - codec not found, using text file reader
2013-07-12 10:25:10,131 [pool-1-thread-10] com.dp.nebula.wormhole.plugins.reader.hivereader.HiveReader.read(HiveReader.java:67) INFO  hivereader.HiveReader - start to read hdfs://10.2.6.102/user/hadoop/wormhole_test_temp/83a6cc1ab94fe4d8d848bef133ff9d67/000013_0
2013-07-12 10:25:10,199 [pool-1-thread-10] com.dp.nebula.wormhole.plugins.reader.hivereader.HiveReader.readFromHdfs(HiveReader.java:85) INFO  hivereader.HiveReader - codec not found, using text file reader
2013-07-12 10:25:10,685 [main] com.dp.nebula.wormhole.engine.core.Engine.run(Engine.java:145) INFO  core.Engine -
writer-id-0-hdfswriter stat:  Read 36688002 | Write 36687106 | speed 55.07MB/s 656237L/s|  

2013-07-12 10:25:12,262 [pool-1-thread-5] com.dp.nebula.wormhole.plugins.reader.hivereader.HiveReader.read(HiveReader.java:67) INFO  hivereader.HiveReader - start to read hdfs://10.2.6.102/user/hadoop/wormhole_test_temp/83a6cc1ab94fe4d8d848bef133ff9d67/000014_0
2013-07-12 10:25:12,274 [pool-1-thread-5] com.dp.nebula.wormhole.plugins.reader.hivereader.HiveReader.readFromHdfs(HiveReader.java:85) INFO  hivereader.HiveReader - codec not found, using text file reader
2013-07-12 10:26:01,532 [main] com.dp.nebula.wormhole.engine.core.Engine.run(Engine.java:145) INFO  core.Engine -
writer-id-0-hdfswriter stat:  Read 67816280 | Write 67815704 | speed 57.08MB/s 673481L/s|  

2013-07-12 10:26:03,898 [pool-1-thread-7] com.dp.nebula.wormhole.plugins.reader.hivereader.HiveReader.read(HiveReader.java:67) INFO  hivereader.HiveReader - start to read hdfs://10.2.6.102/user/hadoop/wormhole_test_temp/83a6cc1ab94fe4d8d848bef133ff9d67/000025_0
2013-07-12 10:26:03,908 [pool-1-thread-7] com.dp.nebula.wormhole.plugins.reader.hivereader.HiveReader.readFromHdfs(HiveReader.java:85) INFO  hivereader.HiveReader - codec not found, using text file reader
2013-07-12 10:26:06,370 [pool-1-thread-8] com.dp.nebula.wormhole.plugins.reader.hivereader.HiveReader.read(HiveReader.java:67) INFO  hivereader.HiveReader - start to read hdfs://10.2.6.102/user/hadoop/wormhole_test_temp/83a6cc1ab94fe4d8d848bef133ff9d67/000026_0
2013-07-12 10:26:06,415 [pool-1-thread-8] com.dp.nebula.wormhole.plugins.reader.hivereader.HiveReader.readFromHdfs(HiveReader.java:85) INFO  hivereader.HiveReader - codec not found, using text file reader
2013-07-12 10:26:10,864 [pool-1-thread-10] com.dp.nebula.wormhole.plugins.reader.hivereader.HiveReader.read(HiveReader.java:67) INFO  hivereader.HiveReader - start to read hdfs://10.2.6.102/user/hadoop/wormhole_test_temp/83a6cc1ab94fe4d8d848bef133ff9d67/000027_0
2013-07-12 10:26:10,889 [pool-1-thread-10] com.dp.nebula.wormhole.plugins.reader.hivereader.HiveReader.readFromHdfs(HiveReader.java:85) INFO  hivereader.HiveReader - codec not found, using text file reader
2013-07-12 10:26:11,539 [main] com.dp.nebula.wormhole.engine.core.Engine.run(Engine.java:145) INFO  core.Engine -
writer-id-0-hdfswriter stat:  Read 73378191 | Write 73377295 | speed 47.58MB/s 556146L/s|
2013-07-12 10:26:21,576 [pool-1-thread-6] com.dp.nebula.wormhole.plugins.reader.hivereader.HiveReader.readFromHdfs(HiveReader.java:85) INFO  hivereader.HiveReader - codec not found, using text file reader
2013-07-12 10:26:21,690 [main] com.dp.nebula.wormhole.engine.core.Engine.run(Engine.java:145) INFO  core.Engine -
writer-id-0-hdfswriter stat:  Read 79406971 | Write 79405898 | speed 51.83MB/s 602846L/s|  

2013-07-12 10:26:29,739 [pool-1-thread-1] com.dp.nebula.wormhole.plugins.reader.hivereader.HiveReader.read(HiveReader.java:67) INFO  hivereader.HiveReader - start to read hdfs://10.2.6.102/user/hadoop/wormhole_test_temp/83a6cc1ab94fe4d8d848bef133ff9d67/000031_0
2013-07-12 10:26:29,940 [pool-1-thread-1] com.dp.nebula.wormhole.plugins.reader.hivereader.HiveReader.readFromHdfs(HiveReader.java:85) INFO  hivereader.HiveReader - codec not found, using text file reader
2013-07-12 10:26:32,031 [main] com.dp.nebula.wormhole.engine.core.Engine.run(Engine.java:145) INFO  core.Engine -
writer-id-0-hdfswriter stat:  Read 85765697 | Write 85764545 | speed 53.87MB/s 635847L/s|  

2013-07-12 10:26:34,598 [pool-1-thread-8] com.dp.nebula.wormhole.plugins.reader.hivereader.HiveReader.read(HiveReader.java:67) INFO  hivereader.HiveReader - start to read hdfs://10.2.6.102/user/hadoop/wormhole_test_temp/83a6cc1ab94fe4d8d848bef133ff9d67/000032_0
2013-07-12 10:26:34,606 [pool-1-thread-8] com.dp.nebula.wormhole.plugins.reader.hivereader.HiveReader.readFromHdfs(HiveReader.java:85) INFO  hivereader.HiveReader - codec not found, using text file reader
2013-07-12 10:26:36,369 [pool-1-thread-7] com.dp.nebula.wormhole.plugins.reader.hivereader.HiveReader.read(HiveReader.java:67) INFO  hivereader.HiveReader - start to read hdfs://10.2.6.102/user/hadoop/wormhole_test_temp/83a6cc1ab94fe4d8d848bef133ff9d67/000033_0
2013-07-12 10:26:36,373 [pool-1-thread-7] com.dp.nebula.wormhole.plugins.reader.hivereader.HiveReader.readFromHdfs(HiveReader.java:85) INFO  hivereader.HiveReader - codec not found, using text file reader
2013-07-12 10:26:38,984 [pool-1-thread-2] com.dp.nebula.wormhole.plugins.reader.hivereader.HiveReader.read(HiveReader.java:67) INFO  hivereader.HiveReader - start to read hdfs://10.2.6.102/user/hadoop/wormhole_test_temp/83a6cc1ab94fe4d8d848bef133ff9d67/000034_0
2013-07-12 10:26:38,990 [pool-1-thread-2] com.dp.nebula.wormhole.plugins.reader.hivereader.HiveReader.readFromHdfs(HiveReader.java:85) INFO  hivereader.HiveReader - codec not found, using text file reader
2013-07-12 10:26:39,126 [pool-1-thread-9] com.dp.nebula.wormhole.plugins.reader.hivereader.HiveReader.read(HiveReader.java:67) INFO  hivereader.HiveReader - start to read hdfs://10.2.6.102/user/hadoop/wormhole_test_temp/83a6cc1ab94fe4d8d848bef133ff9d67/000035_0
2013-07-12 10:26:39,134 [pool-1-thread-9] com.dp.nebula.wormhole.plugins.reader.hivereader.HiveReader.readFromHdfs(HiveReader.java:85) INFO  hivereader.HiveReader - codec not found, using text file reader
2013-07-12 10:26:42,090 [main] com.dp.nebula.wormhole.engine.core.Engine.run(Engine.java:145) INFO  core.Engine -
writer-id-0-hdfswriter stat:  Read 91872401 | Write 91872209 | speed 52.52MB/s 610760L/s|  

2013-07-12 10:26:50,914 [pool-1-thread-5] com.dp.nebula.wormhole.plugins.reader.hivereader.HiveReader.readFromHdfs(HiveReader.java:85) INFO  hivereader.HiveReader - codec not found, using text file reader
2013-07-12 10:26:52,096 [main] com.dp.nebula.wormhole.engine.core.Engine.run(Engine.java:145) INFO  core.Engine -
writer-id-0-hdfswriter stat:  Read 97049556 | Write 97048852 | speed 43.83MB/s 517657L/s|  

2013-07-12 10:26:53,283 [pool-1-thread-4] com.dp.nebula.wormhole.plugins.reader.hivereader.HiveReader.read(HiveReader.java:67) INFO  hivereader.HiveReader - start to read hdfs://10.2.6.102/user/hadoop/wormhole_test_temp/83a6cc1ab94fe4d8d848bef133ff9d67/000039_0
2013-07-12 10:26:53,304 [pool-1-thread-4] com.dp.nebula.wormhole.plugins.reader.hivereader.HiveReader.readFromHdfs(HiveReader.java:85) INFO  hivereader.HiveReader - codec not found, using text file reader
2013-07-12 10:26:54,701 [pool-1-thread-6] com.dp.nebula.wormhole.plugins.reader.hivereader.HiveReader.read(HiveReader.java:67) INFO  hivereader.HiveReader - start to read hdfs://10.2.6.102/user/hadoop/wormhole_test_temp/83a6cc1ab94fe4d8d848bef133ff9d67/000040_0
2013-07-12 10:26:54,709 [pool-1-thread-6] com.dp.nebula.wormhole.plugins.reader.hivereader.HiveReader.readFromHdfs(HiveReader.java:85) INFO  hivereader.HiveReader - codec not found, using text file reader
2013-07-12 10:27:02,163 [main] com.dp.nebula.wormhole.engine.core.Engine.run(Engine.java:145) INFO  core.Engine -
writer-id-0-hdfswriter stat:  Read 103048760 | Write 103047800 | speed 51.35MB/s 599869L/s|  

2013-07-12 10:27:03,159 [pool-1-thread-1] com.dp.nebula.wormhole.plugins.reader.hivereader.HiveReader.read(HiveReader.java:67) INFO  hivereader.HiveReader - start to read hdfs://10.2.6.102/user/hadoop/wormhole_test_temp/83a6cc1ab94fe4d8d848bef133ff9d67/000041_0
2013-07-12 10:27:03,170 [pool-1-thread-1] com.dp.nebula.wormhole.plugins.reader.hivereader.HiveReader.readFromHdfs(HiveReader.java:85) INFO  hivereader.HiveReader - codec not found, using text file reader
2013-07-12 10:27:03,266 [pool-1-thread-8] com.dp.nebula.wormhole.plugins.reader.hivereader.HiveReader.read(HiveReader.java:67) INFO  hivereader.HiveReader - start to read hdfs://10.2.6.102/user/hadoop/wormhole_test_temp/83a6cc1ab94fe4d8d848bef133ff9d67/000042_1
2013-07-12 10:27:03,281 [pool-1-thread-8] com.dp.nebula.wormhole.plugins.reader.hivereader.HiveReader.readFromHdfs(HiveReader.java:85) INFO  hivereader.HiveReader - codec not found, using text file reader
2013-07-12 10:27:03,742 [pool-1-thread-10] com.dp.nebula.wormhole.plugins.reader.hivereader.HiveReader.read(HiveReader.java:67) INFO  hivereader.HiveReader - start to read hdfs://10.2.6.102/user/hadoop/wormhole_test_temp/83a6cc1ab94fe4d8d848bef133ff9d67/000043_0
2013-07-12 10:27:03,754 [pool-1-thread-10] com.dp.nebula.wormhole.plugins.reader.hivereader.HiveReader.readFromHdfs(HiveReader.java:85) INFO  hivereader.HiveReader - codec not found, using text file reader
2013-07-12 10:27:11,188 [main] com.dp.nebula.wormhole.plugins.reader.hivereader.HiveReaderPeriphery.doPost(HiveReaderPeriphery.java:106) INFO  hivereader.HiveReaderPeriphery - hdfs://10.2.6.102/user/hadoop/wormhole_test_temp/83a6cc1ab94fe4d8d848bef133ff9d67 has been deleted at dopost stage
2013-07-12 10:27:12,212 [main] com.dp.nebula.wormhole.plugins.writer.hdfswriter.HdfsWriterPeriphery.renameFiles(HdfsWriterPeriphery.java:124) INFO  hdfswriter.HdfsWriterPeriphery - successfully rename file from file:/data/home/yukang.chen/wormhole_hive/_prefix-0 to file:/data/home/yukang.chen/wormhole_hive/prefix-0
2013-07-12 10:27:12,213 [main] com.dp.nebula.wormhole.plugins.writer.hdfswriter.HdfsWriterPeriphery.renameFiles(HdfsWriterPeriphery.java:124) INFO  hdfswriter.HdfsWriterPeriphery - successfully rename file from file:/data/home/yukang.chen/wormhole_hive/_prefix-1 to file:/data/home/yukang.chen/wormhole_hive/prefix-1
2013-07-12 10:27:12,214 [main] com.dp.nebula.wormhole.engine.core.Engine.run(Engine.java:139) INFO  core.Engine - Nebula wormhole Job is Completed successfully!
2013-07-12 10:27:12,525 [main] com.dp.nebula.wormhole.engine.core.Engine.run(Engine.java:206) INFO  core.Engine -
writer-id-0-hdfswriter:
Wormhole starts work at   : 2013-07-12 10:21:47
Wormhole ends work at     : 2013-07-12 10:27:12
Total time costs          :             325.08s
Average byte speed        :           28.55MB/s
Average line speed        :           334046L/s
Total transferred records :           108593262

Reading directly from the DataNodes averages around 53MB/s, versus roughly 3MB/s through the Hive Server, an 18x difference. Even counting the execution time of the extra stage introduced by INSERT OVERWRITE DIRECTORY, the total elapsed time still differs by a factor of 11 (325s vs 3590s), so the improvement is very noticeable.
