1. Symptom
In Hadoop 2.7.2, reading a file's contents from the hadoop shell command line fails with the error below for large files, while small files read without problems.
hadoop fs -cat /tmp/hue_database_dump4.json 16/09/29 15:13:37 WARN hdfs.DFSClient: Exception while reading from BP-1776288592-10.7.12.154-1468904160674:blk_1073998236_257465 of /tmp/hue_database_dump4.json from DatanodeInfoWithStorage[10.7.12.156:50010,DS-80fb296c-1085-40ce-9dcf-e3a08327aa0d,DISK] java.io.IOException: Incorrect value for packet payload size: 57616164 at org.apache.hadoop.hdfs.protocol.datatransfer.PacketReceiver.doRead(PacketReceiver.java:159) at org.apache.hadoop.hdfs.protocol.datatransfer.PacketReceiver.receiveNextPacket(PacketReceiver.java:102) at org.apache.hadoop.hdfs.RemoteBlockReader2.readNextPacket(RemoteBlockReader2.java:201) at org.apache.hadoop.hdfs.RemoteBlockReader2.read(RemoteBlockReader2.java:152) at org.apache.hadoop.hdfs.DFSInputStream$ByteArrayStrategy.doRead(DFSInputStream.java:775) at org.apache.hadoop.hdfs.DFSInputStream.readBuffer(DFSInputStream.java:831) at org.apache.hadoop.hdfs.DFSInputStream.readWithStrategy(DFSInputStream.java:891) at org.apache.hadoop.hdfs.DFSInputStream.read(DFSInputStream.java:934) at java.io.DataInputStream.read(DataInputStream.java:100) at org.apache.hadoop.io.IOUtils.copyBytes(IOUtils.java:85) at org.apache.hadoop.io.IOUtils.copyBytes(IOUtils.java:59) at org.apache.hadoop.io.IOUtils.copyBytes(IOUtils.java:119) at org.apache.hadoop.fs.shell.Display$Cat.printToStdout(Display.java:107) at org.apache.hadoop.fs.shell.Display$Cat.processPath(Display.java:102) at org.apache.hadoop.fs.shell.Command.processPaths(Command.java:317) at org.apache.hadoop.fs.shell.Command.processPathArgument(Command.java:289) at org.apache.hadoop.fs.shell.Command.processArgument(Command.java:271) at org.apache.hadoop.fs.shell.Command.processArguments(Command.java:255) at org.apache.hadoop.fs.shell.Command.processRawArguments(Command.java:201) at org.apache.hadoop.fs.shell.Command.run(Command.java:165) at org.apache.hadoop.fs.FsShell.run(FsShell.java:287) at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:70) at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:84) at org.apache.hadoop.fs.FsShell.main(FsShell.java:340) 16/09/29 15:13:37 WARN hdfs.DFSClient: Exception while reading from BP-1776288592-10.7.12.154-1468904160674:blk_1073998236_257465 of /tmp/hue_database_dump4.json from DatanodeInfoWithStorage[10.7.12.155:50010,DS-87568e9c-b339-4b0e-a09f-292118bcb752,DISK] java.io.IOException: Incorrect value for packet payload size: 57616164 at org.apache.hadoop.hdfs.protocol.datatransfer.PacketReceiver.doRead(PacketReceiver.java:159) at org.apache.hadoop.hdfs.protocol.datatransfer.PacketReceiver.receiveNextPacket(PacketReceiver.java:102) at org.apache.hadoop.hdfs.RemoteBlockReader2.readNextPacket(RemoteBlockReader2.java:201) at org.apache.hadoop.hdfs.RemoteBlockReader2.read(RemoteBlockReader2.java:152) at org.apache.hadoop.hdfs.DFSInputStream$ByteArrayStrategy.doRead(DFSInputStream.java:775) at org.apache.hadoop.hdfs.DFSInputStream.readBuffer(DFSInputStream.java:831) at org.apache.hadoop.hdfs.DFSInputStream.readWithStrategy(DFSInputStream.java:891) at org.apache.hadoop.hdfs.DFSInputStream.read(DFSInputStream.java:934) at java.io.DataInputStream.read(DataInputStream.java:100) at org.apache.hadoop.io.IOUtils.copyBytes(IOUtils.java:85) at org.apache.hadoop.io.IOUtils.copyBytes(IOUtils.java:59) at org.apache.hadoop.io.IOUtils.copyBytes(IOUtils.java:119) at org.apache.hadoop.fs.shell.Display$Cat.printToStdout(Display.java:107) at org.apache.hadoop.fs.shell.Display$Cat.processPath(Display.java:102) at 
org.apache.hadoop.fs.shell.Command.processPaths(Command.java:317) at org.apache.hadoop.fs.shell.Command.processPathArgument(Command.java:289) at org.apache.hadoop.fs.shell.Command.processArgument(Command.java:271) at org.apache.hadoop.fs.shell.Command.processArguments(Command.java:255) at org.apache.hadoop.fs.shell.Command.processRawArguments(Command.java:201) at org.apache.hadoop.fs.shell.Command.run(Command.java:165) at org.apache.hadoop.fs.FsShell.run(FsShell.java:287) at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:70) at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:84) at org.apache.hadoop.fs.FsShell.main(FsShell.java:340) 16/09/29 15:13:37 INFO hdfs.DFSClient: Could not obtain BP-1776288592-10.7.12.154-1468904160674:blk_1073998236_257465 from any node: java.io.IOException: No live nodes contain block BP-1776288592-10.7.12.154-1468904160674:blk_1073998236_257465 after checking nodes = [DatanodeInfoWithStorage[10.7.12.156:50010,DS-80fb296c-1085-40ce-9dcf-e3a08327aa0d,DISK], DatanodeInfoWithStorage[10.7.12.155:50010,DS-87568e9c-b339-4b0e-a09f-292118bcb752,DISK]], ignoredNodes = null No live nodes contain current block Block locations: DatanodeInfoWithStorage[10.7.12.156:50010,DS-80fb296c-1085-40ce-9dcf-e3a08327aa0d,DISK] DatanodeInfoWithStorage[10.7.12.155:50010,DS-87568e9c-b339-4b0e-a09f-292118bcb752,DISK] Dead nodes: DatanodeInfoWithStorage[10.7.12.155:50010,DS-87568e9c-b339-4b0e-a09f-292118bcb752,DISK] DatanodeInfoWithStorage[10.7.12.156:50010,DS-80fb296c-1085-40ce-9dcf-e3a08327aa0d,DISK]. Will get new block locations from namenode and retry... 16/09/29 15:13:37 WARN hdfs.DFSClient: DFS chooseDataNode: got # 1 IOException, will wait for 1086.6056359410977 msec. 16/09/29 15:13:38 WARN hdfs.DFSClient: Exception while reading from BP-1776288592-10.7.12.154-1468904160674:blk_1073998236_257465 of /tmp/hue_database_dump4.json from DatanodeInfoWithStorage[10.7.12.156:50010,DS-80fb296c-1085-40ce-9dcf-e3a08327aa0d,DISK] java.io.IOException: Incorrect value for packet payload size: 57616164 at org.apache.hadoop.hdfs.protocol.datatransfer.PacketReceiver.doRead(PacketReceiver.java:159) at org.apache.hadoop.hdfs.protocol.datatransfer.PacketReceiver.receiveNextPacket(PacketReceiver.java:102) at org.apache.hadoop.hdfs.RemoteBlockReader2.readNextPacket(RemoteBlockReader2.java:201) at org.apache.hadoop.hdfs.RemoteBlockReader2.read(RemoteBlockReader2.java:152) at org.apache.hadoop.hdfs.DFSInputStream$ByteArrayStrategy.doRead(DFSInputStream.java:775) at org.apache.hadoop.hdfs.DFSInputStream.readBuffer(DFSInputStream.java:831) at org.apache.hadoop.hdfs.DFSInputStream.readWithStrategy(DFSInputStream.java:891) at org.apache.hadoop.hdfs.DFSInputStream.read(DFSInputStream.java:934) at java.io.DataInputStream.read(DataInputStream.java:100) at org.apache.hadoop.io.IOUtils.copyBytes(IOUtils.java:85) at org.apache.hadoop.io.IOUtils.copyBytes(IOUtils.java:59) at org.apache.hadoop.io.IOUtils.copyBytes(IOUtils.java:119) at org.apache.hadoop.fs.shell.Display$Cat.printToStdout(Display.java:107) at org.apache.hadoop.fs.shell.Display$Cat.processPath(Display.java:102) at org.apache.hadoop.fs.shell.Command.processPaths(Command.java:317) at org.apache.hadoop.fs.shell.Command.processPathArgument(Command.java:289) at org.apache.hadoop.fs.shell.Command.processArgument(Command.java:271) at org.apache.hadoop.fs.shell.Command.processArguments(Command.java:255) at org.apache.hadoop.fs.shell.Command.processRawArguments(Command.java:201) at org.apache.hadoop.fs.shell.Command.run(Command.java:165) at 
org.apache.hadoop.fs.FsShell.run(FsShell.java:287) at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:70) at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:84) at org.apache.hadoop.fs.FsShell.main(FsShell.java:340) 16/09/29 15:13:38 WARN hdfs.DFSClient: Exception while reading from BP-1776288592-10.7.12.154-1468904160674:blk_1073998236_257465 of /tmp/hue_database_dump4.json from DatanodeInfoWithStorage[10.7.12.155:50010,DS-87568e9c-b339-4b0e-a09f-292118bcb752,DISK] java.io.IOException: Incorrect value for packet payload size: 57616164 at org.apache.hadoop.hdfs.protocol.datatransfer.PacketReceiver.doRead(PacketReceiver.java:159) at org.apache.hadoop.hdfs.protocol.datatransfer.PacketReceiver.receiveNextPacket(PacketReceiver.java:102) at org.apache.hadoop.hdfs.RemoteBlockReader2.readNextPacket(RemoteBlockReader2.java:201) at org.apache.hadoop.hdfs.RemoteBlockReader2.read(RemoteBlockReader2.java:152) at org.apache.hadoop.hdfs.DFSInputStream$ByteArrayStrategy.doRead(DFSInputStream.java:775) at org.apache.hadoop.hdfs.DFSInputStream.readBuffer(DFSInputStream.java:831) at org.apache.hadoop.hdfs.DFSInputStream.readWithStrategy(DFSInputStream.java:891) at org.apache.hadoop.hdfs.DFSInputStream.read(DFSInputStream.java:934) at java.io.DataInputStream.read(DataInputStream.java:100) at org.apache.hadoop.io.IOUtils.copyBytes(IOUtils.java:85) at org.apache.hadoop.io.IOUtils.copyBytes(IOUtils.java:59) at org.apache.hadoop.io.IOUtils.copyBytes(IOUtils.java:119) at org.apache.hadoop.fs.shell.Display$Cat.printToStdout(Display.java:107) at org.apache.hadoop.fs.shell.Display$Cat.processPath(Display.java:102) at org.apache.hadoop.fs.shell.Command.processPaths(Command.java:317) at org.apache.hadoop.fs.shell.Command.processPathArgument(Command.java:289) at org.apache.hadoop.fs.shell.Command.processArgument(Command.java:271) at org.apache.hadoop.fs.shell.Command.processArguments(Command.java:255) at org.apache.hadoop.fs.shell.Command.processRawArguments(Command.java:201) at org.apache.hadoop.fs.shell.Command.run(Command.java:165) at org.apache.hadoop.fs.FsShell.run(FsShell.java:287) at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:70) at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:84) at org.apache.hadoop.fs.FsShell.main(FsShell.java:340) 16/09/29 15:13:38 INFO hdfs.DFSClient: Could not obtain BP-1776288592-10.7.12.154-1468904160674:blk_1073998236_257465 from any node: java.io.IOException: No live nodes contain block BP-1776288592-10.7.12.154-1468904160674:blk_1073998236_257465 after checking nodes = [DatanodeInfoWithStorage[10.7.12.156:50010,DS-80fb296c-1085-40ce-9dcf-e3a08327aa0d,DISK], DatanodeInfoWithStorage[10.7.12.155:50010,DS-87568e9c-b339-4b0e-a09f-292118bcb752,DISK]], ignoredNodes = null No live nodes contain current block Block locations: DatanodeInfoWithStorage[10.7.12.156:50010,DS-80fb296c-1085-40ce-9dcf-e3a08327aa0d,DISK] DatanodeInfoWithStorage[10.7.12.155:50010,DS-87568e9c-b339-4b0e-a09f-292118bcb752,DISK] Dead nodes: DatanodeInfoWithStorage[10.7.12.155:50010,DS-87568e9c-b339-4b0e-a09f-292118bcb752,DISK] DatanodeInfoWithStorage[10.7.12.156:50010,DS-80fb296c-1085-40ce-9dcf-e3a08327aa0d,DISK]. Will get new block locations from namenode and retry... 16/09/29 15:13:38 WARN hdfs.DFSClient: DFS chooseDataNode: got # 2 IOException, will wait for 6199.040804985275 msec. 
16/09/29 15:13:45 WARN hdfs.DFSClient: Exception while reading from BP-1776288592-10.7.12.154-1468904160674:blk_1073998236_257465 of /tmp/hue_database_dump4.json from DatanodeInfoWithStorage[10.7.12.156:50010,DS-80fb296c-1085-40ce-9dcf-e3a08327aa0d,DISK] java.io.IOException: Incorrect value for packet payload size: 57616164 at org.apache.hadoop.hdfs.protocol.datatransfer.PacketReceiver.doRead(PacketReceiver.java:159) at org.apache.hadoop.hdfs.protocol.datatransfer.PacketReceiver.receiveNextPacket(PacketReceiver.java:102) at org.apache.hadoop.hdfs.RemoteBlockReader2.readNextPacket(RemoteBlockReader2.java:201) at org.apache.hadoop.hdfs.RemoteBlockReader2.read(RemoteBlockReader2.java:152) at org.apache.hadoop.hdfs.DFSInputStream$ByteArrayStrategy.doRead(DFSInputStream.java:775) at org.apache.hadoop.hdfs.DFSInputStream.readBuffer(DFSInputStream.java:831) at org.apache.hadoop.hdfs.DFSInputStream.readWithStrategy(DFSInputStream.java:891) at org.apache.hadoop.hdfs.DFSInputStream.read(DFSInputStream.java:934) at java.io.DataInputStream.read(DataInputStream.java:100) at org.apache.hadoop.io.IOUtils.copyBytes(IOUtils.java:85) at org.apache.hadoop.io.IOUtils.copyBytes(IOUtils.java:59) at org.apache.hadoop.io.IOUtils.copyBytes(IOUtils.java:119) at org.apache.hadoop.fs.shell.Display$Cat.printToStdout(Display.java:107) at org.apache.hadoop.fs.shell.Display$Cat.processPath(Display.java:102) at org.apache.hadoop.fs.shell.Command.processPaths(Command.java:317) at org.apache.hadoop.fs.shell.Command.processPathArgument(Command.java:289) at org.apache.hadoop.fs.shell.Command.processArgument(Command.java:271) at org.apache.hadoop.fs.shell.Command.processArguments(Command.java:255) at org.apache.hadoop.fs.shell.Command.processRawArguments(Command.java:201) at org.apache.hadoop.fs.shell.Command.run(Command.java:165) at org.apache.hadoop.fs.FsShell.run(FsShell.java:287) at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:70) at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:84) at org.apache.hadoop.fs.FsShell.main(FsShell.java:340) 16/09/29 15:13:45 INFO hdfs.DFSClient: Could not obtain BP-1776288592-10.7.12.154-1468904160674:blk_1073998236_257465 from any node: java.io.IOException: No live nodes contain block BP-1776288592-10.7.12.154-1468904160674:blk_1073998236_257465 after checking nodes = [DatanodeInfoWithStorage[10.7.12.155:50010,DS-87568e9c-b339-4b0e-a09f-292118bcb752,DISK], DatanodeInfoWithStorage[10.7.12.156:50010,DS-80fb296c-1085-40ce-9dcf-e3a08327aa0d,DISK]], ignoredNodes = null No live nodes contain current block Block locations: DatanodeInfoWithStorage[10.7.12.155:50010,DS-87568e9c-b339-4b0e-a09f-292118bcb752,DISK] DatanodeInfoWithStorage[10.7.12.156:50010,DS-80fb296c-1085-40ce-9dcf-e3a08327aa0d,DISK] Dead nodes: DatanodeInfoWithStorage[10.7.12.155:50010,DS-87568e9c-b339-4b0e-a09f-292118bcb752,DISK] DatanodeInfoWithStorage[10.7.12.156:50010,DS-80fb296c-1085-40ce-9dcf-e3a08327aa0d,DISK]. Will get new block locations from namenode and retry... 16/09/29 15:13:45 WARN hdfs.DFSClient: DFS chooseDataNode: got # 3 IOException, will wait for 13991.541436655532 msec. 
16/09/29 15:13:59 WARN hdfs.DFSClient: DFS Read java.io.IOException: Incorrect value for packet payload size: 57616164 at org.apache.hadoop.hdfs.protocol.datatransfer.PacketReceiver.doRead(PacketReceiver.java:159) at org.apache.hadoop.hdfs.protocol.datatransfer.PacketReceiver.receiveNextPacket(PacketReceiver.java:102) at org.apache.hadoop.hdfs.RemoteBlockReader2.readNextPacket(RemoteBlockReader2.java:201) at org.apache.hadoop.hdfs.RemoteBlockReader2.read(RemoteBlockReader2.java:152) at org.apache.hadoop.hdfs.DFSInputStream$ByteArrayStrategy.doRead(DFSInputStream.java:775) at org.apache.hadoop.hdfs.DFSInputStream.readBuffer(DFSInputStream.java:831) at org.apache.hadoop.hdfs.DFSInputStream.readWithStrategy(DFSInputStream.java:891) at org.apache.hadoop.hdfs.DFSInputStream.read(DFSInputStream.java:934) at java.io.DataInputStream.read(DataInputStream.java:100) at org.apache.hadoop.io.IOUtils.copyBytes(IOUtils.java:85) at org.apache.hadoop.io.IOUtils.copyBytes(IOUtils.java:59) at org.apache.hadoop.io.IOUtils.copyBytes(IOUtils.java:119) at org.apache.hadoop.fs.shell.Display$Cat.printToStdout(Display.java:107) at org.apache.hadoop.fs.shell.Display$Cat.processPath(Display.java:102) at org.apache.hadoop.fs.shell.Command.processPaths(Command.java:317) at org.apache.hadoop.fs.shell.Command.processPathArgument(Command.java:289) at org.apache.hadoop.fs.shell.Command.processArgument(Command.java:271) at org.apache.hadoop.fs.shell.Command.processArguments(Command.java:255) at org.apache.hadoop.fs.shell.Command.processRawArguments(Command.java:201) at org.apache.hadoop.fs.shell.Command.run(Command.java:165) at org.apache.hadoop.fs.FsShell.run(FsShell.java:287) at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:70) at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:84) at org.apache.hadoop.fs.FsShell.main(FsShell.java:340) cat: Incorrect value for packet payload size: 57616164
2. Analysis
Reading the exceptions above carefully, two key pieces of information stand out:
1、Could not obtain BP-1776288592-10.7.12.154-1468904160674:blk_1073998236_257465 from any node: java.io.IOException: No live nodes contain block BP-1776288592-10.7.12.154-1468904160674:blk_1073998236_257465 after checking nodes = [DatanodeInfoWithStorage[10.7.12.155:50010,DS-87568e9c-b339-4b0e-a09f-292118bcb752,DISK], DatanodeInfoWithStorage[10.7.12.156:50010,DS-80fb296c-1085-40ce-9dcf-e3a08327aa0d,DISK]], ignoredNodes = null No live nodes contain current block Block locations: DatanodeInfoWithStorage[10.7.12.155:50010,DS-87568e9c-b339-4b0e-a09f-292118bcb752,DISK] DatanodeInfoWithStorage[10.7.12.156:50010,DS-80fb296c-1085-40ce-9dcf-e3a08327aa0d,DISK] Dead nodes: DatanodeInfoWithStorage[10.7.12.155:50010,DS-87568e9c-b339-4b0e-a09f-292118bcb752,DISK] DatanodeInfoWithStorage[10.7.12.156:50010,DS-80fb296c-1085-40ce-9dcf-e3a08327aa0d,DISK]. Will get new block locations from namenode and retry...
2、Incorrect value for packet payload size: 57616164
Let's take these one at a time. For the first message, searching the source for "No live nodes contain block" leads to the following code in DFSInputStream:
/**
 * Get the best node from which to stream the data.
 * @param block LocatedBlock, containing nodes in priority order.
 * @param ignoredNodes Do not choose nodes in this array (may be null)
 * @return The DNAddrPair of the best node.
 * @throws IOException
 */
private DNAddrPair getBestNodeDNAddrPair(LocatedBlock block,
    Collection<DatanodeInfo> ignoredNodes) throws IOException {
  DatanodeInfo[] nodes = block.getLocations();
  StorageType[] storageTypes = block.getStorageTypes();
  DatanodeInfo chosenNode = null;
  StorageType storageType = null;
  if (nodes != null) {
    for (int i = 0; i < nodes.length; i++) {
      if (!deadNodes.containsKey(nodes[i])
          && (ignoredNodes == null || !ignoredNodes.contains(nodes[i]))) {
        chosenNode = nodes[i];
        // Storage types are ordered to correspond with nodes, so use the same
        // index to get storage type.
        if (storageTypes != null && i < storageTypes.length) {
          storageType = storageTypes[i];
        }
        break;
      }
    }
  }
  if (chosenNode == null) {
    throw new IOException("No live nodes contain block " + block.getBlock() +
        " after checking nodes = " + Arrays.toString(nodes) +
        ", ignoredNodes = " + ignoredNodes);
  }
  final String dnAddr =
      chosenNode.getXferAddr(dfsClient.getConf().connectToDnViaHostname);
  if (DFSClient.LOG.isDebugEnabled()) {
    DFSClient.LOG.debug("Connecting to datanode " + dnAddr);
  }
  InetSocketAddress targetAddr = NetUtils.createSocketAddr(dnAddr);
  return new DNAddrPair(chosenNode, targetAddr, storageType);
}
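To see the failure mode in isolation, here is a standalone toy version of that selection loop (plain Java, not the Hadoop source; the node addresses are taken from the log above). A node is skipped only when it appears in the client-local deadNodes map or in ignoredNodes, so the exception fires exactly when every replica has already failed a read for this client:

import java.util.Arrays;
import java.util.Collections;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Standalone toy version of the replica-selection logic above, NOT Hadoop code.
public class BestNodeDemo {
  public static void main(String[] args) {
    List<String> locations = Arrays.asList("10.7.12.155:50010", "10.7.12.156:50010");
    Set<String> deadNodes = new HashSet<>(locations); // earlier reads from both replicas failed
    Set<String> ignoredNodes = Collections.emptySet();

    String chosenNode = null;
    for (String node : locations) {
      if (!deadNodes.contains(node) && !ignoredNodes.contains(node)) {
        chosenNode = node;
        break;
      }
    }
    if (chosenNode == null) {
      // Mirrors the "No live nodes contain block ..." IOException seen in the log.
      System.out.println("No live nodes contain block after checking nodes = "
          + locations + ", ignoredNodes = null");
    }
  }
}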
However, the exception also reports nodes = [DatanodeInfoWithStorage[10.7.12.155:50010,DS-87568e9c-b339-4b0e-a09f-292118bcb752,DISK], DatanodeInfoWithStorage[10.7.12.156:50010,DS-80fb296c-1085-40ce-9dcf-e3a08327aa0d,DISK]] and ignoredNodes = null, so both replica locations were known and nothing was explicitly ignored; the only way chosenNode can remain null is the deadNodes check. Reading further through the log confirms exactly that:
Dead nodes: DatanodeInfoWithStorage[10.7.12.155:50010,DS-87568e9c-b339-4b0e-a09f-292118bcb752,DISK] DatanodeInfoWithStorage[10.7.12.156:50010,DS-80fb296c-1085-40ce-9dcf-e3a08327aa0d,DISK]
Both DataNodes have been marked as dead, yet the NameNode web UI shows both of them alive and healthy.
At first sight this is odd, but note that the "Dead nodes" list above is the DFSInputStream's own client-side deadNodes map, which a node enters whenever a read from it throws; it is not the NameNode's view of the cluster. The real question is why every read attempt fails, which brings us to the second message: Incorrect value for packet payload size: 57616164. It is thrown from the doRead() method of PacketReceiver:
if (totalLen < 0 || totalLen > MAX_PACKET_SIZE) {
  throw new IOException("Incorrect value for packet payload size: " +
      payloadLen);
}
In other words, while receiving a packet the client decided that the packet size exceeded the threshold MAX_PACKET_SIZE, which is 16 MB:
/**
 * The max size of any single packet. This prevents OOMEs when
 * invalid data is sent.
 */
private static final int MAX_PACKET_SIZE = 16 * 1024 * 1024;
The value 57,616,164 reported in the exception means this packet was nearly 55 MB, more than three times the 16,777,216-byte limit. That explains the symptom: a small file produces small packets, while a large file produces packets big enough to trip the check. But the file being read is only about 50 MB in total, so why is a single packet more than 4 MB larger than the whole file? Let's look at the packet structure next:
// Each packet looks like:
//   PLEN    HLEN      HEADER     CHECKSUMS  DATA
//   32-bit  16-bit   <protobuf>  <variable length>
//
// PLEN:      Payload length
//            = length(PLEN) + length(CHECKSUMS) + length(DATA)
//            This length includes its own encoded length in
//            the sum for historical reasons.
//
// HLEN:      Header length
//            = length(HEADER)
//
// HEADER:    the actual packet header fields, encoded in protobuf
// CHECKSUMS: the crcs for the data chunk. May be missing if
//            checksums were not requested
// DATA       the actual block data
Preconditions.checkState(curHeader == null || !curHeader.isLastPacketInBlock());

curPacketBuf.clear();
curPacketBuf.limit(PacketHeader.PKT_LENGTHS_LEN);
doReadFully(ch, in, curPacketBuf);
curPacketBuf.flip();
int payloadLen = curPacketBuf.getInt();
As the comment shows, each packet carries a payload length PLEN = length(PLEN) + length(CHECKSUMS) + length(DATA) and a header length HLEN = length(HEADER), followed by the HEADER, the CHECKSUMS, and the DATA itself. Everything other than DATA is framing that the packet adds on top of the block data, which is why a packet can be larger than the raw file data it carries.
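As a small illustration of how the receiver interprets those first six bytes (plain Java, not the Hadoop source; the header length here is a made-up value), note that the 32-bit PLEN read below is exactly the number quoted in the error message:

import java.nio.ByteBuffer;

// Toy decoding of the 6-byte packet prefix (PLEN + HLEN), NOT Hadoop code.
public class PacketPrefixDemo {
  public static void main(String[] args) {
    ByteBuffer wire = ByteBuffer.allocate(6); // PacketHeader.PKT_LENGTHS_LEN = 4 + 2
    wire.putInt(57616164);                    // PLEN = len(PLEN) + len(CHECKSUMS) + len(DATA)
    wire.putShort((short) 25);                // HLEN: made-up protobuf header length
    wire.flip();

    int payloadLen = wire.getInt();           // the value checked against MAX_PACKET_SIZE
    short headerLen = wire.getShort();
    System.out.println("payloadLen=" + payloadLen + " headerLen=" + headerLen);
    // With payloadLen > 16 * 1024 * 1024 the receiver throws
    // "Incorrect value for packet payload size: 57616164".
  }
}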
But why did the packet get this large in the first place? Let's follow the sending side: the DataNode builds and sends packets in BlockSender's sendPacket() method:
/**
 * Sends a packet with up to maxChunks chunks of data.
 *
 * @param pkt buffer used for writing packet data
 * @param maxChunks maximum number of chunks to send
 * @param out stream to send data to
 * @param transferTo use transferTo to send data
 * @param throttler used for throttling data transfer bandwidth
 */
private int sendPacket(ByteBuffer pkt, int maxChunks, OutputStream out,
    boolean transferTo, DataTransferThrottler throttler) throws IOException {
The ByteBuffer pkt is the buffer that holds the packet data to be sent, so its capacity determines how large a packet can be. How is that capacity chosen? Continuing into doSendBlock():
ByteBuffer pktBuf = ByteBuffer.allocate(pktBufSize);

while (endOffset > offset && !Thread.currentThread().isInterrupted()) {
  manageOsCache();
  long len = sendPacket(pktBuf, maxChunksPerPacket, streamForSendChunks,
      transferTo, throttler);
  offset += len;
  totalRead += len + (numberOfChunks(len) * checksumSize);
  seqno++;
}
A buffer of pktBufSize bytes is allocated first, and that same buffer is then reused by sendPacket() for every packet in the block. So the question reduces to how pktBufSize is determined:
if (transferTo) {
  FileChannel fileChannel = ((FileInputStream)blockIn).getChannel();
  blockInPosition = fileChannel.position();
  streamForSendChunks = baseStream;
  maxChunksPerPacket = numberOfChunks(TRANSFERTO_BUFFER_SIZE);

  // Smaller packet size to only hold checksum when doing transferTo
  pktBufSize += checksumSize * maxChunksPerPacket;
} else {
  maxChunksPerPacket = Math.max(1,
      numberOfChunks(HdfsConstants.IO_FILE_BUFFER_SIZE));
  // Packet size includes both checksum and data
  pktBufSize += (chunkSize + checksumSize) * maxChunksPerPacket;
}
pktBufSize is driven by maxChunksPerPacket, which is computed as follows:
maxChunksPerPacket = Math.max(1, numberOfChunks(HdfsConstants.IO_FILE_BUFFER_SIZE));
That is, it depends on HdfsConstants.IO_FILE_BUFFER_SIZE, i.e. the io.file.buffer.size configuration property, whose default is 4096 bytes (4 KB):
public static final String IO_FILE_BUFFER_SIZE_KEY = "io.file.buffer.size";
/** Default value for IO_FILE_BUFFER_SIZE_KEY */
public static final int IO_FILE_BUFFER_SIZE_DEFAULT = 4096;
Checking the actual configuration on this cluster, however, io.file.buffer.size had been set to a full 125 MB, which is exactly why the packets grew so large.
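To see how io.file.buffer.size feeds into packet sizing, the sketch below plugs numbers into the formula from doSendBlock() above (standalone Java, not the Hadoop source; it assumes the usual 512-byte checksum chunks with 4-byte CRCs and ignores the packet header). The data actually placed in a packet is further limited by how much of the block is left to send, which is presumably why the observed payload was about 55 MB rather than the full buffer size:

// Rough standalone estimate of the per-packet buffer sizing, NOT Hadoop code.
// Assumptions: 512-byte checksum chunks, 4-byte CRC per chunk, header ignored.
public class PacketSizeEstimate {
  static final int MAX_PACKET_SIZE = 16 * 1024 * 1024; // receiver-side limit
  static final int CHUNK_SIZE = 512;                   // dfs.bytes-per-checksum default
  static final int CHECKSUM_SIZE = 4;                  // one CRC per chunk

  // (chunkSize + checksumSize) * maxChunksPerPacket, as in BlockSender above
  static long maxDataAndChecksumPerPacket(long ioFileBufferSize) {
    long maxChunksPerPacket =
        Math.max(1, (ioFileBufferSize + CHUNK_SIZE - 1) / CHUNK_SIZE); // numberOfChunks()
    return (CHUNK_SIZE + CHECKSUM_SIZE) * maxChunksPerPacket;
  }

  public static void main(String[] args) {
    System.out.println("io.file.buffer.size = 4 KB   -> up to "
        + maxDataAndChecksumPerPacket(4096) + " bytes per packet (limit "
        + MAX_PACKET_SIZE + ")");
    System.out.println("io.file.buffer.size = 125 MB -> up to "
        + maxDataAndChecksumPerPacket(125L * 1024 * 1024) + " bytes per packet (limit "
        + MAX_PACKET_SIZE + ")");
  }
}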
After correcting the parameter and restarting the affected nodes, the file could be read normally again.
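For reference, the fix amounts to setting io.file.buffer.size back to a sane value in core-site.xml on the affected nodes; a hypothetical entry restoring the 4 KB default (any value comfortably below the 16 MB packet limit would do) looks like this:

<!-- core-site.xml: hypothetical corrected value; 4096 is the Hadoop default -->
<property>
  <name>io.file.buffer.size</name>
  <value>4096</value>
</property>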
3. Answer
io.file.buffer.size was configured far too large, so the packets sent by the DataNode exceeded the MAX_PACKET_SIZE threshold that the receiving side enforces.