Tracking down why an HBase cluster would not accept writes

The HBase slave cluster has 8 regionserver machines and had been running stably for over five months. On August 15 we found that 4 datanode processes in the cluster had died; the cause turned out to be OutOfMemory (these machines also host Spark, and Spark's -Xmx was set to 32g). We recovered the slave cluster and backfilled the missing data, which put it under a fairly heavy write load. After a few more days of running, we found the slave cluster could no longer accept writes.

① Regionserver side

        Regionserver symptom 1:

2014-08-21 15:03:31,011 INFO org.apache.hadoop.hbase.regionserver.HRegionServer: regionserver60020.periodicFlusher requesting flush for region table_version,hour_search_860010-1118000000_2014010418,1403685954922.640fc829f767a4e33e296fc4f4cca4a4. after a delay of 13125
2014-08-21 15:03:31,011 INFO org.apache.hadoop.hbase.regionserver.HRegionServer: regionserver60020.periodicFlusher requesting flush for region hour_hotstatic,860010-0507010000_2014071711_0_entry_00000008749,1406860400351.bcb13556daad6bda72b3c84df5ec912e. after a delay of 10066
2014-08-21 15:03:31,011 INFO org.apache.hadoop.hbase.regionserver.HRegionServer: regionserver60020.periodicFlusher requesting flush for region hour_screen,860010-2288050100_2014030419_0_00000000920,1402321410433.da4ff8fe84325e7da075b0fba1f3c3c9. after a delay of 11767
2014-08-21 15:03:31,011 INFO org.apache.hadoop.hbase.regionserver.HRegionServer: regionserver60020.periodicFlusher requesting flush for region hour_hotstatic,860010-1119060300_2014040422_0_bounce_ratio_00000000867,1402022490696.4fcfd303cff4211de61ff55f77d46317. after a delay of 10256
2014-08-21 15:03:31,011 INFO org.apache.hadoop.hbase.regionserver.HRegionServer: regionserver60020.periodicFlusher requesting flush for region hour_url,860010-0204020100_2014010607_0_8c54e33efae9da957548659c5b96f04e,1403329534827.b1c3733f5a8deade785bd71ee8660268. after a delay of 16628
2014-08-21 15:03:31,011 INFO org.apache.hadoop.hbase.regionserver.HRegionServer: regionserver60020.periodicFlusher requesting flush for region hour_hotstatic,860010-0335010000_2014041011_0_exit_00000000000,1399606854480.b1f83e693e0fdb18e168943d282cb6b0. after a delay of 18889
2014-08-21 15:03:31,011 INFO org.apache.hadoop.hbase.regionserver.HRegionServer: regionserver60020.periodicFlusher requesting flush for region hour_main,860010-2014041100_2014060513,1402472695828.c3cd5c3a1fcc01e0493a8043e376e948. after a delay of 21727
2014-08-21 15:03:31,011 INFO org.apache.hadoop.hbase.regionserver.HRegionServer: regionserver60020.periodicFlusher requesting flush for region hour_screen,,1396924866983.e3f0096984896efa77348dc4f89a9f3c. after a delay of 17782
2014-08-21 15:03:31,011 INFO org.apache.hadoop.hbase.regionserver.HRegionServer: regionserver60020.periodicFlusher requesting flush for region hour_area,860010-2316230100_2014031222_0_pv_00000000005,1395829898129.c426c025521dd8facd291f1a8ba15f13. after a delay of 6147
2014-08-21 15:03:31,011 INFO org.apache.hadoop.hbase.regionserver.HRegionServer: regionserver60020.periodicFlusher requesting flush for region hour_stay,860010-0604100000_2014031918_0_00000000006,1395349588239.e592ebe99f412b565381f6649bbf857f. after a delay of 16294
2014-08-21 15:03:31,011 INFO org.apache.hadoop.hbase.regionserver.HRegionServer: regionserver60020.periodicFlusher requesting flush for region hour_hotstatic,860010-0307010000_2014070100_0_entry_00000001023,1405881888126.055c3c19009c6822e00def0b7431d0d8. after a delay of 20105
2014-08-21 15:03:31,011 INFO org.apache.hadoop.hbase.regionserver.HRegionServer: regionserver60020.periodicFlusher requesting flush for region hour_hotstatic,860010-0506000000_2014072817_0_bounce_ratio_00000047803,1407729791396.22b0d3234c1173859992d231d2f2d427. after a delay of 7105
2014-08-21 15:03:31,011 INFO org.apache.hadoop.hbase.regionserver.HRegionServer: regionserver60020.periodicFlusher requesting flush for region hour_stay,860010-2328010100_2014010616_0_00000000011,1401896532036.547015d92a9021e31bac69909979f4ac. after a delay of 5485
2014-08-21 15:03:31,011 INFO org.apache.hadoop.hbase.regionserver.HRegionServer: regionserver60020.periodicFlusher requesting flush for region hour_flash,860010-0521010000_2014030620_0_00000000007,1407471178069.aa4f5e7e7f8e3dd150666ae1205ebbcf. after a delay of 11484
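The periodicFlusher chore asks for a flush whenever a region's memstore has gone unflushed for longer than a configured interval, so a flood of requests like the above means many regions were sitting on old, unflushed data. As a minimal sketch (assuming an HBase 0.94-era release, which this log format matches), the interval is governed by hbase.regionserver.optionalcacheflushinterval in hbase-site.xml:

<!-- hbase-site.xml: periodicFlusher interval, in milliseconds. The chore
     requests a flush for any region whose memstore is older than this;
     the varying "after a delay of N" values only stagger the requests to
     avoid a flush storm. Seeing request after request in the same second
     suggests the flushes themselves were not completing. -->
<property>
  <name>hbase.regionserver.optionalcacheflushinterval</name>
  <value>3600000</value> <!-- default: one hour -->
</property>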

        Regionserver symptom 2:

2014-08-21 10:30:43,384 INFO org.apache.hadoop.hbase.regionserver.wal.HLog: Too many hlogs: logs=79, maxlogs=32; forcing flush of 1 regions(s): 12663e173854886463edfe8c6495dca0
2014-08-21 10:31:53,456 INFO org.apache.hadoop.hbase.regionserver.wal.HLog: Too many hlogs: logs=65, maxlogs=32; forcing flush of 9 regions(s): 192e3fcd5afce28ea2abc8bbb895163d, 2149c6216b259083a6743c61ec7f62b1, 214aac4a7f31cfc346889aabdbdbadd3, 2248c5c76b0fd55fe11d428a77330e6b, 2f5d56a3c17fd8e4f6f6f62d0fbcda69, 2ff390bdbb79cb8dc8ba05b4e56c26ea, 398376b87a43d83d84e96169dadb7865, b5431ef4a70fb2a244d83ae3316506f9, f34c16e000e648988bc00692bc6c7cea
2014-08-21 10:33:25,657 INFO org.apache.hadoop.hbase.regionserver.wal.HLog: Too many hlogs: logs=66, maxlogs=32; forcing flush of 4 regions(s): 192e3fcd5afce28ea2abc8bbb895163d, 2f5d56a3c17fd8e4f6f6f62d0fbcda69, b5431ef4a70fb2a244d83ae3316506f9, f34c16e000e648988bc00692bc6c7cea
2014-08-21 10:33:55,418 INFO org.apache.hadoop.hbase.regionserver.wal.HLog: Too many hlogs: logs=60, maxlogs=32; forcing flush of 4 regions(s): 352e2b4a2a42438d5ecb735de1c9e9f4, 5d08d2713d809334514be9ec7e2512cb, 981285a02ae3af797b10e621e76eccf8, f9a55c4661a1ee2f16e3c1e6ec978595
2014-08-21 10:35:02,013 INFO org.apache.hadoop.hbase.regionserver.wal.HLog: Too many hlogs: logs=51, maxlogs=32; forcing flush of 3 regions(s): a6064be87ca7005a4e4ab607501d9f5a, cc84289443f2478105bd8078df2bccd3, f533780eb2913bf8819cecea52bbeb43
2014-08-21 10:39:05,129 INFO org.apache.hadoop.hbase.regionserver.wal.HLog: Too many hlogs: logs=35, maxlogs=32; forcing flush of 1 regions(s): 5b0d0af8b9b684237373e941238bdfa2
2014-08-21 11:34:41,619 INFO org.apache.hadoop.hbase.regionserver.wal.HLog: Too many hlogs: logs=33, maxlogs=32; forcing flush of 1 regions(s): 2149c6216b259083a6743c61ec7f62b1
2014-08-21 11:36:53,437 INFO org.apache.hadoop.hbase.regionserver.wal.HLog: Too many hlogs: logs=33, maxlogs=32; forcing flush of 1 regions(s): eec50ffaa2639f7c0fbd7ac727c16f16
2014-08-21 11:37:46,667 INFO org.apache.hadoop.hbase.regionserver.wal.HLog: Too many hlogs: logs=34, maxlogs=32; forcing flush of 1 regions(s): eec50ffaa2639f7c0fbd7ac727c16f16
2014-08-21 11:38:09,366 INFO org.apache.hadoop.hbase.regionserver.wal.HLog: Too many hlogs: logs=35, maxlogs=32; forcing flush of 1 regions(s): eec50ffaa2639f7c0fbd7ac727c16f16
2014-08-21 11:38:57,140 INFO org.apache.hadoop.hbase.regionserver.wal.HLog: Too many hlogs: logs=35, maxlogs=32; forcing flush of 15 regions(s): 0c223074833c6a3e2835feb5f9640298, 0f461ff6911b932c013e8d5f57d110d9, 2846b752106aa8079f49e784666c17a8, 53e7a57b2028e32e90040071014b13be, 5f2053770878cfc4ae4e1849f3e128b8, 66fd00187ab38d3253fd2b440ea1a082, 6e3c2282edaebdb1bda15d49fe22df6f, 7e45f8f49ff6b697dc36d988f15a1643, a625182cd59e5ae87ead3113b3a89aaa, b77403d41440cda21e92e4d20d1dc4bc, ba2bdc3cdc3a748c5fbc4d19cdda1bbf, bab28f8f990d3aed73a982964f5731f9, e8c5bd8150ee49d0ba13ee77633d1936, f5064874556aca3c45a67463b2ad37d5, f9961ca861361ab0913f6e05571d45b5
2014-08-21 11:40:02,163 INFO org.apache.hadoop.hbase.regionserver.wal.HLog: Too many hlogs: logs=36, maxlogs=32; forcing flush of 15 regions(s): 0c223074833c6a3e2835feb5f9640298, 0f461ff6911b932c013e8d5f57d110d9, 2846b752106aa8079f49e784666c17a8, 53e7a57b2028e32e90040071014b13be, 5f2053770878cfc4ae4e1849f3e128b8, 66fd00187ab38d3253fd2b440ea1a082, 6e3c2282edaebdb1bda15d49fe22df6f, 7e45f8f49ff6b697dc36d988f15a1643, a625182cd59e5ae87ead3113b3a89aaa, b77403d41440cda21e92e4d20d1dc4bc, ba2bdc3cdc3a748c5fbc4d19cdda1bbf, bab28f8f990d3aed73a982964f5731f9, e8c5bd8150ee49d0ba13ee77633d1936, f5064874556aca3c45a67463b2ad37d5, f9961ca861361ab0913f6e05571d45b5
2014-08-21 11:40:47,301 INFO org.apache.hadoop.hbase.regionserver.wal.HLog: Too many hlogs: logs=37, maxlogs=32; forcing flush of 14 regions(s): 0c223074833c6a3e2835feb5f9640298, 0f461ff6911b932c013e8d5f57d110d9, 2846b752106aa8079f49e784666c17a8, 53e7a57b2028e32e90040071014b13be, 5f2053770878cfc4ae4e1849f3e128b8, 66fd00187ab38d3253fd2b440ea1a082, 6e3c2282edaebdb1bda15d49fe22df6f, a625182cd59e5ae87ead3113b3a89aaa, b77403d41440cda21e92e4d20d1dc4bc, ba2bdc3cdc3a748c5fbc4d19cdda1bbf, bab28f8f990d3aed73a982964f5731f9, e8c5bd8150ee49d0ba13ee77633d1936, f5064874556aca3c45a67463b2ad37d5, f9961ca861361ab0913f6e05571d45b5
2014-08-21 11:41:23,446 INFO org.apache.hadoop.hbase.regionserver.wal.HLog: Too many hlogs: logs=37, maxlogs=32; forcing flush of 17 regions(s): 12663e173854886463edfe8c6495dca0, 25bc0f41f28710d047c7e3775f388e39, 2f5d56a3c17fd8e4f6f6f62d0fbcda69, 3619ffc85d19102863eafe36e6d3acf8, 3b4f4f57abec73084a22bd7392247d86, 42e4757fce922723831d29326540b177, 6c53f4fb301af91f54f0d1590a7c856f, a2e173875e2287bd9ac74b9cdd289fde, c02ca04051d2684b3138662803892dd3, cd6158fa98bf85d39118e450c454e93a, d75e31ed4e06b867652a70160cd90c71, e024920c26c08afe5004f5ae51f63d35, f34c16e000e648988bc00692bc6c7cea, f378e07ac843beb2becc57e79af0362a, f49dba00bbb0c359935146ffa52bdc70, f9a55c4661a1ee2f16e3c1e6ec978595, ff82c095987dc2f6becc66cd777c7970

2014-08-21 11:42:02,502 INFO org.apache.hadoop.hbase.regionserver.wal.HLog: Too many hlogs: logs=38, maxlogs=32; forcing flush of 17 regions(s): 12663e173854886463edfe8c6495dca0, 25bc0f41f28710d047c7e3775f388e39, 2f5d56a3c17fd8e4f6f6f62d0fbcda69, 3619ffc85d19102863eafe36e6d3acf8, 3b4f4f57abec73084a22bd7392247d86, 42e4757fce922723831d29326540b177, 6c53f4fb301af91f54f0d1590a7c856f, a2e173875e2287bd9ac74b9cdd289fde, c02ca04051d2684b3138662803892dd3, cd6158fa98bf85d39118e450c454e93a, d75e31ed4e06b867652a70160cd90c71, e024920c26c08afe5004f5ae51f63d35, f34c16e000e648988bc00692bc6c7cea, f378e07ac843beb2becc57e79af0362a, f49dba00bbb0c359935146ffa52bdc70, f9a55c4661a1ee2f16e3c1e6ec978595, ff82c095987dc2f6becc66cd777c7970
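These messages come from the WAL roller: once a regionserver accumulates more HLog files than the configured cap, it forces memstore flushes of whichever regions are pinning the oldest log so that log can be archived. The cap in "maxlogs=32" corresponds to hbase.regionserver.maxlogs (sketched below, assuming the same 0.94-era defaults). Note that the count stays far above the cap for long stretches (79, 65, 66, 60, ...), so flushes were being forced continuously but not draining fast enough:

<!-- hbase-site.xml: cap on the number of WAL files per regionserver.
     Exceeding it triggers the "Too many hlogs ... forcing flush" path
     seen above. Raising the value only buys time against write stalls;
     it does not fix slow flushes. -->
<property>
  <name>hbase.regionserver.maxlogs</name>
  <value>32</value> <!-- the default in this era -->
</property>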

        

        Regionserver symptom 3 (already fixed by configuring the same dfs.socket.timeout=900000 on both the HDFS side and the HBase side; see the config sketch after the log excerpt):

2014-08-23 11:19:17,598 WARN org.apache.hadoop.hdfs.DFSClient: DFSOutputStream ResponseProcessor exception  for block blk_-6884116396095947381_111959717java.net.SocketTimeoutException: 66000 millis timeout while waiting for channel to be ready for read. ch : java.nio.channels.SocketChannel[connected local=/10.130.136.114:53194 remote=/10.130.136.114:50010]
        at org.apache.hadoop.net.SocketIOWithTimeout.doIO(SocketIOWithTimeout.java:164)
        at org.apache.hadoop.net.SocketInputStream.read(SocketInputStream.java:155)
        at org.apache.hadoop.net.SocketInputStream.read(SocketInputStream.java:128)
        at java.io.DataInputStream.readFully(DataInputStream.java:195)
        at java.io.DataInputStream.readLong(DataInputStream.java:416)
        at org.apache.hadoop.hdfs.protocol.DataTransferProtocol$PipelineAck.readFields(DataTransferProtocol.java:124)
        at org.apache.hadoop.hdfs.DFSClient$DFSOutputStream$ResponseProcessor.run(DFSClient.java:3127)

2014-08-23 11:19:17,599 WARN org.apache.hadoop.hdfs.DFSClient: Error Recovery for block blk_-4289533060700867612_111959745 bad datanode[0] 10.130.136.114:50010
2014-08-23 11:19:17,599 WARN org.apache.hadoop.hdfs.DFSClient: Error Recovery for block blk_-6884116396095947381_111959717 bad datanode[0] 10.130.136.114:50010
2014-08-23 11:19:17,599 WARN org.apache.hadoop.hdfs.DFSClient: Error Recovery for block blk_-4289533060700867612_111959745 in pipeline 10.130.136.114:50010, 10.130.136.115:50010: bad datanode 10.130.136.114:50010
2014-08-23 11:19:17,599 WARN org.apache.hadoop.hdfs.DFSClient: Error Recovery for block blk_-6884116396095947381_111959717 in pipeline 10.130.136.114:50010, 10.130.136.115:50010: bad datanode 10.130.136.114:50010
2014-08-23 11:22:27,624 DEBUG org.apache.hadoop.hbase.io.hfile.LruBlockCache: Stats: total=681.33 MB, free=3.32 GB, max=3.99 GB, blocks=10035, accesses=44791415, hits=40264747, hitRatio=89.89%, , cachingAccesses=40274782, cachingHits=40264747, cachingHitsRatio=99.97%, , evictions=0, evicted=0, evictedPerRun=NaN
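The fix mentioned in the symptom 3 heading is essentially a timeout alignment: the DFSClient embedded in the regionserver gave up after 66000 ms (apparently the 60 s dfs.socket.timeout default plus a small per-pipeline-node extension) while the datanode was still busy. Setting the same value on both sides looks roughly like the following; the property goes into hdfs-site.xml on the datanodes and onto the HBase classpath (hbase-site.xml, or a copied hdfs-site.xml):

<!-- read timeout for DFS data transfer, in milliseconds; per the fix
     described above, it must match on the HDFS side and on the HBase
     (DFSClient) side -->
<property>
  <name>dfs.socket.timeout</name>
  <value>900000</value> <!-- 15 minutes; the default is 60000 -->
</property>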

② Datanode side

        Meanwhile, the HDFS datanode logs were full of exceptions:

        Datanode exception 1:

java.net.SocketTimeoutException: 480000 millis timeout while waiting for channel to be ready for write. ch : java.nio.channels.SocketChannel[connected local=/10.130.136.114:50010 remote=/10.130.136.114:59516]

java.net.SocketTimeoutException: 480000 millis timeout while waiting for channel to be ready for write. ch : java.nio.channels.SocketChannel[connected local=/10.130.136.114:50010 remote=/10.130.136.114:59524]
java.net.SocketTimeoutException: 480000 millis timeout while waiting for channel to be ready for write. ch : java.nio.channels.SocketChannel[connected local=/10.130.136.114:50010 remote=/10.130.136.114:59520]
java.net.SocketTimeoutException: 480000 millis timeout while waiting for channel to be ready for write. ch : java.nio.channels.SocketChannel[connected local=/10.130.136.114:50010 remote=/10.130.136.114:59524]
java.net.SocketTimeoutException: 480000 millis timeout while waiting for channel to be ready for write. ch : java.nio.channels.SocketChannel[connected local=/10.130.136.114:50010 remote=/10.130.136.114:59520]
2014-08-23 21:26:25,292 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: writeBlock blk_-3011273698174656346_113017023 received exception org.apache.hadoop.hdfs.server.datanode.BlockAlreadyExistsException: Block blk_-3011273698174656346_113017023 is valid, and cannot be written to.
org.apache.hadoop.hdfs.server.datanode.BlockAlreadyExistsException: Block blk_-3011273698174656346_113017023 is valid, and cannot be written to.

        Datanode exception 2:

2014-08-23 23:06:56,413 INFO org.apache.hadoop.hdfs.DFSClient: Exception in createBlockOutputStream 10.130.136.114:50010 java.io.IOException: Bad connect ack with firstBadLink as 10.130.136.119:50010
2014-08-23 23:06:56,895 INFO org.apache.hadoop.hdfs.DFSClient: Exception in createBlockOutputStream 10.130.136.114:50010 java.io.IOException: Bad connect ack with firstBadLink as 10.130.136.119:50010
2014-08-23 23:06:57,399 INFO org.apache.hadoop.hdfs.DFSClient: Exception in createBlockOutputStream 10.130.136.114:50010 java.io.IOException: Bad connect ack with firstBadLink as 10.130.136.119:50010
2014-08-23 23:06:57,548 INFO org.apache.hadoop.hdfs.DFSClient: Exception in createBlockOutputStream 10.130.136.114:50010 java.io.IOException: Bad connect ack with firstBadLink as 10.130.136.119:50010
2014-08-23 23:06:57,935 INFO org.apache.hadoop.hdfs.DFSClient: Exception in createBlockOutputStream 10.130.136.114:50010 java.io.IOException: Bad connect ack with firstBadLink as 10.130.136.119:50010
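"Bad connect ack with firstBadLink as 10.130.136.119:50010" means the first datanode in the write pipeline could not get an acknowledgment from the downstream node at .119. This was not confirmed as the root cause in this incident, but a classic culprit for this exception on Hadoop 1.x under heavy HBase write load is transceiver-thread exhaustion on the downstream datanode, mitigated by raising the (genuinely misspelled) xcievers limit:

<!-- hdfs-site.xml: a hedged suggestion only, not verified as the fix
     here. The HBase documentation of this era recommends at least 4096
     transceiver threads per datanode. -->
<property>
  <name>dfs.datanode.max.xcievers</name>
  <value>4096</value>
</property>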

        Datanode exception 3:

2014-08-24 22:15:21,714 WARN org.apache.hadoop.hdfs.server.datanode.DataNode: Error processing datanode Command
java.io.IOException: Error in deleting blocks.
        at org.apache.hadoop.hdfs.server.datanode.FSDataset.invalidate(FSDataset.java:1967)
        at org.apache.hadoop.hdfs.server.datanode.DataNode.processCommand(DataNode.java:1181)
        at org.apache.hadoop.hdfs.server.datanode.DataNode.processCommand(DataNode.java:1143)
        at org.apache.hadoop.hdfs.server.datanode.DataNode.offerService(DataNode.java:980)
        at org.apache.hadoop.hdfs.server.datanode.DataNode.run(DataNode.java:1527)
        at java.lang.Thread.run(Thread.java:724)

        Datanode exception 4:

2014-08-24 16:45:35,855 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: writeBlock blk_2324951138767077684_113876340 received exception org.apache.hadoop.hdfs.server.datanode.BlockAlreadyExistsException: Block blk_2324951138767077684_113876340 is valid, and cannot be written to.
org.apache.hadoop.hdfs.server.datanode.BlockAlreadyExistsException: Block blk_2324951138767077684_113876340 is valid, and cannot be written to.
2014-08-24 16:45:42,861 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: writeBlock blk_2305069720503912789_113876452 received exception org.apache.hadoop.hdfs.server.datanode.BlockAlreadyExistsException: Block blk_2305069720503912789_113876452 is valid, and cannot be written to.
org.apache.hadoop.hdfs.server.datanode.BlockAlreadyExistsException: Block blk_2305069720503912789_113876452 is valid, and cannot be written to.
2014-08-24 16:45:43,713 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: writeBlock blk_-318311590422520941_113876153 received exception org.apache.hadoop.hdfs.server.datanode.BlockAlreadyExistsException: Block blk_-318311590422520941_113876153 is valid, and cannot be written to.
java.net.SocketTimeoutException: 480000 millis timeout while waiting for channel to be ready for write. ch : java.nio.channels.SocketChannel[connected local=/10.130.136.118:50010 remote=/10.130.136.116:34363]  (Note: after raising dfs.datanode.socket.write.timeout to 1800000, the exception became "1800000 millis timeout while waiting for channel to be ready for write")

java.net.SocketTimeoutException: 480000 millis timeout while waiting for channel to be ready for write. ch : java.nio.channels.SocketChannel[connected local=/10.130.136.118:50010 remote=/10.130.136.118:55147]
java.net.SocketTimeoutException: 480000 millis timeout while waiting for channel to be ready for write. ch : java.nio.channels.SocketChannel[connected local=/10.130.136.118:50010 remote=/10.130.136.118:55147]
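Per the note in exception 4 above, the operators raised the datanode write timeout; the only effect was that the exception reported the new number, which suggests the pipeline peer had stopped reading from the socket entirely rather than reading slowly. The change amounts to:

<!-- hdfs-site.xml: write timeout for DFS data transfer. Raising it from
     the 480000 ms (8 minute) default to 30 minutes merely changed the
     number inside the SocketTimeoutException, pointing at a stuck peer
     rather than a too-small timeout. -->
<property>
  <name>dfs.datanode.socket.write.timeout</name>
  <value>1800000</value>
</property>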

③ Namenode side

        The namenode is flooded with entries like the following (INFO-and-above logs now amount to 400+ GB per day, where previously the volume was small):

2014-08-25 11:30:01,418 INFO org.apache.hadoop.hdfs.StateChange: BLOCK* NameSystem.addToInvalidates: blk_-707612696772368160 to 10.130.136.116:50010 
2014-08-25 11:30:01,418 INFO org.apache.hadoop.hdfs.StateChange: BLOCK* NameSystem.processReport: block blk_8944996150588918994_62583982 on 10.130.136.116:50010 size 496 does not belong to any file. 
2014-08-25 11:30:01,418 INFO org.apache.hadoop.hdfs.StateChange: BLOCK* NameSystem.addToInvalidates: blk_8944996150588918994 to 10.130.136.116:50010 
2014-08-25 11:30:01,418 INFO org.apache.hadoop.hdfs.StateChange: BLOCK* NameSystem.processReport: block blk_962585261283706817_105572114 on 10.130.136.116:50010 size 496 does not belong to any file. 
2014-08-25 11:30:01,418 INFO org.apache.hadoop.hdfs.StateChange: BLOCK* NameSystem.addToInvalidates: blk_962585261283706817 to 10.130.136.116:50010 
2014-08-25 11:30:01,418 INFO org.apache.hadoop.hdfs.StateChange: BLOCK* NameSystem.processReport: block blk_-1886285939257877420_33867512 on 10.130.136.116:50010 size 496 does not belong to any file. 
2014-08-25 11:30:01,418 INFO org.apache.hadoop.hdfs.StateChange: BLOCK* NameSystem.addToInvalidates: blk_-1886285939257877420 to 10.130.136.116:50010 
2014-08-25 11:30:01,418 INFO org.apache.hadoop.hdfs.StateChange: BLOCK* NameSystem.processReport: block blk_-405662021725661377_23563134 on 10.130.136.116:50010 size 496 does not belong to any file. 
2014-08-25 11:30:01,418 INFO org.apache.hadoop.hdfs.StateChange: BLOCK* NameSystem.addToInvalidates: blk_-405662021725661377 to 10.130.136.116:50010 
2014-08-25 11:30:01,418 INFO org.apache.hadoop.hdfs.StateChange: BLOCK* NameSystem.processReport: block blk_-6831374360596453862_49890202 on 10.130.136.116:50010 size 496 does not belong to any file. 
2014-08-25 11:30:01,418 INFO org.apache.hadoop.hdfs.StateChange: BLOCK* NameSystem.addToInvalidates: blk_-6831374360596453862 to 10.130.136.116:50010 
2014-08-25 11:30:01,418 INFO org.apache.hadoop.hdfs.StateChange: BLOCK* NameSystem.processReport: block blk_-1458260851950313618_92180801 on 10.130.136.116:50010 size 496 does not belong to any file. 
2014-08-25 11:30:01,418 INFO org.apache.hadoop.hdfs.StateChange: BLOCK* NameSystem.addToInvalidates: blk_-1458260851950313618 to 10.130.136.116:50010 
2014-08-25 11:30:01,418 INFO org.apache.hadoop.hdfs.StateChange: BLOCK* NameSystem.processReport: block blk_2754038012732967699_52183933 on 10.130.136.116:50010 size 496 does not belong to any file. 
2014-08-25 11:30:01,418 INFO org.apache.hadoop.hdfs.StateChange: BLOCK* NameSystem.addToInvalidates: blk_2754038012732967699 to 10.130.136.116:50010 
2014-08-25 11:30:01,418 INFO org.apache.hadoop.hdfs.StateChange: BLOCK* NameSystem.processReport: block blk_-1651824977329564981_102396163 on 10.130.136.116:50010 size 496 does not belong to any file. 
2014-08-25 11:30:01,418 INFO org.apache.hadoop.hdfs.StateChange: BLOCK* NameSystem.addToInvalidates: blk_-1651824977329564981 to 10.130.136.116:50010
2014-08-25 11:30:01,418 INFO org.apache.hadoop.hdfs.StateChange: BLOCK* NameSystem.processReport: block blk_-8075220412997159517_101639855 on 10.130.136.116:50010 size 496 does not belong to any file. 
2014-08-25 11:30:01,418 INFO org.apache.hadoop.hdfs.StateChange: BLOCK* NameSystem.addToInvalidates: blk_-8075220412997159517 to 10.130.136.116:50010 
2014-08-25 11:30:01,418 INFO org.apache.hadoop.hdfs.StateChange: BLOCK* NameSystem.processReport: block blk_2245696672665686485_98393215 on 10.130.136.116:50010 size 496 does not belong to any file.
