Forced manual failover in HA mode: IPC's epoch [X] is less than the last promised epoch [X+1]

2016-11-30 21:13:14,637 WARN org.apache.hadoop.hdfs.qjournal.client.QuorumJournalManager: Remote journal 192.168.58.183:8485 failed to write txns 54560-54560. Will try to write to this JN again after the next log roll.
org.apache.hadoop.ipc.RemoteException(java.io.IOException): IPC's epoch 11 is less than the last promised epoch 12
    at org.apache.hadoop.hdfs.qjournal.server.Journal.checkRequest(Journal.java:414)
    at org.apache.hadoop.hdfs.qjournal.server.Journal.checkWriteRequest(Journal.java:442)
    at org.apache.hadoop.hdfs.qjournal.server.Journal.journal(Journal.java:342)
    at org.apache.hadoop.hdfs.qjournal.server.JournalNodeRpcServer.journal(JournalNodeRpcServer.java:148)
    at org.apache.hadoop.hdfs.qjournal.protocolPB.QJournalProtocolServerSideTranslatorPB.journal(QJournalProtocolServerSideTranslatorPB.java:158)
    at org.apache.hadoop.hdfs.qjournal.protocol.QJournalProtocolProtos$QJournalProtocolService$2.callBlockingMethod(QJournalProtocolProtos.java:25421)
    at org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:619)
    at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:975)
    at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2040)
    at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2036)
    at java.security.AccessController.doPrivileged(Native Method)
    at javax.security.auth.Subject.doAs(Subject.java:422)
    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1656)
    at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2034)

    at org.apache.hadoop.ipc.Client.call(Client.java:1469)
    at org.apache.hadoop.ipc.Client.call(Client.java:1400)
    at org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:232)
    at com.sun.proxy.$Proxy10.journal(Unknown Source)
    at org.apache.hadoop.hdfs.qjournal.protocolPB.QJournalProtocolTranslatorPB.journal(QJournalProtocolTranslatorPB.java:167)
    at org.apache.hadoop.hdfs.qjournal.client.IPCLoggerChannel$7.call(IPCLoggerChannel.java:385)
    at org.apache.hadoop.hdfs.qjournal.client.IPCLoggerChannel$7.call(IPCLoggerChannel.java:378)
    at java.util.concurrent.FutureTask.run(FutureTask.java:266)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
    at java.lang.Thread.run(Thread.java:745)
2016-11-30 21:13:14,800 WARN org.apache.hadoop.hdfs.qjournal.client.QuorumJournalManager: Remote journal 192.168.58.181:8485 failed to write txns 54560-54560. Will try to write to this JN again after the next log roll.
org.apache.hadoop.ipc.RemoteException(java.io.IOException): IPC's epoch 11 is less than the last promised epoch 12
    at org.apache.hadoop.hdfs.qjournal.server.Journal.checkRequest(Journal.java:414)
    at org.apache.hadoop.hdfs.qjournal.server.Journal.checkWriteRequest(Journal.java:442)
    at org.apache.hadoop.hdfs.qjournal.server.Journal.journal(Journal.java:342)
    at org.apache.hadoop.hdfs.qjournal.server.JournalNodeRpcServer.journal(JournalNodeRpcServer.java:148)
    at org.apache.hadoop.hdfs.qjournal.protocolPB.QJournalProtocolServerSideTranslatorPB.journal(QJournalProtocolServerSideTranslatorPB.java:158)
    at org.apache.hadoop.hdfs.qjournal.protocol.QJournalProtocolProtos$QJournalProtocolService$2.callBlockingMethod(QJournalProtocolProtos.java:25421)
    at org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:619)
    at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:975)
    at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2040)
    at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2036)
    at java.security.AccessController.doPrivileged(Native Method)
    at javax.security.auth.Subject.doAs(Subject.java:422)
    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1656)
    at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2034)

    at org.apache.hadoop.ipc.Client.call(Client.java:1469)
    at org.apache.hadoop.ipc.Client.call(Client.java:1400)
    at org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:232)
    at com.sun.proxy.$Proxy10.journal(Unknown Source)
    at org.apache.hadoop.hdfs.qjournal.protocolPB.QJournalProtocolTranslatorPB.journal(QJournalProtocolTranslatorPB.java:167)
    at org.apache.hadoop.hdfs.qjournal.client.IPCLoggerChannel$7.call(IPCLoggerChannel.java:385)
    at org.apache.hadoop.hdfs.qjournal.client.IPCLoggerChannel$7.call(IPCLoggerChannel.java:378)
    at java.util.concurrent.FutureTask.run(FutureTask.java:266)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
    at java.lang.Thread.run(Thread.java:745)
2016-11-30 21:13:14,812 WARN org.apache.hadoop.hdfs.qjournal.client.QuorumJournalManager: Remote journal 192.168.58.182:8485 failed to write txns 54560-54560. Will try to write to this JN again after the next log roll.
org.apache.hadoop.ipc.RemoteException(java.io.IOException): IPC's epoch 11 is less than the last promised epoch 12
    at org.apache.hadoop.hdfs.qjournal.server.Journal.checkRequest(Journal.java:414)
    at org.apache.hadoop.hdfs.qjournal.server.Journal.checkWriteRequest(Journal.java:442)

 1. Cause of the Error

  The Active NameNode log shows the exception IPC's epoch [X] is less than the last promised epoch [X+1], and for a short period there were two Active NameNodes.

  I had configured automatic HA failover, but found that the Standby NameNode was the one reporting itself as active. I forced a manual failover three times, after which the Standby NameNode could no longer be reached; this problem is most likely the reason.

 2. Underlying Cause

  [HDFS mechanism]: This is HDFS's built-in protection against split-brain. It is expected behavior and does not affect the business.

  1) ZKFC1 runs a health check against NameNode1 (Active). Because it receives no reply from NN1 for a long time, it judges NameNode1 unhealthy and voluntarily releases the ActiveStandbyElectorLock in ZooKeeper. At this point NN1 is still active (the connection between ZKFC1 and NameNode1 is broken, so ZKFC1 cannot shut it down).

  

ZKFC log:

2014-06-16 02:11:02,720 WARN org.apache.hadoop.ha.HealthMonitor: Transport-level exception trying to monitor health of NameNode at namenode01/172.21.248.14:9005: Call From namenode01/172.21.248.14 to namenode02:9005 failed on socket timeout exception: java.net.SocketTimeoutException: 45000 millis timeout while waiting for channel to be ready for read. ch : java.nio.channels.SocketChannel[connected local=/172.21.248.14:47271 remote=namenode01/172.21.248.14:9005]; For more details see: http://wiki.apache.org/hadoop/SocketTimeout

2014-06-16 02:12:12,825 WARN org.apache.hadoop.ha.FailoverController: Unable to gracefully make NameNode at namenode02/172.21.248.13:9005 standby (unable to connect)

java.net.SocketTimeoutException: Call From namenode01/172.21.248.14 to namenode02:9005 failed on socket timeout exception: java.net.SocketTimeoutException: 5000 millis timeout while waiting for channel to be ready for read. ch : java.nio.channels.SocketChannel[connected local=/172.21.248.14:59156 remote=namenode02/172.21.248.13:9005]; For more details see:  http://wiki.apache.org/hadoop/SocketTimeout

 

  2) ZKFC2 wins the ActiveStandbyElectorLock in ZooKeeper and transitions NameNode2 (the former Standby) to Active, at the same time incrementing the epoch recorded on the JournalNodes by 1.

  3) The next time NameNode1 (the former Active) tries to write the editlog on the JournalNodes, it finds that its own epoch is 1 less than the JournalNodes' epoch, which forces it to restart and come back as the Standby NameNode (a minimal sketch of this epoch check follows the log excerpt below).

NN1 log:

2014-08-26 12:20:59,017 FATAL org.apache.hadoop.hdfs.server.namenode.FSEditLog: Error: flush failed for required journal (JournalAndStream(mgr=QJM to [10.1.1.107:8485, 192.10.1.208:8485, 192.10.1.209:8485], stream=QuorumOutputStream starting at txid 22795230)) org.apache.hadoop.hdfs.qjournal.client.QuorumException: Got too many exceptions to achieve quorum size 2/3. 3 exceptions thrown: 192.10.1.208:8485: IPC's epoch 115 is less than the last promised epoch 116
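To make steps 2) and 3) concrete, here is a minimal sketch of the epoch fencing a JournalNode performs. This is not the real org.apache.hadoop.hdfs.qjournal.server.Journal source; the class, field, and method names below are made up for illustration, and only the comparison mirrors what the stack traces above show (a write whose epoch is behind the last promised epoch is rejected).

// Illustrative sketch only -- not the actual Hadoop Journal.java implementation.
// It models why the old Active NameNode gets
// "IPC's epoch 11 is less than the last promised epoch 12" after a failover.
import java.io.IOException;

class JournalNodeEpochSketch {
    // Highest epoch this JournalNode has promised to accept writes for.
    private long lastPromisedEpoch = 0;

    // A NameNode that becomes Active first negotiates a new, higher epoch
    // with a quorum of JournalNodes (this is what bumps the epoch by 1).
    synchronized long newEpoch(long proposedEpoch) throws IOException {
        if (proposedEpoch <= lastPromisedEpoch) {
            throw new IOException("Proposed epoch " + proposedEpoch
                    + " <= last promised epoch " + lastPromisedEpoch);
        }
        lastPromisedEpoch = proposedEpoch;
        return lastPromisedEpoch;
    }

    // Every edit-log write carries the writer's epoch; a stale writer
    // (the fenced-out old Active) still uses its old epoch and is rejected.
    synchronized void checkWriteRequest(long requestEpoch) throws IOException {
        if (requestEpoch < lastPromisedEpoch) {
            throw new IOException("IPC's epoch " + requestEpoch
                    + " is less than the last promised epoch " + lastPromisedEpoch);
        }
    }

    public static void main(String[] args) {
        JournalNodeEpochSketch jn = new JournalNodeEpochSketch();
        try {
            long nn1Epoch = jn.newEpoch(11);  // NN1 (Active) writes with epoch 11
            jn.newEpoch(12);                  // failover: NN2 promotes and takes epoch 12
            jn.checkWriteRequest(nn1Epoch);   // NN1's next edit-log write is fenced out
        } catch (IOException e) {
            System.out.println(e.getMessage()); // same message as in the NN1 log above
        }
    }
}

The rejection is deliberate fencing: failing this write is exactly how the old Active learns it has been superseded, which is why the behavior is classified as normal and not as data loss.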

 3. Solution

  You can increase the ZKFC health-check timeout by raising the ha.health-monitor.rpc-timeout.ms parameter in core-site.xml. The 45000 millis timeout in the ZKFC log above is this parameter's default value; setting it to 180000 gives the NameNode three minutes to respond before ZKFC declares it unhealthy.

<property>
  <name>ha.health-monitor.rpc-timeout.ms</name>
  <value>180000</value>
</property>

  

 4. Closing Remarks

  In the end I simply set failover to manual (the hdfs-site.xml property is shown below). You could in fact query ZooKeeper to find out which NameNode is active, but I am not going to do that for now.

  But note that with automatic failover disabled, ZKFC cannot start, and HBase must then use its own ZooKeeper.
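
For reference, the "manual switching" above means turning off automatic failover with the standard dfs.ha.automatic-failover.enabled property in hdfs-site.xml (the snippet below assumes the usual Hadoop 2.x HA setup):

<property>
  <name>dfs.ha.automatic-failover.enabled</name>
  <value>false</value>
</property>

With this set to false, failover is driven manually with hdfs haadmin -transitionToActive / -transitionToStandby instead of by ZKFC.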
