【RAC】PMON: terminating the instance due to error 481

Applies to:

Oracle Server - Enterprise Edition - Version: 11.2.0.2.0 and later [Release: 11.2 and later]

Information in this document applies to any platform.

Symptoms

On an 11.2.0.2 or later cluster, an instance is running on one node, but starting the instance on the other node(s) fails with:

PMON (ospid: 487580): terminating the instance due to error 481

If ASM is used, the +ASMn alert log shows:

Sat Oct 01 19:19:38 2011

MMNL started with pid=21, OS id=6488362

lmon registered with NM - instance number 2 (internal mem no 1)

Sat Oct 01 19:21:37 2011

PMON (ospid: 4915562): terminating the instance due to error 481

Sat Oct 01 19:21:37 2011

System state dump requested by (instance=2, sid=4915562 (PMON)), summary=[abnormal instance termination].

System State dumped to trace file /u01/app/oracle/diag/asm/+asm/+ASM2/trace/+ASM2_diag_4915388.trc

Dumping diagnostic data in directory=[cdmp_20111001192138], requested by (instance=2, sid=4915562 (PMON)), summary=[abnormal instance termination].

Sat Oct 01 19:21:38 2011

License high water mark = 1

Instance terminated by PMON, pid = 4915562

The +ASMn_diag_xxx.trc trace file shows:

*** 2011-10-01 19:19:37.526

Reconfiguration starts [incarn=0]

*** 2011-10-01 19:19:37.526

I'm the voting node

Group reconfiguration cleanup

kjzdattdlm: Can not attach to DLM (LMON up=[TRUE], DB mounted=[FALSE]).

kjzdattdlm: Can not attach to DLM (LMON up=[TRUE], DB mounted=[FALSE]).

...... << repeated messages

If ASM is not used, the DB instance can fail with the same error:

Mon Jul 04 16:22:50 2011

Starting ORACLE instance (normal)

...

Mon Jul 04 16:22:54 2011

MMNL started with pid=24, OS id=667660

starting up 1 shared server(s) ...

lmon registered with NM - instance number 2 (internal mem no 1)

Mon Jul 04 16:26:15 2011

PMON (ospid: 487580): terminating the instance due to error 481

The lmon trace shows:

*** 2011-07-04 16:22:59.852

=====================================================

kjxgmpoll: CGS state (0 1) start 0x4e11785e cur 0x4e117863 rcfgtm 5 sec

...

*** 2011-07-04 16:26:14.248

=====================================================

kjxgmpoll: CGS state (0 1) start 0x4e11785e cur 0x4e117926 rcfgtm 200 sec
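In both kjxgmpoll lines, the rcfgtm value matches the difference between the hexadecimal cur and start values. The sketch below only reproduces that arithmetic with the numbers from this trace. Reading rcfgtm as the seconds elapsed since the reconfiguration started is an inference from these two samples, not something the trace states.

```shell
#!/bin/sh
# Values copied from the two kjxgmpoll lines in the lmon trace (hexadecimal).
start=0x4e11785e
cur_first=0x4e117863
cur_last=0x4e117926

# Subtract start from cur for each poll.
first=$(( $cur_first - $start ))
last=$(( $cur_last - $start ))

echo "first poll: $first sec"   # the trace reports rcfgtm 5 sec here
echo "last poll:  $last sec"    # the trace reports rcfgtm 200 sec here
```

The last poll at 16:26:14 is followed one second later by the PMON termination message at 16:26:15 in the alert log.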

The dia0 trace shows:

*** 2011-07-04 16:22:53.414

Reconfiguration starts [incarn=0]

*** 2011-07-04 16:22:53.414

I'm the voting node

Group reconfiguration cleanup

kjzdattdlm: Can not attach to DLM (LMON up=[TRUE], DB mounted=[FALSE]).

...<< repeated message

Changes

This can happen during patching or after a node reboot.

Cause

The problem occurs when HAIP is not ONLINE on either the running node or the problem node(s).

An ASM or DB instance cannot start if it uses a different cluster_interconnect than the instance that is already running.

With HAIP ONLINE, all instances (DB and ASM) should use a HAIP address in the 169.254.x.x range.

If HAIP is OFFLINE on any node, the ASM and DB instances on that node use the native private network address, which prevents them from communicating with instances that use HAIP.

Use the following command, as the grid user, to verify the HAIP status:

$ crsctl stat res -t -init

Check the status of the ora.cluster_interconnect.haip resource.
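This check can be scripted. The following is a minimal sketch, assuming the non-tabular output of `crsctl stat res ora.cluster_interconnect.haip -init` contains a `STATE=` line such as `STATE=ONLINE on node1`. The helper name `haip_state` and the sample text are illustrative and do not come from the original note.

```shell
#!/bin/sh
# haip_state: hypothetical helper. Reads crsctl resource attributes on stdin
# and prints the first word of the STATE= value (for example ONLINE or OFFLINE).
haip_state() {
    awk -F= '$1 == "STATE" { split($2, w, " "); print w[1]; exit }'
}

# Illustrative sample only. On a real node, pipe the command output instead:
#   crsctl stat res ora.cluster_interconnect.haip -init | haip_state
sample='NAME=ora.cluster_interconnect.haip
TARGET=ONLINE
STATE=OFFLINE'

state=$(printf '%s\n' "$sample" | haip_state)
if [ "$state" != "ONLINE" ]; then
    echo "HAIP is $state on this node"
fi
```

Running the same check on every node before starting ASM or DB instances shows which node has HAIP OFFLINE.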

In this example, HAIP is OFFLINE on the running node 1, so +ASM1 uses 10.1.1.1 as its cluster_interconnect. On node 2 HAIP is ONLINE, so +ASM2 uses the HAIP address 169.254.239.144 as its cluster_interconnect. The two instances cannot communicate, and +ASM2 cannot start.

alert_+ASM1.log shows:

Cluster communication is configured to use the following interface(s) for this instance

10.1.1.1

alert_+ASM2.log shows:

Cluster communication is configured to use the following interface(s) for this instance

169.254.239.144
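The comparison of the two alert logs can be sketched as a small script. It assumes the interface address appears as the first token on the first non-blank line after the "Cluster communication is configured" message, as in the excerpts above. The helper names `interconnect_ip` and `is_haip` and the log path in the comment are hypothetical.

```shell
#!/bin/sh
# interconnect_ip: hypothetical helper. Prints the first token of the first
# non-blank line that follows the "Cluster communication is configured" line.
# If the log holds several startups, only the first occurrence is reported.
interconnect_ip() {
    awk '
        found && NF { print $1; exit }
        /Cluster communication is configured to use the following interface/ { found = 1 }
    '
}

# is_haip: succeeds when the address is in the 169.254.x.x range used by HAIP.
is_haip() {
    case "$1" in
        169.254.*) return 0 ;;
        *)         return 1 ;;
    esac
}

# On a real system, read the alert log instead (path is a placeholder):
#   ip=$(interconnect_ip < /path/to/alert_+ASM1.log)
ip=$(printf '%s\n' \
    'Cluster communication is configured to use the following interface(s) for this instance' \
    '' \
    '10.1.1.1' | interconnect_ip)

if is_haip "$ip"; then
    echo "$ip: HAIP address"
else
    echo "$ip: native private address, HAIP is probably not in use"
fi
```

If one instance reports a 169.254.x.x address and another reports a native private address, the cluster is in the mismatched state described in this note.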

Solution

The solution is to bring HAIP ONLINE on all nodes before starting any ASM or DB instance, either by restarting the HAIP resource or by restarting the GI stack.

In this example, +ASM1 was started first while HAIP was OFFLINE:

1. Try to start HAIP manually on node 1.

As grid user:

$ crsctl start res ora.cluster_interconnect.haip -init

To verify:

$ crsctl stat res -t -init

2. If this succeeds, restart the ora.asm resource (note that this brings down all dependent diskgroup and database resources).

As root user:

# crsctl stop res ora.crsd -init

# crsctl stop res ora.asm -init -f

# crsctl start res ora.asm -init

# crsctl start res ora.crsd -init

Start any dependent resources as necessary.

3. If the above does not help, restart the GI stack on node 1 and check whether HAIP comes ONLINE afterwards.

As root user:

# crsctl stop crs

# crsctl start crs

Check $GRID_HOME/log//agent/ohasd/orarootagent_root/orarootagent_root.log for any HAIP error.

4. Once HAIP is ONLINE on node 1, start ASM on the remaining cluster nodes and ensure that HAIP is ONLINE on all nodes.

$ crsctl start res ora.asm -init

After the above steps, ASM and DB instances should be able to start on all nodes.

Time: 2024-12-02 01:29:29
