How to restore ASM based OCR after complete loss of the CRS diskgroup on Linux/Unix systems [ID 1062983.1]

--------------------------------------------------------------------------------
 
  Modified 12-FEB-2012     Type HOWTO     Status PUBLISHED  

In this Document
  Goal
  Solution
  References

 

--------------------------------------------------------------------------------

 

Applies to:
Oracle Server - Enterprise Edition - Version: 11.2.0.1.0 to 11.2.0.2 - Release: 11.2 to 11.2
Information in this document applies to any platform.

Goal
It is not possible to directly restore a manual or automatic OCR backup if the OCR is located in an ASM disk group. The reason is that the command 'ocrconfig -restore' requires ASM to be up and running in order to restore an OCR backup to an ASM disk group. However, for ASM to be available, the CRS stack must have been started successfully. For the restore to succeed, the OCR must also not be in use (r/w), i.e. no CRS daemon may be running while the OCR is being restored.

A description of the general procedure to restore the OCR can be found in the documentation. This document explains how to recover from a complete loss of the ASM disk group that held the OCR and Voting files in an 11gR2 Grid environment.

Solution
When using an ASM disk group for CRS there are typically 3 different types of files located in the disk group that potentially need to be restored/recreated:

•the Oracle Cluster Registry file (OCR)
•the Voting file(s)
•the shared SPFILE for the ASM instances

The following example assumes that the OCR was located in a single disk group used exclusively for CRS. The disk group has just one disk using external redundancy.

Since the CRS disk group has been lost, the CRS stack will not be available on any node.

The following settings used in the example would need to be replaced according to the actual configuration:

GRID user:                       oragrid
GRID home:                       /u01/app/11.2.0/grid ($CRS_HOME)
ASM disk group name for OCR:     CRS
ASM/ASMLIB disk name:            ASMD40
Linux device name for ASM disk:  /dev/sdh1
Cluster name:                    rac_cluster1
Nodes:                           racnode1, racnode2

 

This document assumes that the name of the OCR diskgroup remains unchanged. If a different diskgroup name has to be used, the name of the OCR diskgroup must be modified in /etc/oracle/ocr.loc on all nodes prior to executing the following steps.
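
If the diskgroup name does change, the edit to /etc/oracle/ocr.loc could be scripted roughly as follows. This is a minimal sketch, not an Oracle-supplied tool: it assumes the file contains a line of the form ocrconfig_loc=+CRS, and the function name set_ocr_diskgroup and the .bak/.tmp suffixes are purely illustrative.

```shell
# Hypothetical helper: point ocr.loc at a different ASM disk group.
# Assumes the file contains a line of the form "ocrconfig_loc=+<DISKGROUP>".
# Usage: set_ocr_diskgroup NEW_DISKGROUP [OCR_LOC_FILE]
set_ocr_diskgroup() {
    new_dg=$1
    ocr_loc=${2:-/etc/oracle/ocr.loc}

    [ -n "$new_dg" ] || { echo "usage: set_ocr_diskgroup NEW_DG [FILE]" >&2; return 2; }
    [ -f "$ocr_loc" ] || { echo "not found: $ocr_loc" >&2; return 1; }
    grep -q '^ocrconfig_loc=+' "$ocr_loc" || {
        echo "no ASM-based ocrconfig_loc entry in $ocr_loc" >&2
        return 1
    }

    # Keep a copy of the original before changing anything.
    cp -p "$ocr_loc" "$ocr_loc.bak" || return 1

    # Rewrite only the ocrconfig_loc line. Writing back with cat keeps the
    # ownership and permissions of the existing file.
    sed "s|^ocrconfig_loc=+.*|ocrconfig_loc=+$new_dg|" "$ocr_loc.bak" > "$ocr_loc.tmp" &&
        cat "$ocr_loc.tmp" > "$ocr_loc" &&
        rm -f "$ocr_loc.tmp"
}
```

It would have to be run as root on every node, for example as: set_ocr_diskgroup CRSNEW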

1. Locate the latest automatic OCR backup

When using a non-shared CRS home, automatic OCR backups can be located on any node of the cluster. Consequently, all nodes need to be checked for the most recent backup:

$ ls -lrt $CRS_HOME/cdata/rac_cluster1/
-rw------- 1 root root 7331840 Mar 10 18:52 week.ocr
-rw------- 1 root root 7651328 Mar 26 01:33 week_.ocr
-rw------- 1 root root 7651328 Mar 29 01:33 day.ocr
-rw------- 1 root root 7651328 Mar 30 01:33 day_.ocr
-rw------- 1 root root 7651328 Mar 30 01:33 backup02.ocr
-rw------- 1 root root 7651328 Mar 30 05:33 backup01.ocr
-rw------- 1 root root 7651328 Mar 30 09:33 backup00.ocr
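
Picking the most recent file out of such a listing can be automated. The following is a small sketch only; the function name newest_ocr_backup is made up for this example, and it simply compares file modification times in one directory.

```shell
# Hypothetical helper: print the most recently modified *.ocr file in a
# directory. Returns non-zero if the directory holds no *.ocr file.
# Usage: newest_ocr_backup DIRECTORY
newest_ocr_backup() {
    newest=
    for f in "$1"/*.ocr; do
        [ -e "$f" ] || continue            # glob did not match anything
        if [ -z "$newest" ] || [ "$f" -nt "$newest" ]; then
            newest=$f                      # keep the file with the later mtime
        fi
    done
    [ -n "$newest" ] && printf '%s\n' "$newest"
}
```

Run on each node as root (the backups are readable by root only), for example: newest_ocr_backup $CRS_HOME/cdata/rac_cluster1
The timestamps of the files reported by the individual nodes then still have to be compared by hand.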

2. Make sure the Grid Infrastructure is shut down on all nodes

Given that the OCR diskgroup is missing, the GI stack will not be functional on any node. However, various daemon processes may still be running. On each node, shut down the GI stack using the force (-f) option:
# $CRS_HOME/bin/crsctl stop crs -f
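
To confirm that no daemon processes are left after the forced shutdown, the process list can be filtered. The sketch below is an illustration under assumptions: the function name gi_daemons_in_listing is invented here, and the list of daemon names reflects a typical 11.2 installation rather than anything stated in this document (which names only crsd.bin).

```shell
# Hypothetical helper: filter a process listing (read from stdin) down to
# Grid Infrastructure daemons. The daemon names are an assumption based on a
# typical 11.2 installation; only crsd.bin is named in this document.
# Exit status is 0 if at least one daemon is listed, 1 if none is.
gi_daemons_in_listing() {
    grep -E '(ohasd|ocssd|crsd|evmd|gipcd|gpnpd|mdnsd|octssd)\.bin'
}
```

Example use on each node: ps -eo pid,args | gi_daemons_in_listing
No output (and a non-zero exit status) suggests that none of the listed daemons is still running.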

3. Start the CRS stack in exclusive mode

On the node that has the most recent OCR backup, log on as root and start CRS in exclusive mode. This mode allows ASM to start and stay up without the presence of a Voting disk and without the CRS daemon process (crsd.bin) running.

11.2.0.1:
# $CRS_HOME/bin/crsctl start crs -excl
...
CRS-2672: Attempting to start 'ora.asm' on 'racnode1'
CRS-2676: Start of 'ora.asm' on 'racnode1' succeeded
CRS-2672: Attempting to start 'ora.crsd' on 'racnode1'
CRS-2676: Start of 'ora.crsd' on 'racnode1' succeeded

Please note:
This document assumes that the CRS diskgroup was completely lost, in which case the CRS daemon (resource ora.crsd) will terminate again because the OCR is inaccessible, even if the above message indicates that the start succeeded.

If this is not the case, i.e. if the CRS diskgroup is still present (but corrupt or incorrect), the CRS daemon needs to be shut down manually using:
# $CRS_HOME/bin/crsctl stop res ora.crsd -init
otherwise the subsequent OCR restore will fail.

11.2.0.2:
# $CRS_HOME/bin/crsctl start crs -excl -nocrs
CRS-4123: Oracle High Availability Services has been started.
...
CRS-2672: Attempting to start 'ora.cluster_interconnect.haip' on 'racnode1'
CRS-2672: Attempting to start 'ora.ctssd' on 'racnode1'
CRS-2676: Start of 'ora.drivers.acfs' on 'racnode1' succeeded
CRS-2676: Start of 'ora.ctssd' on 'racnode1' succeeded
CRS-2676: Start of 'ora.cluster_interconnect.haip' on 'racnode1' succeeded
CRS-2672: Attempting to start 'ora.asm' on 'racnode1'
CRS-2676: Start of 'ora.asm' on 'racnode1' succeeded

IMPORTANT:
A new option '-nocrs' has been introduced with 11.2.0.2, which prevents the start of the ora.crsd resource. It is vital that this option is specified; otherwise the failure to start the ora.crsd resource will tear down ora.cluster_interconnect.haip, which in turn will cause ASM to crash.
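
Because the correct start command depends on the patch level, a script driving this step might select it as sketched below. The function name excl_start_command and its argument order are hypothetical; it only prints the command for the two versions covered by this document and does not execute anything.

```shell
# Hypothetical helper: print the exclusive-mode start command for a given
# Grid Infrastructure version, following the two cases described above.
# Usage: excl_start_command VERSION [CRS_HOME]
excl_start_command() {
    version=$1
    crs_home=${2:-${CRS_HOME:-/u01/app/11.2.0/grid}}
    case $version in
        11.2.0.1|11.2.0.1.*)
            # 11.2.0.1 has no -nocrs option
            echo "$crs_home/bin/crsctl start crs -excl" ;;
        11.2.0.2|11.2.0.2.*)
            # -nocrs keeps ora.crsd from being started
            echo "$crs_home/bin/crsctl start crs -excl -nocrs" ;;
        *)
            echo "version not covered by this document: $version" >&2
            return 1 ;;
    esac
}
```

For example, excl_start_command 11.2.0.2 prints the command including -nocrs, which can then be reviewed and run as root.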

 

4. Label the CRS disk for ASMLIB use

If using ASMLIB, the disk to be used for the CRS disk group needs to be stamped first. As user root:
# /usr/sbin/oracleasm createdisk ASMD40 /dev/sdh1
Writing disk header: done
Instantiating disk: done

5. Create the CRS diskgroup via sqlplus

The disk group can now be (re-)created via sqlplus as the grid user. The compatible.asm attribute must be set to 11.2 in order for the disk group to be used by CRS:

$ sqlplus / as sysasm
SQL*Plus: Release 11.2.0.1.0 Production on Tue Mar 30 11:47:24 2010
Copyright (c) 1982, 2009, Oracle. All rights reserved.
Connected to:
Oracle Database 11g Enterprise Edition Release 11.2.0.1.0 - Production
With the Real Application Clusters and Automatic Storage Management options

SQL> create diskgroup CRS external redundancy disk 'ORCL:ASMD40' attribute 'COMPATIBLE.ASM' = '11.2';

Diskgroup created.
SQL> exit
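
Since the disk group and disk names differ between installations, the statement can also be generated from parameters. This is a sketch only: the function name crs_diskgroup_sql is invented for illustration, and the trailing query against v$asm_diskgroup is an addition of this example, meant to confirm that the new disk group is mounted.

```shell
# Hypothetical helper: print the SQL used in this step for a given disk
# group name and disk path, followed by a query to check the result.
# Usage: crs_diskgroup_sql DISKGROUP DISK
crs_diskgroup_sql() {
    dg=$1
    disk=$2
    [ -n "$dg" ] && [ -n "$disk" ] || {
        echo "usage: crs_diskgroup_sql DISKGROUP DISK" >&2
        return 2
    }
    # Same statement as in the example above, with the names substituted.
    printf "create diskgroup %s external redundancy disk '%s' attribute 'COMPATIBLE.ASM' = '11.2';\n" "$dg" "$disk"
    # Added by this sketch: show the state of all disk groups afterwards.
    printf '%s\n' 'select name, state from v$asm_diskgroup;'
}
```

The output can be reviewed first and then piped into sqlplus as the grid user, e.g.: crs_diskgroup_sql CRS ORCL:ASMD40 | sqlplus -S / as sysasm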

6. Restore the latest OCR backup

Now that the CRS disk group is created and mounted, the OCR can be restored. This must be done as the root user:
# cd $CRS_HOME/cdata/rac_cluster1/
# $CRS_HOME/bin/ocrconfig -restore backup00.ocr

7. Start the CRS daemon on the current node (11.2.0.1 only!)

Now that the OCR has been restored, the CRS daemon can be started. This is needed to recreate the Voting file. Skip this step for 11.2.0.2.
# $CRS_HOME/bin/crsctl start res ora.crsd -init
CRS-2672: Attempting to start 'ora.crsd' on 'racnode1'
CRS-2676: Start of 'ora.crsd' on 'racnode1' succeeded

8. Recreate the Voting file

The Voting file needs to be initialized in the CRS disk group:
# $CRS_HOME/bin/crsctl replace votedisk +CRS
Successful addition of voting disk 00caa5b9c0f54f3abf5bd2a2609f09a9.
Successfully replaced voting disk group with +CRS.
CRS-4266: Voting file(s) successfully replaced

9. Recreate the SPFILE for ASM (optional)

 

Please note:

If you are
- not using an SPFILE for ASM
- not using a shared SPFILE for ASM
- using a shared SPFILE not stored in ASM (e.g. on cluster file system)
this step should probably be skipped.

Also take extra care with the asm_diskstring parameter, as it affects the discovery of the voting disks.

Please verify the previous settings using the ASM alert log.

Prepare a pfile (e.g. /tmp/asm_pfile.ora) with the ASM startup parameters - these may vary from the example below. If in doubt, consult the ASM alert log, as the ASM instance startup should list all non-default parameter values. Please note that the last startup of ASM (in step 3 via CRS start) will not have used an SPFILE, so a startup prior to the loss of the CRS disk group would need to be located.
*.asm_power_limit=1
*.diagnostic_dest='/u01/app/oragrid'
*.instance_type='asm'
*.large_pool_size=12M
*.remote_login_passwordfile='EXCLUSIVE'

Now the SPFILE can be created using this PFILE:
$ sqlplus / as sysasm
SQL*Plus: Release 11.2.0.1.0 Production on Tue Mar 30 11:52:39 2010
Copyright (c) 1982, 2009, Oracle. All rights reserved.

Connected to:
Oracle Database 11g Enterprise Edition Release 11.2.0.1.0 - Production
With the Real Application Clusters and Automatic Storage Management options

SQL> create spfile='+CRS' from pfile='/tmp/asm_pfile.ora';

File created.
SQL> exit
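
The alert log lookup mentioned in this step can be partly automated. The sketch below rests on an assumption about the alert log layout: each instance startup writes the line "System parameters with non-default values:" followed by indented "name = value" lines. The function name asm_params_from_alert_log is hypothetical. Every startup found is printed under its own "# startup N" comment, so that a startup from before the loss of the CRS disk group can be picked manually.

```shell
# Hypothetical helper: turn the non-default parameter listings of an ASM
# alert log (read from stdin) into pfile-style lines.
# Assumed layout: a header line "System parameters with non-default values:"
# followed by indented "name = value" lines; the listing ends at the first
# line that does not look like that.
asm_params_from_alert_log() {
    awk '
        /^System parameters with non-default values:/ {
            in_block = 1
            printf "# startup %d\n", ++n
            next
        }
        in_block && /^[ \t]+[A-Za-z_][A-Za-z0-9_]*[ \t]*=/ {
            line = $0
            sub(/^[ \t]+/, "", line)
            eq = index(line, "=")
            name = substr(line, 1, eq - 1)
            value = substr(line, eq + 1)
            gsub(/[ \t]+$/, "", name)
            gsub(/^[ \t]+|[ \t]+$/, "", value)
            gsub(/"/, "\047", value)      # double quotes become single quotes
            printf "*.%s=%s\n", name, value
            next
        }
        { in_block = 0 }
    '
}
```

Usage: asm_params_from_alert_log < ALERT_LOG_FILE, where ALERT_LOG_FILE is the ASM alert log of the node. The result is only a starting point and should be checked against the actual configuration before it is used as a pfile.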

10. Shut down CRS

Since CRS is running in exclusive mode, it needs to be shut down to allow CRS to run on all nodes again. Use of the force (-f) option may be required:
# $CRS_HOME/bin/crsctl stop crs -f
...
CRS-2793: Shutdown of Oracle High Availability Services-managed resources on 'racnode1' has completed
CRS-4133: Oracle High Availability Services has been stopped.

11. Rescan ASM disks

If using ASMLIB, rescan all ASM disks on each node as the root user:
# /usr/sbin/oracleasm scandisks
Reloading disk partitions: done
Cleaning any stale ASM disks...
Scanning system for ASM disks...
Instantiating disk "ASMD40"

12. Start CRS
As the root user, start CRS on all cluster nodes:
# $CRS_HOME/bin/crsctl start crs
CRS-4123: Oracle High Availability Services has been started.

13. Verify CRS

To verify that CRS is fully functional again:
# $CRS_HOME/bin/crsctl check cluster -all
**************************************************************
racnode1:
CRS-4537: Cluster Ready Services is online
CRS-4529: Cluster Synchronization Services is online
CRS-4533: Event Manager is online
**************************************************************
racnode2:
CRS-4537: Cluster Ready Services is online
CRS-4529: Cluster Synchronization Services is online
CRS-4533: Event Manager is online
**************************************************************

# $CRS_HOME/bin/crsctl status resource -t
...
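
The check of the 'crsctl check cluster -all' output can be scripted as well. The following sketch assumes the output layout shown above (a "nodename:" line followed by the CRS-4537, CRS-4529 and CRS-4533 messages for that node); the function name cluster_fully_online is made up for this example.

```shell
# Hypothetical helper: read "crsctl check cluster -all" output from stdin and
# succeed only if every node section reports CRS-4537, CRS-4529 and CRS-4533.
# Nodes that lack one of the three messages are listed on stdout.
cluster_fully_online() {
    awk '
        function close_node() {
            if (node != "" && !(crs && css && evm)) {
                bad = 1
                print "not fully online: " node
            }
        }
        # A node section starts with a line holding just "<nodename>:".
        /^[^ \t*][^ \t]*:[ \t]*$/ {
            close_node()
            node = $1
            sub(/:$/, "", node)
            crs = css = evm = 0
            nodes++
            next
        }
        /^CRS-4537:/ { crs = 1 }    # Cluster Ready Services is online
        /^CRS-4529:/ { css = 1 }    # Cluster Synchronization Services is online
        /^CRS-4533:/ { evm = 1 }    # Event Manager is online
        END {
            close_node()
            exit (bad || nodes == 0)
        }
    '
}
```

Usage as root: $CRS_HOME/bin/crsctl check cluster -all | cluster_fully_online && echo "all nodes online"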

References

 Related

 

--------------------------------------------------------------------------------
Products
--------------------------------------------------------------------------------

•Oracle Database Products > Oracle Database > Oracle Database > Oracle Server - Enterprise Edition

Keywords
--------------------------------------------------------------------------------
11GR2; ASM; CRS; DISKGROUP; OCR; RESTORE
Errors
--------------------------------------------------------------------------------
CRS-4537; CRS-2672; CRS-4266; CRS-4529; CRS-2676; CRS-4533; CRS-2793; CRS-4133; CRS-4123

 

 
