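The OCR (Oracle Cluster Registry) plays the same role for a cluster that the registry plays for Windows. Windows keeps software, user, configuration, and security information in its registry; Oracle Clusterware likewise keeps everything about the cluster in the OCR: resources, configuration, nodes, and the RAC databases. If the OCR is damaged, the cluster services fail to start normally and the OCR has to be repaired, so managing and maintaining the OCR is critical to the whole cluster. This article describes OCR management and maintenance under Oracle 10g RAC.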

1. Environment
oracle@bo2dbp:~> cat /etc/issue
Welcome to SUSE Linux Enterprise Server 10 SP3 (x86_64) - Kernel \r (\l).
oracle@bo2dbp:~> crsctl query crs activeversion
CRS active version on the cluster is [10.2.0.3.0]

2. Checking the OCR
oracle@bo2dbp:~> ocrcheck
Status of Oracle Cluster Registry is as follows :
         Version : 2
         Total space (kbytes) : 204560
         Used space (kbytes) : 6160
         Available space (kbytes) : 198400
         ID : 1512159503
         Device/File Name : /dev/raw/raw1        <-- OCR (primary)
                            Device/File integrity check succeeded
                            Device/File not configured        <-- OCR Mirror (not configured)
         Cluster registry integrity check succeeded

# If clusterware is shut down, the OCR location can still be read from ocr.loc
oracle@bo2dbp:~> more /etc/oracle/ocr.loc
ocrconfig_loc=/dev/raw/raw1
local_only=FALSE

# Log file written by ocrcheck
$ORA_CRS_HOME/log/<hostname>/client/ocrcheck_<pid>.log

3. Dumping the OCR contents
# By default ocrdump writes an ASCII file named OCRDUMPFILE. If that file already exists, ocrdump reports PROT-303 (file exists).
# Dump the OCR to the default file
oracle@bo2dbp:~> ocrdump
oracle@bo2dbp:~> ls -hltr OCRDUMPFILE
-rw-r--r-- 1 oracle oinstall 44K 2013-01-07 14:13 OCRDUMPFILE
oracle@bo2dbp:~> file OCRDUMPFILE
OCRDUMPFILE: ASCII text

# Dump the OCR to a named file
oracle@bo2dbp:~> ocrdump /tmp/`hostname`_ocrdump_`date +%Y%m%d:%H%M`
oracle@bo2dbp:~> ls /tmp/*ocr*
/tmp/bo2dbp_ocrdump_20130107:1415

# Dump only the SYSTEM.css key of the OCR, as XML
oracle@bo2dbp:~> ocrdump -stdout -keyname SYSTEM.css -xml >ocrdump.xml
oracle@bo2dbp:~> more ocrdump.xml
<OCRDUMP>
<TIMESTAMP>01/07/2013 14:15:42</TIMESTAMP>
<COMMAND>/u01/oracle/crs/bin/ocrdump.bin -stdout -keyname SYSTEM.css -xml </COMMAND>
<KEY>
<NAME>SYSTEM.css</NAME>
<VALUE_TYPE>UNDEF</VALUE_TYPE>
<VALUE><![CDATA[]]></VALUE>
<USER_PERMISSION>PROCR_ALL_ACCESS</USER_PERMISSION>
<GROUP_PERMISSION>PROCR_READ</GROUP_PERMISSION>
<OTHER_PERMISSION>PROCR_READ</OTHER_PERMISSION>
<USER_NAME>root</USER_NAME>
<GROUP_NAME>root</GROUP_NAME>
............
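
Because the default OCRDUMPFILE name leads to PROT-303 once the file exists, repeated dumps are easier with a unique target name. The following is a minimal sketch of that idea, not part of the original session: the function names `ocrdump_target` and `dump_ocr` and the `/tmp` default are assumptions made for illustration; the only Oracle call used is `ocrdump <file>`, as shown above.

```shell
#!/bin/sh
# Hypothetical helper: print a dump path of the form
#   <dir>/<nodename>_ocrdump_<YYYYmmdd_HHMMSS>
# The directory defaults to /tmp (an assumption, not an Oracle default).
ocrdump_target() {
    dir=${1:-/tmp}
    printf '%s/%s_ocrdump_%s\n' "$dir" "$(uname -n)" "$(date +%Y%m%d_%H%M%S)"
}

# Hypothetical wrapper: refuse to reuse an existing file, then run ocrdump.
# Prints the path of the dump on success.
dump_ocr() {
    target=$(ocrdump_target "$1")
    if [ -e "$target" ]; then
        echo "refusing to overwrite $target" >&2
        return 1
    fi
    if ! command -v ocrdump >/dev/null 2>&1; then
        echo "ocrdump not found in PATH" >&2
        return 1
    fi
    ocrdump "$target" && echo "$target"
}
```

Calling `dump_ocr /tmp` on a cluster node would then produce one file per invocation, which is convenient when dumps are compared before and after a configuration change.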

# Dump the OCR backup information stored under the SYSTEM.OCR key
oracle@bo2dbp:~> ocrdump -stdout -keyname SYSTEM.OCR -xml>ocrdump_bak.xml

4. Adding an OCR file
# Note: in the steps below CRS is online on all nodes. Adding, relocating, or replacing an OCR does not require CRS to be offline.
oracle@bo2dbp:~> crsctl check crs
CSS appears healthy
CRS appears healthy
EVM appears healthy
oracle@bo2dbp:~> ssh bo2dbs crsctl check crs
CSS appears healthy
CRS appears healthy
EVM appears healthy

# Syntax for adding an OCR mirror
ocrconfig -replace ocrmirror <destination_file>
ocrconfig -replace ocrmirror <disk>

oracle@bo2dbp:~> sudo -s rcraw status
root's password:
/dev/raw/raw1: bound to major 8, minor 33
/dev/raw/raw2: bound to major 8, minor 49
/dev/raw/raw11: bound to major 8, minor 113
/dev/raw/raw21: bound to major 8, minor 129
/dev/raw/raw22: bound to major 8, minor 145
running

oracle@bo2dbp:~> sudo -s /u01/oracle/crs/bin/ocrconfig -replace ocrmirror /dev/raw/raw11
root's password:
oracle@bo2dbp:~> ocrcheck
Status of Oracle Cluster Registry is as follows :
         Version : 2
         Total space (kbytes) : 204560
         Used space (kbytes) : 6184
         Available space (kbytes) : 198376
         ID : 1512159503
         Device/File Name : /dev/raw/raw1
                            Device/File integrity check succeeded
         Device/File Name : /dev/raw/raw11        # the new OCR mirror has been added
                            Device/File integrity check succeeded
         Cluster registry integrity check succeeded

# Check from the second node
oracle@bo2dbp:~> ssh bo2dbs ocrcheck
Status of Oracle Cluster Registry is as follows :
         Version : 2
         Total space (kbytes) : 204560
         Used space (kbytes) : 6184
         Available space (kbytes) : 198376
         ID : 1512159503
         Device/File Name : /dev/raw/raw1
                            Device/File integrity check succeeded
         Device/File Name : /dev/raw/raw11        # the new OCR mirror has been added
                            Device/File integrity check succeeded
         Cluster registry integrity check succeeded

# Check on both nodes whether the OCR locations recorded in ocr.loc have changed
oracle@bo2dbp:~> more /etc/oracle/ocr.loc
#Device/file getting replaced by device /dev/raw/raw11
ocrconfig_loc=/dev/raw/raw1
ocrmirrorconfig_loc=/dev/raw/raw11        # the ocrmirror location has been added
local_only=false
oracle@bo2dbp:~> ssh bo2dbs cat /etc/oracle/ocr.loc
#Device/file getting replaced by device /dev/raw/raw11
ocrconfig_loc=/dev/raw/raw1
ocrmirrorconfig_loc=/dev/raw/raw11

# Now try to add one more ocrmirror, using raw device raw21 as the mirror
oracle@bo2dbp:~> sudo -s /u01/oracle/crs/bin/ocrconfig -replace ocrmirror /dev/raw/raw21
root's password:
oracle@bo2dbp:~> ocrcheck
Status of Oracle Cluster Registry is as follows :
         Version : 2
         Total space (kbytes) : 204560
         Used space (kbytes) : 6184
         Available space (kbytes) : 198376
         ID : 1512159503
         Device/File Name : /dev/raw/raw1
                            Device/File integrity check succeeded
         Device/File Name : /dev/raw/raw21        # the previous mirror raw11 has been replaced
                            Device/File integrity check succeeded
         Cluster registry integrity check succeeded
# As the output shows, there can be at most two OCR locations: one primary and one mirror

5. Relocating the OCR
Relocating the OCR, also called moving the OCR file, places the current primary OCR or the OCR mirror on a new raw device or OCFS file. The whole operation can be done online.
Whether you move the primary OCR or the mirror, the other copy must exist. In other words, two OCR copies must be present; otherwise the command fails with PROT-16: Internal Error.

Use the following commands to move the primary OCR
ocrconfig -replace ocr <destination_file>
ocrconfig -replace ocr <disk>

Here the primary OCR is moved to raw11, the device used earlier
oracle@bo2dbp:~> dd if=/dev/zero of=/dev/raw/raw11 bs=1024k count=210
dd: writing `/dev/raw/raw11': No space left on device
200+0 records in
199+0 records out
209698816 bytes (210 MB) copied, 5.39183 seconds, 38.9 MB/s
oracle@bo2dbp:~> sudo -s /u01/oracle/crs/bin/ocrconfig -replace ocr /dev/raw/raw11
oracle@bo2dbp:~> ocrcheck
Status of Oracle Cluster Registry is as follows :
         Version : 2
         Total space (kbytes) : 204560
         Used space (kbytes) : 6184
         Available space (kbytes) : 198376
         ID : 1512159503
         Device/File Name : /dev/raw/raw11        # the former primary raw1 has been replaced by raw11
                            Device/File integrity check succeeded
         Device/File Name : /dev/raw/raw21
                            Device/File integrity check succeeded
         Cluster registry integrity check succeeded

Use the following commands to move the mirror OCR (not demonstrated here)
ocrconfig -replace ocrmirror <destination_file>
ocrconfig -replace ocrmirror <disk>

6. Repairing the OCR configuration on a local node
If clusterware is stopped on a node, or the node itself is shut down, any OCR configuration change made from the other nodes leaves that node's OCR information out of date. For example, if an OCR is added, removed, or relocated from the first node while the second node is down, the second node must be repaired afterwards. The repair can only be run while the clusterware daemons on that node are stopped.
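
A node that missed an OCR change can be spotted before it is restarted by comparing its `/etc/oracle/ocr.loc` with the copy on a healthy node. The sketch below is one way to do that, under the assumption that both files are available locally (for example, fetched with `ssh <node> cat /etc/oracle/ocr.loc` as in the session above). The function names `ocr_locations` and `ocr_loc_in_sync` are hypothetical.

```shell
#!/bin/sh
# Print only the OCR location settings from an ocr.loc file, sorted.
# Comment lines and the local_only flag are ignored.
ocr_locations() {
    grep -E '^(ocrconfig_loc|ocrmirrorconfig_loc)=' "$1" | sort
}

# Return 0 when both files name the same OCR devices, non-zero otherwise.
ocr_loc_in_sync() {
    [ "$(ocr_locations "$1")" = "$(ocr_locations "$2")" ]
}

# Example use (paths are illustrative):
#   ssh bo2dbs cat /etc/oracle/ocr.loc > /tmp/ocr.loc.bo2dbs
#   ocr_loc_in_sync /etc/oracle/ocr.loc /tmp/ocr.loc.bo2dbs \
#       || echo "ocr.loc differs: the stopped node needs ocrconfig -repair"
```

The comparison deliberately looks only at `ocrconfig_loc` and `ocrmirrorconfig_loc`, since the header comment and the case of `local_only` differ between nodes even when the configuration matches.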

Use the following commands to repair the OCR configuration. The -repair option affects only the node on which it is run.
ocrconfig -repair ocr device_name          # repair the primary OCR
ocrconfig -repair ocrmirror device_name    # repair the mirror OCR

In the earlier example the primary OCR was moved from raw1 to raw11. Node 2 had gone down unexpectedly while that change was made.
Check ocr.loc on node 2: the primary is still raw1, although the earlier operation changed it to raw11.
bo2dbs:/u01/oracle/crs/log/bo2dbs # more /etc/oracle/ocr.loc
#Device/file /dev/raw/raw11 getting replaced by device /dev/raw/raw21
ocrconfig_loc=/dev/raw/raw1
ocrmirrorconfig_loc=/dev/raw/raw21
local_only=false
bo2dbs:/u01/oracle/crs/bin # ./crsctl start crs
Attempting to start CRS stack
The CRS stack will be started shortly

# At this point the cluster cannot start on bo2dbs
bo2dbs:/u01/oracle/crs/bin # ./crsctl check crs
Failure 1 contacting CSS daemon
Cannot communicate with CRS
Cannot communicate with EVM
bo2dbs:/u01/oracle/crs/bin # ps -ef | grep d.bin | grep -v grep      # no cluster processes are running
bo2dbs:/u01/oracle/crs/bin # tail -2 /u01/oracle/crs/log/bo2dbs/alertbo2dbs.log      # check the alert log
2013-01-07 17:13:49.153
[client(12071)]CRS-1009:The OCR configuration is invalid. Details in /u01/oracle/crs/log/bo2dbs/client/css37.log.
bo2dbs:/u01/oracle/crs/bin # more /u01/oracle/crs/log/bo2dbs/client/css37.log      # check the client log
Oracle Database 10g CRS Release 10.2.0.1.0 Production Copyright 1996, 2005 Oracle. All rights reserved.
2013-01-07 17:13:49.153: [ OCRRAW][190773584]proprioini: OCR configuration on disk 1 isn't valid
2013-01-07 17:13:49.153: [ OCRRAW][190773584]proprinit: Could not open raw device
2013-01-07 17:13:49.153: [ default][190773584]a_init:7!: Backend init unsuccessful : [26]
2013-01-07 17:13:49.153: [ CSSCLNT][190773584]clsssinit: Unable to access OCR device in OCR init.PROC-26: Error while accessing the physical storage
# The log shows that the OCR configuration on disk 1 is invalid, that is, raw device raw1 cannot be opened as an OCR

# Now repair the configuration
bo2dbs:/u01/oracle/crs/bin # ./ocrconfig -repair ocr /dev/raw/raw11
bo2dbs:/u01/oracle/crs/bin # more /etc/oracle/ocr.loc
#Device/file /dev/raw/raw1 getting replaced by device /dev/raw/raw11
ocrconfig_loc=/dev/raw/raw11
ocrmirrorconfig_loc=/dev/raw/raw21
local_only=false
bo2dbs:/u01/oracle/crs/bin # ./crsctl start crs
Attempting to start CRS stack
The CRS stack will be started shortly
bo2dbs:/u01/oracle/crs/bin # ./crsctl check crs
CSS appears healthy
CRS appears healthy
EVM appears healthy
bo2dbs:/u01/oracle/crs/bin # ps -ef | grep d.bin | grep -v grep
root 14459 5067 0 17:33 ? 00:00:01 /u01/oracle/crs/bin/crsd.bin reboot
oracle 14512 5065 0 17:33 ? 00:00:00 /u01/oracle/crs/bin/evmd.bin
oracle 15128 14426 0 17:33 ? 00:00:01 /u01/oracle/crs/bin/ocssd.bin
bo2dbs:/u01/oracle/crs/bin # ./crs_stat -t | grep bo2dbs
ora....SM2.asm application ONLINE ONLINE bo2dbs
ora....BS.lsnr application ONLINE ONLINE bo2dbs
ora....BS.lsnr application ONLINE ONLINE bo2dbs
ora.bo2dbs.gsd application ONLINE ONLINE bo2dbs
ora.bo2dbs.ons application ONLINE ONLINE bo2dbs
ora.bo2dbs.vip application ONLINE ONLINE bo2dbs
ora....g2.inst application ONLINE ONLINE bo2dbs

# Author : Robinson Cheng
# Blog : http://blog.csdn.net/robinson_0612

7. Removing an OCR
# An OCR can be removed as well as added, for example when the OCR is protected by external RAID redundancy instead of an OCR mirror. The one condition is that at least one OCR must remain online.
An OCR is usually removed with the following steps
  Verify that the cluster is online (on all nodes if possible)
  Check that at least one OCR is online
  Remove the primary OCR or the OCR mirror
  On an OCFS file system, delete the OCR file

# To remove an OCR, run the replace command without a destination
ocrconfig -replace ocr
ocrconfig -replace ocrmirror

bo2dbs:/u01/oracle/crs/bin # ps -ef | grep d.bin
oracle 5745 5092 0 17:52 ? 00:00:00 /u01/oracle/crs/bin/evmd.bin
root 5902 5094 0 17:52 ? 00:00:02 /u01/oracle/crs/bin/crsd.bin reboot
oracle 6420 5795 0 17:52 ? 00:00:01 /u01/oracle/crs/bin/ocssd.bin
root 8870 18345 0 18:01 pts/0 00:00:00 grep d.bin
bo2dbs:/u01/oracle/crs/bin # ssh bo2dbp ps -ef | grep d.bin
The authenticity of host 'bo2dbp (192.168.7.51)' can't be established.
RSA key fingerprint is 2a:77:4f:eb:46:5b:07:4a:12:23:5c:69:b2:cd:15:ec.
Are you sure you want to continue connecting (yes/no)? yes
Warning: Permanently added 'bo2dbp' (RSA) to the list of known hosts.
Password:
oracle 5837 5122 0 17:54 ? 00:00:00 /u01/oracle/crs/bin/evmd.bin
root 5907 5124 0 17:54 ? 00:00:02 /u01/oracle/crs/bin/crsd.bin reboot
oracle 6672 5756 0 17:54 ? 00:00:01 /u01/oracle/crs/bin/ocssd.bin

# Both OCRs are online
bo2dbs:/u01/oracle/crs/bin # ./ocrcheck
Status of Oracle Cluster Registry is as follows :
         Version : 2
         Total space (kbytes) : 204560
         Used space (kbytes) : 6184
         Available space (kbytes) : 198376
         ID : 1512159503
         Device/File Name : /dev/raw/raw11        # primary OCR
                            Device/File integrity check succeeded
         Device/File Name : /dev/raw/raw21        # mirror OCR
                            Device/File integrity check succeeded
         Cluster registry integrity check succeeded

bo2dbs:/u01/oracle/crs/bin # ./ocrconfig -replace ocr      # remove the primary OCR
bo2dbs:/u01/oracle/crs/bin # more /etc/oracle/ocr.loc
#Device/file /dev/raw/raw11 being deleted
ocrconfig_loc=/dev/raw/raw21
local_only=false
bo2dbs:/u01/oracle/crs/bin # ocrcheck
-bash: ocrcheck: command not found
bo2dbs:/u01/oracle/crs/bin # ./ocrcheck
Status of Oracle Cluster Registry is as follows :
         Version : 2
         Total space (kbytes) : 204560
         Used space (kbytes) : 6184
         Available space (kbytes) : 198376
         ID : 1512159503
         Device/File Name : /dev/raw/raw21        # the former mirror OCR has become the primary OCR
                            Device/File integrity check succeeded
                            Device/File not configured
         Cluster registry integrity check succeeded
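
The precondition for a removal (two configured OCR locations, both passing the integrity check, so that one remains afterwards) can be checked mechanically from the ocrcheck output. This is a minimal sketch that assumes the 10.2 output format shown in this article; the function names `count_ocr_devices` and `safe_to_remove_ocr` are hypothetical.

```shell
#!/bin/sh
# Count configured OCR locations in ocrcheck output read from stdin.
# "|| true" keeps the exit status at 0 when the count is zero.
count_ocr_devices() {
    grep -c 'Device/File Name' || true
}

# Succeed only if the ocrcheck output on stdin shows two configured
# locations and a per-device integrity success line for each of them.
safe_to_remove_ocr() {
    out=$(cat)
    names=$(printf '%s\n' "$out" | count_ocr_devices)
    ok=$(printf '%s\n' "$out" | grep -c 'Device/File integrity check succeeded' || true)
    [ "$names" -eq 2 ] && [ "$ok" -eq 2 ]
}

# Example use, as root on a cluster node (illustrative only):
#   ./ocrcheck | safe_to_remove_ocr && ./ocrconfig -replace ocr
```

The check matches `Device/File integrity check succeeded` rather than the final `Cluster registry integrity check succeeded` line, so a cluster with a single healthy OCR is correctly reported as not safe for removal.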