【RAC】How to Troubleshoot Grid Infrastructure Startup Issues [ID 1050908.1]

Applies to:

Oracle Server – Enterprise Edition – Version: 11.2.0.1 and later   [Release: 11.2 and later ]

Information in this document applies to any platform.

Goal

The goal of this note is to provide a reference for troubleshooting 11gR2 Grid Infrastructure clusterware startup issues. It applies to issues in both new environments (during root.sh or rootupgrade.sh) and unhealthy existing environments. To look specifically at root.sh issues, see Note 1053970.1 for more information.

Solution

Start up sequence:

In a nutshell, the operating system starts ohasd, ohasd starts agents to start up daemons (gipcd, mdnsd, gpnpd, ctssd, ocssd, crsd, evmd, asm, etc.), and crsd starts agents that start user resources (database, SCAN, listener, etc.).

For the detailed Grid Infrastructure clusterware startup sequence, please refer to note 1053147.1.
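
To get a quick picture of which clusterware processes are actually running on a node, a simple process listing can be used. This is a minimal sketch, assuming a Linux/UNIX platform; exact process names can vary slightly by version:

# list the main clusterware daemons currently running (run as any user)
ps -ef | egrep 'ohasd|oraagent|orarootagent|cssdagent|cssdmonitor|ocssd|crsd|evmd|gpnpd|gipcd|mdnsd|ctssd' | grep -v grep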

Cluster status

To find out cluster and daemon status:

$GRID_HOME/bin/crsctl check crs

CRS-4638: Oracle High Availability Services is online

CRS-4537: Cluster Ready Services is online

CRS-4529: Cluster Synchronization Services is online

CRS-4533: Event Manager is online

$GRID_HOME/bin/crsctl stat res -t -init

--------------------------------------------------------------------------------

NAME      TARGET  STATE     SERVER          STATE_DETAILS

--------------------------------------------------------------------------------

Cluster Resources

--------------------------------------------------------------------------------

ora.asm

1        ONLINE  ONLINE       rac1              Started

ora.crsd

1        ONLINE  ONLINE       rac1

ora.cssd

1        ONLINE  ONLINE       rac1

ora.cssdmonitor

1        ONLINE  ONLINE       rac1

ora.ctssd

1        ONLINE  ONLINE       rac1                  OBSERVER

ora.diskmon

1        ONLINE  ONLINE       rac1

ora.drivers.acfs

1        ONLINE  ONLINE       rac1

ora.evmd

1        ONLINE  ONLINE       rac1

ora.gipcd

1        ONLINE  ONLINE       rac1

ora.gpnpd

1        ONLINE  ONLINE       rac1

ora.mdnsd

1        ONLINE  ONLINE       rac1

For 11.2.0.2 and above, there will be two more processes:

ora.cluster_interconnect.haip

1        ONLINE  ONLINE       rac1

ora.crf

1        ONLINE  ONLINE       rac1

To start an offline daemon – if ora.crsd is OFFLINE:

$GRID_HOME/bin/crsctl start res ora.crsd -init
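
To see at a glance which lower-stack resources are offline, the -init listing can be filtered before starting individual daemons. A minimal sketch, assuming $GRID_HOME is set and the command is run as the grid user or root:

# show NAME/STATE pairs for the lower-stack (-init) resources
$GRID_HOME/bin/crsctl stat res -init | egrep '^NAME=|^STATE='
# start a specific offline daemon resource, e.g. ora.crsd
$GRID_HOME/bin/crsctl start res ora.crsd -init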

Case 1: OHASD.BIN does not start

As ohasd.bin is responsible for starting up all other clusterware processes directly or indirectly, it needs to start properly for the rest of the stack to come up.

Automatic ohasd.bin start up depends on the following:

1. OS is at appropriate run level:

The OS needs to be at the specified run level before CRS will try to start up.

To find out at which run level the clusterware needs to come up:

cat /etc/inittab|grep init.ohasd

h1:35:respawn:/etc/init.d/init.ohasd run >/dev/null 2>&1

The above example shows that CRS is supposed to run at run levels 3 and 5; please note that, depending on the platform, CRS comes up at a different run level.

To find out current run level:

who -r
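
The two checks can be combined into one quick comparison. A minimal sketch for Linux (the inittab entry and the runlevel command are Linux-specific; adjust for your platform):

# run level(s) at which init.ohasd is configured to respawn
grep init.ohasd /etc/inittab
# current (and previous) run level; "who -r" works on most UNIX platforms as well
runlevel
who -r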

2. “init.ohasd run” is up

On Linux/UNIX, because "init.ohasd run" is configured in /etc/inittab, the init process (pid 1; /sbin/init on Linux, Solaris and HP-UX, /usr/sbin/init on AIX) will start and respawn "init.ohasd run" if it fails. Without "init.ohasd run" up and running, ohasd.bin will not start:

ps -ef|grep init.ohasd|grep -v grep

root      2279     1  0 18:14 ?        00:00:00 /bin/sh /etc/init.d/init.ohasd run

If any rc Snn script (located in rcN.d, for example S98gcstartup) is stuck, the init process may not start "/etc/init.d/init.ohasd run"; please engage the OS vendor to find out why the relevant Snn script is stuck.

3. Clusterware auto start is enabled – it's enabled by default

By default CRS is enabled for auto start upon node reboot; to enable it:

$GRID_HOME/bin/crsctl enable crs

To verify whether it's currently enabled or not:

cat $SCRBASE/$HOSTNAME/root/ohasdstr

enable

SCRBASE is /etc/oracle/scls_scr on Linux and AIX, and /var/opt/oracle/scls_scr on HP-UX and Solaris.

Note: NEVER EDIT THE FILE MANUALLY, use “crsctl enable/disable crs” command instead.
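
A minimal read-only sketch to locate SCRBASE and show the auto-start flag on any of the platforms listed above (it only reads the file, it does not modify it):

# determine SCRBASE for this platform and display the auto-start flag
case "$(uname)" in
  Linux|AIX)   SCRBASE=/etc/oracle/scls_scr ;;
  SunOS|HP-UX) SCRBASE=/var/opt/oracle/scls_scr ;;
esac
# use the short host name if "hostname" returns a fully qualified name
cat $SCRBASE/$(hostname)/root/ohasdstr    # expected output: enable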

4. syslogd is up and the OS is able to execute the init script S96ohasd

The OS may get stuck on some other Snn script while the node is coming up, and thus never get a chance to execute S96ohasd; if that's the case, the following message will not appear in the OS messages file:

Jan 20 20:46:51 rac1 logger: Oracle HA daemon is enabled for autostart.

If you don't see the above message, the other possibility is that syslogd (/usr/sbin/syslogd) is not fully up. Grid may fail to come up in that case as well. This may not apply to AIX.

To find out whether the OS is able to execute S96ohasd while the node is coming up, modify the ohasd script:

From:

case `$CAT $AUTOSTARTFILE` in

enable*)

$LOGERR "Oracle HA daemon is enabled for autostart."

To:

case `$CAT $AUTOSTARTFILE` in

enable*)

/bin/touch /tmp/ohasd.start."`date`"

$LOGERR "Oracle HA daemon is enabled for autostart."

After a node reboot, if you don't see /tmp/ohasd.start.timestamp get created, it means the OS got stuck on some other Snn script. If you do see /tmp/ohasd.start.timestamp but not "Oracle HA daemon is enabled for autostart" in the messages file, likely syslogd is not fully up. In both cases, you will need to engage the System Administrator to find the issue at the OS level. For the latter case, the workaround is to sleep for about 2 minutes before the logger call; modify ohasd:

From:

case `$CAT $AUTOSTARTFILE` in

enable*)

$LOGERR "Oracle HA daemon is enabled for autostart."

To:

case `$CAT $AUTOSTARTFILE` in

enable*)

/bin/sleep 120

$LOGERR "Oracle HA daemon is enabled for autostart."

5. The file system on which GRID_HOME resides is online when the init script S96ohasd is executed; once S96ohasd is executed, the following messages should be in the OS messages file:

Jan 20 20:46:51 rac1 logger: Oracle HA daemon is enabled for autostart.

..

Jan 20 20:46:57 rac1 logger: exec /ocw/grid/perl/bin/perl -I/ocw/grid/perl/lib /ocw/grid/bin/crswrapexece.pl /ocw/grid/crs/install/s_crsconfig_rac1_env.txt /ocw/grid/bin/ohasd.bin “reboot”

If you see the first line but not the last line, likely the filesystem containing the GRID_HOME was not online when S96ohasd was executed.
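
A quick way to confirm, after boot, that the filesystem holding GRID_HOME is mounted and the ohasd binary is reachable; a minimal sketch using Linux syntax, with /ocw/grid standing in for your actual GRID_HOME as in the example above:

GRID_HOME=/ocw/grid                 # adjust to your environment
df -h $GRID_HOME                    # confirm the filesystem holding GRID_HOME is mounted
ls -l $GRID_HOME/bin/ohasd.bin      # confirm the ohasd binary is visible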

6. Oracle Local Registry (OLR, $GRID_HOME/cdata/${HOSTNAME}.olr) is accessible

ls -l $GRID_HOME/cdata/*.olr

-rw------- 1 root  oinstall 272756736 Feb  2 18:20 rac1.olr

If the OLR is inaccessible or corrupted, the ohasd.log will likely have messages similar to the following:

..

2010-01-24 22:59:10.470: [ default][1373676464] Initializing OLR

2010-01-24 22:59:10.472: [  OCROSD][1373676464]utopen:6m’:failed in stat OCR file/disk /ocw/grid/cdata/rac1.olr, errno=2, os err string=No such file or directory

2010-01-24 22:59:10.472: [  OCROSD][1373676464]utopen:7:failed to open any OCR file/disk, errno=2, os err string=No such file or directory

2010-01-24 22:59:10.473: [  OCRRAW][1373676464]proprinit: Could not open raw device

2010-01-24 22:59:10.473: [  OCRAPI][1373676464]a_init:16!: Backend init unsuccessful : [26]

2010-01-24 22:59:10.473: [  CRSOCR][1373676464] OCR context init failure.  Error: PROCL-26: Error while accessing the physical storage Operating System error [No such file or directory] [2]

2010-01-24 22:59:10.473: [ default][1373676464] OLR initalization failured, rc=26

2010-01-24 22:59:10.474: [ default][1373676464]Created alert : (:OHAS00106:) :  Failed to initialize Oracle Local Registry

2010-01-24 22:59:10.474: [ default][1373676464][PANIC] OHASD exiting; Could not init OLR

OR

..

2010-01-24 23:01:46.275: [  OCROSD][1228334000]utread:3: Problem reading buffer 1907f000 buflen 4096 retval 0 phy_offset 102400 retry 5

2010-01-24 23:01:46.275: [  OCRRAW][1228334000]propriogid:1_1: Failed to read the whole bootblock. Assumes invalid format.

2010-01-24 23:01:46.275: [  OCRRAW][1228334000]proprioini: all disks are not OCR/OLR formatted

2010-01-24 23:01:46.275: [  OCRRAW][1228334000]proprinit: Could not open raw device

2010-01-24 23:01:46.275: [  OCRAPI][1228334000]a_init:16!: Backend init unsuccessful : [26]

2010-01-24 23:01:46.276: [  CRSOCR][1228334000] OCR context init failure.  Error: PROCL-26: Error while accessing the physical storage

2010-01-24 23:01:46.276: [ default][1228334000] OLR initalization failured, rc=26

2010-01-24 23:01:46.276: [ default][1228334000]Created alert : (:OHAS00106:) :  Failed to initialize Oracle Local Registry

2010-01-24 23:01:46.277: [ default][1228334000][PANIC] OHASD exiting; Could not init OLR

OR

..

2010-11-07 03:00:08.932: [ default][1] Created alert : (:OHAS00102:) : OHASD is not running as privileged user

2010-11-07 03:00:08.932: [ default][1][PANIC] OHASD exiting: must be run as privileged user

OR

..

2010-08-04 13:13:11.102: [   CRSPE][35] Resources parsed

2010-08-04 13:13:11.103: [   CRSPE][35] Server [] has been registered with the PE data model

2010-08-04 13:13:11.103: [   CRSPE][35] STARTUPCMD_REQ = false:

2010-08-04 13:13:11.103: [   CRSPE][35] Server [] has changed state from [Invalid/unitialized] to [VISIBLE]

2010-08-04 13:13:11.103: [  CRSOCR][31] Multi Write Batch processing…

2010-08-04 13:13:11.103: [ default][35] Dump State Starting …

..

2010-08-04 13:13:11.112: [   CRSPE][35] SERVERS:

:VISIBLE:address{{Absolute|Node:0|Process:-1|Type:1}}; recovered state:VISIBLE. Assigned to no pool

————- SERVER POOLS:

Free [min:0][max:-1][importance:0] NO SERVERS ASSIGNED

2010-08-04 13:13:11.113: [   CRSPE][35] Dumping ICE contents…:ICE operation count: 0

2010-08-04 13:13:11.113: [ default][35] Dump State Done.

The solution is to restore a good backup of the OLR with "ocrconfig -local -restore <backup_file>". By default, the OLR is backed up to $GRID_HOME/cdata/$HOST/backup_$TIME_STAMP.olr once installation is complete.
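
A minimal sketch of checking OLR health and restoring it from a backup, run as root; the backup file name below is only a hypothetical example, so list the actual backups first:

# check OLR integrity and its registered location
$GRID_HOME/bin/ocrcheck -local
# list existing OLR backups
$GRID_HOME/bin/ocrconfig -local -showbackup
# restore from a chosen backup (file name is an example only)
$GRID_HOME/bin/ocrconfig -local -restore $GRID_HOME/cdata/rac1/backup_20100124_120000.olr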

7. ohasd.bin is able to access network socket files:

2010-06-29 10:31:01.570: [ COMMCRS][1206901056]clsclisten: Permission denied for (ADDRESS=(PROTOCOL=ipc)(KEY=procr_local_conn_0_PROL))

2010-06-29 10:31:01.571: [  OCRSRV][1217390912]th_listen: CLSCLISTEN failed clsc_ret= 3, addr= [(ADDRESS=(PROTOCOL=ipc)(KEY=procr_local_conn_0_PROL))]

2010-06-29 10:31:01.571: [  OCRSRV][3267002960]th_init: Local listener did not reach valid state

In a Grid Infrastructure cluster environment, ohasd-related socket files should be owned by root, but in an Oracle Restart environment they should be owned by the grid user; refer to the "Network Socket File Location, Ownership and Permission" section for example output.

8. ohasd.bin is able to access log file location:

CRS-4124: Oracle High Availability Services startup failed.

CRS-4000: Command Start failed, or completed with errors.

OS messages/syslog shows:

Feb 20 10:47:08 racnode1 OHASD[9566]: OHASD exiting; Directory /ocw/grid/log/racnode1/ohasd not found.

Refer to the "Log File Location, Ownership and Permission" section for example output; if the expected directory is missing, create it with proper ownership and permissions.
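
As a sketch, assuming the layout from the error above (GRID_HOME /ocw/grid, node racnode1) and the ownership shown in the "Log File Location, Ownership and Permission" section, the missing ohasd log directory could be recreated as root:

mkdir -p /ocw/grid/log/racnode1/ohasd
chown root:oinstall /ocw/grid/log/racnode1/ohasd
chmod 750 /ocw/grid/log/racnode1/ohasd    # matches the drwxr-x--- shown in the reference listing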

Case 2: OHASD Agents do not start

OHASD.BIN will spawn four agents/monitors to start the lower-stack resources:

oraagent: responsible for ora.asm, ora.evmd, ora.gipcd, ora.gpnpd, ora.mdnsd etc

orarootagent: responsible for ora.crsd, ora.ctssd, ora.diskmon, ora.drivers.acfs etc

cssdagent / cssdmonitor: responsible for ora.cssd(for ocssd.bin) and ora.cssdmonitor(for cssdmonitor itself)

If ohasd.bin cannot start any of the above agents properly, the clusterware will not come to a healthy state; common causes of agent failure are that the log file or log directory for the agents doesn't have proper ownership or permissions.

Refer to below section “Log File Location, Ownership and Permission” for general reference.

Case 3: CSSD.BIN does not start

Successful cssd.bin startup depends on the following:

1. GPnP profile is accessible – gpnpd needs to be fully up to serve the profile

If ocssd.bin is able to get the profile successfully, the ocssd.log will likely have messages similar to the following:

2010-02-02 18:00:16.251: [    GPnP][408926240]clsgpnpm_exchange: [at clsgpnpm.c:1175] Calling "ipc://GPNPD_rac1", try 4 of 500...

2010-02-02 18:00:16.263: [    GPnP][408926240]clsgpnp_profileVerifyForCall: [at clsgpnp.c:1867] Result: (87) CLSGPNP_SIG_VALPEER. Profile verified.  prf=0x165160d0

2010-02-02 18:00:16.263: [    GPnP][408926240]clsgpnp_profileGetSequenceRef: [at clsgpnp.c:841] Result: (0) CLSGPNP_OK. seq of p=0x165160d0 is '6'=6

2010-02-02 18:00:16.263: [    GPnP][408926240]clsgpnp_profileCallUrlInt: [at clsgpnp.c:2186] Result: (0) CLSGPNP_OK. Successful get-profile CALL to remote "ipc://GPNPD_rac1" disco ""

Otherwise, messages like the following will show in the ocssd.log:

2010-02-03 22:26:17.057: [    GPnP][3852126240]clsgpnpm_connect: [at clsgpnpm.c:1100] GIPC gipcretConnectionRefused (29) gipcConnect(ipc-ipc://GPNPD_rac1)

2010-02-03 22:26:17.057: [    GPnP][3852126240]clsgpnpm_connect: [at clsgpnpm.c:1101] Result: (48) CLSGPNP_COMM_ERR. Failed to connect to call url "ipc://GPNPD_rac1"

2010-02-03 22:26:17.057: [    GPnP][3852126240]clsgpnp_getProfileEx: [at clsgpnp.c:546] Result: (13) CLSGPNP_NO_DAEMON. Can't get GPnP service profile from local GPnP daemon

2010-02-03 22:26:17.057: [ default][3852126240]Cannot get GPnP profile. Error CLSGPNP_NO_DAEMON (GPNPD daemon is not running).

2010-02-03 22:26:17.057: [    CSSD][3852126240]clsgpnp_getProfile failed, rc(13)

2. Voting Disk is accessible

In 11gR2, ocssd.bin discovers voting disks with the setting from the GPnP profile; if not enough voting disks can be identified, ocssd.bin will abort itself.

2010-02-03 22:37:22.212: [    CSSD][2330355744]clssnmReadDiscoveryProfile: voting file discovery string(/share/storage/di*)

..

2010-02-03 22:37:22.227: [    CSSD][1145538880]clssnmvDiskVerify: Successful discovery of 0 disks

2010-02-03 22:37:22.227: [    CSSD][1145538880]clssnmCompleteInitVFDiscovery: Completing initial voting file discovery

2010-02-03 22:37:22.227: [    CSSD][1145538880]clssnmvFindInitialConfigs: No voting files found

2010-02-03 22:37:22.228: [    CSSD][1145538880]###################################

2010-02-03 22:37:22.228: [    CSSD][1145538880]clssscExit: CSSD signal 11 in thread clssnmvDDiscThread

ocssd.bin may not come up with the following error if all nodes failed while a voting file change was in progress:

2010-05-02 03:11:19.033: [    CSSD][1197668093]clssnmCompleteInitVFDiscovery: Detected voting file add in progress for CIN 0:1134513465:0, waiting for configuration to complete 0:1134513098:0

The solution is to start ocssd.bin in exclusive mode, as described in note 1068835.1.

If the voting disk is located on a non-ASM device, ownership and permissions should be:

-rw-r----- 1 ogrid oinstall 21004288 Feb  4 09:13 votedisk1
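
A minimal sketch of verifying the voting file configuration using documented commands, on a node where CSS is running (or after starting the stack in exclusive mode per note 1068835.1); the non-ASM path below is an example only:

# list the currently configured voting files
$GRID_HOME/bin/crsctl query css votedisk
# for a non-ASM voting file, compare ownership/permissions against the example above
ls -l /share/storage/votedisk1            # example path only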

3. Network is functional and name resolution is working:

If ocssd.bin can't bind to any network, the ocssd.log will likely have messages like the following:

2010-02-03 23:26:25.804: [GIPCXCPT][1206540320]gipcmodGipcPassInitializeNetwork: failed to find any interfaces in clsinet, ret gipcretFail (1)

2010-02-03 23:26:25.804: [GIPCGMOD][1206540320]gipcmodGipcPassInitializeNetwork: EXCEPTION[ ret gipcretFail (1) ]  failed to determine host from clsinet, using default

..

2010-02-03 23:26:25.810: [    CSSD][1206540320]clsssclsnrsetup: gipcEndpoint failed, rc 39

2010-02-03 23:26:25.811: [    CSSD][1206540320]clssnmOpenGIPCEndp: failed to listen on gipc addr gipc://rac1:nm_eotcs- ret 39

2010-02-03 23:26:25.811: [    CSSD][1206540320]clssscmain: failed to open gipc endp

If there's a connectivity issue on the private network (including multicast being off), the ocssd.log will likely have messages like the following:

2010-09-20 11:52:54.014: [    CSSD][1103055168]clssnmvDHBValidateNCopy: node 1, racnode1, has a disk HB, but no network HB, DHB has rcfg 180441784, wrtcnt, 453, LATS 328297844, lastSeqNo 452, uniqueness 1284979488, timestamp 1284979973/329344894

2010-09-20 11:52:54.016: [    CSSD][1078421824]clssgmWaitOnEventValue: after CmInfo State  val 3, eval 1 waited 0

..  >>>> after a long delay

2010-09-20 12:02:39.578: [    CSSD][1103055168]clssnmvDHBValidateNCopy: node 1, racnode1, has a disk HB, but no network HB, DHB has rcfg 180441784, wrtcnt, 1037, LATS 328883434, lastSeqNo 1036, uniqueness 1284979488, timestamp 1284980558/329930254

2010-09-20 12:02:39.895: [    CSSD][1107286336]clssgmExecuteClientRequest: MAINT recvd from proc 2 (0xe1ad870)

2010-09-20 12:02:39.895: [    CSSD][1107286336]clssgmShutDown: Received abortive shutdown request from client.

2010-09-20 12:02:39.895: [    CSSD][1107286336]###################################

2010-09-20 12:02:39.895: [    CSSD][1107286336]clssscExit: CSSD aborting from thread GMClientListener

2010-09-20 12:02:39.895: [    CSSD][1107286336]###################################

To validate the network, please refer to note 1054902.1.
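
A few quick, read-only checks can narrow down private-network problems before working through note 1054902.1; a minimal sketch using Linux syntax, where the remote address and node name are placeholders for your environment:

# interfaces registered with the clusterware (public / cluster_interconnect)
$GRID_HOME/bin/oifcfg getif
# confirm the interfaces are up, then test reachability and name resolution of the remote node
/sbin/ifconfig -a
ping -c 3 192.168.1.102        # placeholder: private IP of the remote node
nslookup racnode2              # placeholder: remote node name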

4. Vendor clusterware is up (if using vendor clusterware)

Grid Infrastructure provides full clusterware functionality and doesn't need vendor clusterware to be installed; but if you happen to have Grid Infrastructure on top of vendor clusterware in your environment, then the vendor clusterware needs to come up fully before CRS can be started. To verify, as the grid user:

$GRID_HOME/bin/lsnodes -n

racnode1    1

racnode2    0

If vendor clusterware is not fully up, the ocssd.log will likely have messages similar to the following:

2010-08-30 18:28:13.207: [    CSSD][36]clssnm_skgxninit: skgxncin failed, will retry

2010-08-30 18:28:14.207: [    CSSD][36]clssnm_skgxnmon: skgxn init failed

2010-08-30 18:28:14.208: [    CSSD][36]###################################

2010-08-30 18:28:14.208: [    CSSD][36]clssscExit: CSSD signal 11 in thread skgxnmon

Before the clusterware is installed, execute the command below as grid user:

$INSTALL_SOURCE/install/lsnodes -v

 

Case 4: CRSD.BIN does not start

Successful crsd.bin startup depends on the following:

1. ocssd is fully up

If ocssd.bin is not fully up, the crsd.log will show messages like the following:

2010-02-03 22:37:51.638: [ CSSCLNT][1548456880]clssscConnect: gipc request failed with 29 (0x16)

2010-02-03 22:37:51.638: [ CSSCLNT][1548456880]clsssInitNative: connect failed, rc 29

2010-02-03 22:37:51.639: [  CRSRTI][1548456880] CSS is not ready. Received status 3 from CSS. Waiting for good status ..

2. OCR is accessible

If the OCR is located on ASM and is unavailable, the crsd.log will likely show messages like:

2010-02-03 22:22:55.186: [  OCRASM][2603807664]proprasmo: Error in open/create file in dg [GI]

[  OCRASM][2603807664]SLOS : SLOS: cat=7, pn=kgfoAl06, dep=15077, loc=kgfokge

ORA-15077: could not locate ASM instance serving a required diskgroup

2010-02-03 22:22:55.189: [  OCRASM][2603807664]proprasmo: kgfoCheckMount returned [7]

2010-02-03 22:22:55.189: [  OCRASM][2603807664]proprasmo: The ASM instance is down

2010-02-03 22:22:55.190: [  OCRRAW][2603807664]proprioo: Failed to open [+GI]. Returned proprasmo() with [26]. Marking location as UNAVAILABLE.

2010-02-03 22:22:55.190: [  OCRRAW][2603807664]proprioo: No OCR/OLR devices are usable

2010-02-03 22:22:55.190: [  OCRASM][2603807664]proprasmcl: asmhandle is NULL

2010-02-03 22:22:55.190: [  OCRRAW][2603807664]proprinit: Could not open raw device

2010-02-03 22:22:55.190: [  OCRASM][2603807664]proprasmcl: asmhandle is NULL

2010-02-03 22:22:55.190: [  OCRAPI][2603807664]a_init:16!: Backend init unsuccessful : [26]

2010-02-03 22:22:55.190: [  CRSOCR][2603807664] OCR context init failure.  Error: PROC-26: Error while accessing the physical storage ASM error [SLOS: cat=7, pn=kgfoAl06, dep=15077, loc=kgfokge

ORA-15077: could not locate ASM instance serving a required diskgroup

] [7]

2010-02-03 22:22:55.190: [    CRSD][2603807664][PANIC] CRSD exiting: Could not init OCR, code: 26

Note: in 11.2, ASM starts before crsd.bin and brings up the diskgroup automatically if it contains the OCR.
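
A minimal sketch of checking whether ASM and the diskgroup holding the OCR are available, using documented commands; the ocr.loc path shown is the Linux location (/var/opt/oracle/ocr.loc on some other platforms):

# OCR location(s) registered on this node
cat /etc/oracle/ocr.loc
# ASM and diskgroup status, as the grid user
$GRID_HOME/bin/srvctl status asm
$GRID_HOME/bin/asmcmd lsdg
# OCR integrity check, as root
$GRID_HOME/bin/ocrcheck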

If the OCR is located on a non-ASM device, expected ownership and permissions are:

-rw-r----- 1 root  oinstall  272756736 Feb  3 23:24 ocr

If the OCR is located on a non-ASM device and it's unavailable, the crsd.log will likely show messages similar to the following:

2010-02-03 23:14:33.583: [  OCROSD][2346668976]utopen:7:failed to open any OCR file/disk, errno=2, os err string=No such file or directory

2010-02-03 23:14:33.583: [  OCRRAW][2346668976]proprinit: Could not open raw device

2010-02-03 23:14:33.583: [ default][2346668976]a_init:7!: Backend init unsuccessful : [26]

2010-02-03 23:14:34.587: [  OCROSD][2346668976]utopen:6m’:failed in stat OCR file/disk /share/storage/ocr, errno=2, os err string=No such file or directory

2010-02-03 23:14:34.587: [  OCROSD][2346668976]utopen:7:failed to open any OCR file/disk, errno=2, os err string=No such file or directory

2010-02-03 23:14:34.587: [  OCRRAW][2346668976]proprinit: Could not open raw device

2010-02-03 23:14:34.587: [ default][2346668976]a_init:7!: Backend init unsuccessful : [26]

2010-02-03 23:14:35.589: [    CRSD][2346668976][PANIC] CRSD exiting: OCR device cannot be initialized, error: 1:26

If the OCR is corrupted, likely crsd.log will show messages like the following:

2010-02-03 23:19:38.417: [ default][3360863152]a_init:7!: Backend init unsuccessful : [26]

2010-02-03 23:19:39.429: [  OCRRAW][3360863152]propriogid:1_2: INVALID FORMAT

2010-02-03 23:19:39.429: [  OCRRAW][3360863152]proprioini: all disks are not OCR/OLR formatted

2010-02-03 23:19:39.429: [  OCRRAW][3360863152]proprinit: Could not open raw device

2010-02-03 23:19:39.429: [ default][3360863152]a_init:7!: Backend init unsuccessful : [26]

2010-02-03 23:19:40.432: [    CRSD][3360863152][PANIC] CRSD exiting: OCR device cannot be initialized, error: 1:26

If the owner or group of the grid user has been changed, even though ASM is available, the crsd.log will likely show the following:

2010-03-10 11:45:12.510: [  OCRASM][611467760]proprasmo: Error in open/create file in dg [SYSTEMDG]

[  OCRASM][611467760]SLOS : SLOS: cat=7, pn=kgfoAl06, dep=1031, loc=kgfokge

ORA-01031: insufficient privileges

2010-03-10 11:45:12.528: [  OCRASM][611467760]proprasmo: kgfoCheckMount returned [7]

2010-03-10 11:45:12.529: [  OCRASM][611467760]proprasmo: The ASM instance is down

2010-03-10 11:45:12.529: [  OCRRAW][611467760]proprioo: Failed to open [+SYSTEMDG]. Returned proprasmo() with [26]. Marking location as UNAVAILABLE.

2010-03-10 11:45:12.529: [  OCRRAW][611467760]proprioo: No OCR/OLR devices are usable

2010-03-10 11:45:12.529: [  OCRASM][611467760]proprasmcl: asmhandle is NULL

2010-03-10 11:45:12.529: [  OCRRAW][611467760]proprinit: Could not open raw device

2010-03-10 11:45:12.529: [  OCRASM][611467760]proprasmcl: asmhandle is NULL

2010-03-10 11:45:12.529: [  OCRAPI][611467760]a_init:16!: Backend init unsuccessful : [26]

2010-03-10 11:45:12.530: [  CRSOCR][611467760] OCR context init failure.  Error: PROC-26: Error while accessing the physical storage ASM error [SLOS: cat=7, pn=kgfoAl06, dep=1031, loc=kgfokge

ORA-01031: insufficient privileges

] [7]

If the OCR or its mirror is unavailable (ASM may be up, but the diskgroup for the OCR/mirror is unmounted), the crsd.log will likely show the following:

2010-05-11 11:16:38.578: [  OCRASM][18]proprasmo: Error in open/create file in dg [OCRMIR]

[  OCRASM][18]SLOS : SLOS: cat=8, pn=kgfoOpenFile01, dep=15056, loc=kgfokge

ORA-17503: ksfdopn:DGOpenFile05 Failed to open file +OCRMIR.255.4294967295

ORA-17503: ksfdopn:2 Failed to open file +OCRMIR.255.4294967295

ORA-15001: diskgroup “OCRMIR

..

2010-05-11 11:16:38.647: [  OCRASM][18]proprasmo: kgfoCheckMount returned [6]

2010-05-11 11:16:38.648: [  OCRASM][18]proprasmo: The ASM disk group OCRMIR is not found or not mounted

2010-05-11 11:16:38.648: [  OCRASM][18]proprasmdvch: Failed to open OCR location [+OCRMIR] error [26]

2010-05-11 11:16:38.648: [  OCRRAW][18]propriodvch: Error  [8] returned device check for [+OCRMIR]

2010-05-11 11:16:38.648: [  OCRRAW][18]dev_replace: non-master could not verify the new disk (8)

[  OCRSRV][18]proath_invalidate_action: Failed to replace [+OCRMIR] [8]

[  OCRAPI][18]procr_ctx_set_invalid_no_abort: ctx set to invalid

..

2010-05-11 11:16:46.587: [  OCRMAS][19]th_master:91: Comparing device hash ids between local and master failed

2010-05-11 11:16:46.587: [  OCRMAS][19]th_master:91 Local dev (1862408427, 1028247821, 0, 0, 0)

2010-05-11 11:16:46.587: [  OCRMAS][19]th_master:91 Master dev (1862408427, 1859478705, 0, 0, 0)

2010-05-11 11:16:46.587: [  OCRMAS][19]th_master:9: Shutdown CacheLocal. my hash ids don’t match

[  OCRAPI][19]procr_ctx_set_invalid_no_abort: ctx set to invalid

[  OCRAPI][19]procr_ctx_set_invalid: aborting…

2010-05-11 11:16:46.587: [    CRSD][19] Dump State Starting …

3. crsd.bin pid file exists

2010-02-14 17:41:57.927: [  clsdmt][1092499776]Creating PID [30269] file for home /ocw/grid host racnode1 bin crs to /ocw/grid/crs/init/

2010-02-14 17:41:57.927: [  clsdmt][1092499776]Error3 -2 writing PID [30269] to the file []

2010-02-14 17:41:57.927: [  clsdmt][1092499776]Failed to record pid for CRSD

2010-02-14 17:41:57.927: [  clsdmt][1092499776]Terminating process

2010-02-14 17:41:57.927: [ default][1092499776] CRSD exiting on stop request from clsdms_thdmai

File $GRID_HOME/crs/init/$HOSTNAME.pid should exist, example:

ls -l /ocw/grid/crs/init/*pid

-rwxr-xr-x 1 ogrid oinstall 5 Feb 17 11:00 /ocw/grid/crs/init/racnode1.pid

If the file does not exist, crsd.bin may not come up; in this case, create the file manually as the grid user with the "touch" command.
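
A minimal sketch of recreating the pid file, following the example listing above (grid owner ogrid, node racnode1, GRID_HOME /ocw/grid); run it as the grid user:

touch /ocw/grid/crs/init/racnode1.pid
chmod 755 /ocw/grid/crs/init/racnode1.pid    # to match the permissions in the example listing above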

4. Network is functional and name resolution is working:

If the network is not fully functioning, ocssd.bin may still come up, but crsd.bin may fail and the crsd.log will show messages like:

2010-02-03 23:34:28.412: [    GPnP][2235814832]clsgpnp_Init: [at clsgpnp0.c:837] GPnP client pid=867, tl=3, f=0

2010-02-03 23:34:28.428: [  OCRAPI][2235814832]clsu_get_private_ip_addresses: no ip addresses found.

..

2010-02-03 23:34:28.434: [  OCRAPI][2235814832]a_init:13!: Clusterware init unsuccessful : [44]

2010-02-03 23:34:28.434: [  CRSOCR][2235814832] OCR context init failure.  Error: PROC-44: Error in network address and interface operations Network address and interface operations error [7]

2010-02-03 23:34:28.434: [    CRSD][2235814832][PANIC] CRSD exiting: Could not init OCR, code: 44

Or:

2009-12-10 06:28:31.974: [  OCRMAS][20]proath_connect_master:1: could not connect to master  clsc_ret1 = 9, clsc_ret2 = 9

2009-12-10 06:28:31.974: [  OCRMAS][20]th_master:11: Could not connect to the new master

2009-12-10 06:29:01.450: [ CRSMAIN][2] Policy Engine is not initialized yet!

2009-12-10 06:29:31.489: [ CRSMAIN][2] Policy Engine is not initialized yet!

Or:

2009-12-31 00:42:08.110: [ COMMCRS][10]clsc_receive: (102b03250) Error receiving, ns (12535, 12560), transport (505, 145, 0)

To validate the network, please refer to note 1054902.1

Case 5: GPNPD.BIN does not start

1. Name Resolution is not working

gpnpd.bin fails with the following error in the gpnpd.log:

2010-05-13 12:48:11.540: [    GPnP][1171126592]clsgpnpm_exchange: [at clsgpnpm.c:1175] Calling "tcp://node2:9393", try 1 of 3...

2010-05-13 12:48:11.540: [    GPnP][1171126592]clsgpnpm_connect: [at clsgpnpm.c:1015] ENTRY

2010-05-13 12:48:11.541: [    GPnP][1171126592]clsgpnpm_connect: [at clsgpnpm.c:1066] GIPC gipcretFail (1) gipcConnect(tcp-tcp://node2:9393)

2010-05-13 12:48:11.541: [    GPnP][1171126592]clsgpnpm_connect: [at clsgpnpm.c:1067] Result: (48) CLSGPNP_COMM_ERR. Failed to connect to call url "tcp://node2:9393"

In the above example, make sure the current node is able to ping "node2", and that there is no firewall between them.

Case 6: Various other daemons do not start

Two common causes:

1. Log file or directory for the daemon doesn’t have appropriate ownership or permission

If the log file or log directory for the daemon doesn’t have proper ownership or permissions, usually there is no new info in the log file and the timestamp remains the same while the daemon tries to come up.

Refer to below section “Log File Location, Ownership and Permission” for general reference.

2. Network socket file doesn’t have appropriate ownership or permission

In this case, the daemon log will show messages like:

2010-02-02 12:55:20.485: [ COMMCRS][1121433920]clsclisten: Permission denied for (ADDRESS=(PROTOCOL=ipc)(KEY=rac1DBG_GIPCD))

2010-02-02 12:55:20.485: [  clsdmt][1110944064]Fail to listen to (ADDRESS=(PROTOCOL=ipc)(KEY=rac1DBG_GIPCD))
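
A quick read-only check of the socket directory; a minimal sketch, assuming the Linux location /tmp/.oracle (see the "Network Socket File Location, Ownership and Permission" section for other platforms and the expected owners):

ls -ld /tmp/.oracle /var/tmp/.oracle /usr/tmp/.oracle 2>/dev/null
ls -l /tmp/.oracle | grep -i gipcd      # compare owner/permissions with the reference output below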

Case 7: CRSD Agents do not start

CRSD.BIN will spawn two agents to start up user resources; the two agents share the same names and binaries as the ohasd.bin agents:

orarootagent: responsible for ora.netn.network, ora.nodename.vip, ora.scann.vip and  ora.gns

oraagent: responsible for ora.asm, ora.eons, ora.ons, listener, SCAN listener, diskgroup, database, service resource etc

To find out the user resource status:

$GRID_HOME/bin/crsctl stat res -t

If crsd.bin cannot start any of the above agents properly, user resources may not come up. A common cause of agent failure is that the log file or log directory for the agents doesn't have proper ownership or permissions.

Refer to below section “Log File Location, Ownership and Permission” for general reference.

Network and Naming Resolution Verification

CRS depends on a fully functional network and name resolution. If the network or name resolution is not fully functioning, CRS may not come up successfully.

To validate network and name resolution setup, please refer to note 1054902.1

Log File Location, Ownership and Permission

Appropriate ownership and permission of sub-directories and files in $GRID_HOME/log is critical for CRS components to come up properly.

In Grid Infrastructure cluster environment:

Assuming a Grid Infrastructure environment with node name rac1, CRS owner grid, and two separate RDBMS owners rdbmsap and rdbmsar, here's what it looks like under $GRID_HOME/log in a cluster environment:

drwxrwxr-x 5 grid oinstall 4096 Dec  6 09:20 log

drwxr-xr-x  2 grid oinstall 4096 Dec  6 08:36 crs

drwxr-xr-t 17 root   oinstall 4096 Dec  6 09:22 rac1

drwxr-x--- 2 grid oinstall  4096 Dec  6 09:20 admin

drwxrwxr-t 4 root   oinstall  4096 Dec  6 09:20 agent

drwxrwxrwt 7 root    oinstall 4096 Jan 26 18:15 crsd

drwxr-xr-t 2 grid  oinstall 4096 Dec  6 09:40 application_grid

drwxr-xr-t 2 grid  oinstall 4096 Jan 26 18:15 oraagent_grid

drwxr-xr-t 2 rdbmsap oinstall 4096 Jan 26 18:15 oraagent_rdbmsap

drwxr-xr-t 2 rdbmsar oinstall 4096 Jan 26 18:15 oraagent_rdbmsar

drwxr-xr-t 2 grid  oinstall 4096 Jan 26 18:15 ora_oc4j_type_grid

drwxr-xr-t 2 root    root     4096 Jan 26 20:09 orarootagent_root

drwxrwxr-t 6 root oinstall 4096 Dec  6 09:24 ohasd

drwxr-xr-t 2 grid oinstall 4096 Jan 26 18:14 oraagent_grid

drwxr-xr-t 2 root   root     4096 Dec  6 09:24 oracssdagent_root

drwxr-xr-t 2 root   root     4096 Dec  6 09:24 oracssdmonitor_root

drwxr-xr-t 2 root   root     4096 Jan 26 18:14 orarootagent_root

-rw-rw-r-- 1 root root     12931 Jan 26 21:30 alertrac1.log

drwxr-x--- 2 grid oinstall  4096 Jan 26 20:44 client

drwxr-x--- 2 root oinstall  4096 Dec  6 09:24 crsd

drwxr-x--- 2 grid oinstall  4096 Dec  6 09:24 cssd

drwxr-x--- 2 root oinstall  4096 Dec  6 09:24 ctssd

drwxr-x--- 2 grid oinstall  4096 Jan 26 18:14 diskmon

drwxr-x--- 2 grid oinstall  4096 Dec  6 09:25 evmd

drwxr-x--- 2 grid oinstall  4096 Jan 26 21:20 gipcd

drwxr-x--- 2 root oinstall  4096 Dec  6 09:20 gnsd

drwxr-x--- 2 grid oinstall  4096 Jan 26 20:58 gpnpd

drwxr-x--- 2 grid oinstall  4096 Jan 26 21:19 mdnsd

drwxr-x--- 2 root oinstall  4096 Jan 26 21:20 ohasd

drwxrwxr-t 5 grid oinstall  4096 Dec  6 09:34 racg

drwxrwxrwt 2 grid oinstall 4096 Dec  6 09:20 racgeut

drwxrwxrwt 2 grid oinstall 4096 Dec  6 09:20 racgevtf

drwxrwxrwt 2 grid oinstall 4096 Dec  6 09:20 racgmain

drwxr-x--- 2 grid oinstall  4096 Jan 26 20:57 srvm

Please note that most log files in the sub-directories inherit the ownership of the parent directory; the above is just a general reference to tell whether there have been unexpected recursive ownership and permission changes inside the CRS home. If you have a working node of the same version, use that working node as a reference.
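
One way to spot unexpected ownership inside the log tree is to list anything not owned by the expected users or groups; a minimal sketch, assuming $GRID_HOME is set and using the owners from the example above (adjust the user list to your CRS and RDBMS owners):

find $GRID_HOME/log ! -user grid ! -user root ! -user rdbmsap ! -user rdbmsar -ls
find $GRID_HOME/log ! -group oinstall ! -group root -ls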

In Oracle Restart environment:

Here's what it looks like under $GRID_HOME/log in an Oracle Restart environment:

drwxrwxr-x 5 grid oinstall 4096 Oct 31  2009 log

drwxr-xr-x  2 grid oinstall 4096 Oct 31  2009 crs

drwxr-xr-x  3 grid oinstall 4096 Oct 31  2009 diag

drwxr-xr-t 17 root   oinstall 4096 Oct 31  2009 rac1

drwxr-x--- 2 grid oinstall  4096 Oct 31  2009 admin

drwxrwxr-t 4 root   oinstall  4096 Oct 31  2009 agent

drwxrwxrwt 2 root oinstall 4096 Oct 31  2009 crsd

drwxrwxr-t 8 root oinstall 4096 Jul 14 08:15 ohasd

drwxr-xr-x 2 grid oinstall 4096 Aug  5 13:40 oraagent_grid

drwxr-xr-x 2 grid oinstall 4096 Aug  2 07:11 oracssdagent_grid

drwxr-xr-x 2 grid oinstall 4096 Aug  3 21:13 orarootagent_grid

-rwxr-xr-x 1 grid oinstall 13782 Aug  1 17:23 alertrac1.log

drwxr-x--- 2 grid oinstall  4096 Nov  2  2009 client

drwxr-x--- 2 root   oinstall  4096 Oct 31  2009 crsd

drwxr-x--- 2 grid oinstall  4096 Oct 31  2009 cssd

drwxr-x--- 2 root   oinstall  4096 Oct 31  2009 ctssd

drwxr-x--- 2 grid oinstall  4096 Oct 31  2009 diskmon

drwxr-x--- 2 grid oinstall  4096 Oct 31  2009 evmd

drwxr-x--- 2 grid oinstall  4096 Oct 31  2009 gipcd

drwxr-x--- 2 root   oinstall  4096 Oct 31  2009 gnsd

drwxr-x--- 2 grid oinstall  4096 Oct 31  2009 gpnpd

drwxr-x--- 2 grid oinstall  4096 Oct 31  2009 mdnsd

drwxr-x--- 2 grid oinstall  4096 Oct 31  2009 ohasd

drwxrwxr-t 5 grid oinstall  4096 Oct 31  2009 racg

drwxrwxrwt 2 grid oinstall 4096 Oct 31  2009 racgeut

drwxrwxrwt 2 grid oinstall 4096 Oct 31  2009 racgevtf

drwxrwxrwt 2 grid oinstall 4096 Oct 31  2009 racgmain

drwxr-x--- 2 grid oinstall  4096 Oct 31  2009 srvm

Network Socket File Location, Ownership and Permission

Network socket files can be located in /tmp/.oracle, /var/tmp/.oracle or /usr/tmp/.oracle

Assuming a Grid Infrastructure environment with node name rac1, CRS owner grid, and cluster name eotcs:

In Grid Infrastructure cluster environment:

Below is an example output from a cluster environment:

drwxrwxrwt  2 root oinstall 4096 Feb  2 21:25 .oracle

./.oracle:

drwxrwxrwt 2 root  oinstall 4096 Feb  2 21:25 .

srwxrwx--- 1 grid oinstall    0 Feb  2 18:00 master_diskmon

srwxrwxrwx 1 grid oinstall    0 Feb  2 18:00 mdnsd

-rw-r--r-- 1 grid oinstall    5 Feb  2 18:00 mdnsd.pid

prw-r--r-- 1 root  root        0 Feb  2 13:33 npohasd

srwxrwxrwx 1 grid oinstall    0 Feb  2 18:00 ora_gipc_GPNPD_rac1

-rw-r--r-- 1 grid oinstall    0 Feb  2 13:34 ora_gipc_GPNPD_rac1_lock

srwxrwxrwx 1 grid oinstall    0 Feb  2 13:39 s#11724.1

srwxrwxrwx 1 grid oinstall    0 Feb  2 13:39 s#11724.2

srwxrwxrwx 1 grid oinstall    0 Feb  2 13:39 s#11735.1

srwxrwxrwx 1 grid oinstall    0 Feb  2 13:39 s#11735.2

srwxrwxrwx 1 grid oinstall    0 Feb  2 13:45 s#12339.1

srwxrwxrwx 1 grid oinstall    0 Feb  2 13:45 s#12339.2

srwxrwxrwx 1 grid oinstall    0 Feb  2 18:01 s#6275.1

srwxrwxrwx 1 grid oinstall    0 Feb  2 18:01 s#6275.2

srwxrwxrwx 1 grid oinstall    0 Feb  2 18:01 s#6276.1

srwxrwxrwx 1 grid oinstall    0 Feb  2 18:01 s#6276.2

srwxrwxrwx 1 grid oinstall    0 Feb  2 18:01 s#6278.1

srwxrwxrwx 1 grid oinstall    0 Feb  2 18:01 s#6278.2

srwxrwxrwx 1 grid oinstall    0 Feb  2 18:00 sAevm

srwxrwxrwx 1 grid oinstall    0 Feb  2 18:00 sCevm

srwxrwxrwx 1 root  root        0 Feb  2 18:01 sCRSD_IPC_SOCKET_11

srwxrwxrwx 1 root  root        0 Feb  2 18:01 sCRSD_UI_SOCKET

srwxrwxrwx 1 root  root        0 Feb  2 21:25 srac1DBG_CRSD

srwxrwxrwx 1 grid oinstall    0 Feb  2 18:00 srac1DBG_CSSD

srwxrwxrwx 1 root  root        0 Feb  2 18:00 srac1DBG_CTSSD

srwxrwxrwx 1 grid oinstall    0 Feb  2 18:00 srac1DBG_EVMD

srwxrwxrwx 1 grid oinstall    0 Feb  2 18:00 srac1DBG_GIPCD

srwxrwxrwx 1 grid oinstall    0 Feb  2 18:00 srac1DBG_GPNPD

srwxrwxrwx 1 grid oinstall    0 Feb  2 18:00 srac1DBG_MDNSD

srwxrwxrwx 1 root  root        0 Feb  2 18:00 srac1DBG_OHASD

srwxrwxrwx 1 grid oinstall    0 Feb  2 18:01 sLISTENER

srwxrwxrwx 1 grid oinstall    0 Feb  2 18:01 sLISTENER_SCAN2

srwxrwxrwx 1 grid oinstall    0 Feb  2 18:01 sLISTENER_SCAN3

srwxrwxrwx 1 grid oinstall    0 Feb  2 18:00 sOCSSD_LL_rac1_

srwxrwxrwx 1 grid oinstall    0 Feb  2 18:00 sOCSSD_LL_rac1_eotcs

-rw-r--r-- 1 grid oinstall    0 Feb  2 18:00 sOCSSD_LL_rac1_eotcs_lock

-rw-r--r-- 1 grid oinstall    0 Feb  2 18:00 sOCSSD_LL_rac1__lock

srwxrwxrwx 1 root  root        0 Feb  2 18:00 sOHASD_IPC_SOCKET_11

srwxrwxrwx 1 root  root        0 Feb  2 18:00 sOHASD_UI_SOCKET

srwxrwxrwx 1 grid oinstall    0 Feb  2 18:00 sOracle_CSS_LclLstnr_eotcs_1

-rw-r--r-- 1 grid oinstall    0 Feb  2 18:00 sOracle_CSS_LclLstnr_eotcs_1_lock

srwxrwxrwx 1 root  root        0 Feb  2 18:01 sora_crsqs

srwxrwxrwx 1 root  root        0 Feb  2 18:00 sprocr_local_conn_0_PROC

srwxrwxrwx 1 root  root        0 Feb  2 18:00 sprocr_local_conn_0_PROL

srwxrwxrwx 1 grid oinstall    0 Feb  2 18:00 sSYSTEM.evm.acceptor.auth

In Oracle Restart environment:

And below is an example output from an Oracle Restart environment:

drwxrwxrwt  2 root oinstall 4096 Feb  2 21:25 .oracle

./.oracle:

srwxrwx--- 1 grid oinstall 0 Aug  1 17:23 master_diskmon

prw-r--r-- 1 grid oinstall 0 Oct 31  2009 npohasd

srwxrwxrwx 1 grid oinstall 0 Aug  1 17:23 s#14478.1

srwxrwxrwx 1 grid oinstall 0 Aug  1 17:23 s#14478.2

srwxrwxrwx 1 grid oinstall 0 Jul 14 08:02 s#2266.1

srwxrwxrwx 1 grid oinstall 0 Jul 14 08:02 s#2266.2

srwxrwxrwx 1 grid oinstall 0 Jul  7 10:59 s#2269.1

srwxrwxrwx 1 grid oinstall 0 Jul  7 10:59 s#2269.2

srwxrwxrwx 1 grid oinstall 0 Jul 31 22:10 s#2313.1

srwxrwxrwx 1 grid oinstall 0 Jul 31 22:10 s#2313.2

srwxrwxrwx 1 grid oinstall 0 Jun 29 21:58 s#2851.1

srwxrwxrwx 1 grid oinstall 0 Jun 29 21:58 s#2851.2

srwxrwxrwx 1 grid oinstall 0 Aug  1 17:23 sCRSD_UI_SOCKET

srwxrwxrwx 1 grid oinstall 0 Aug  1 17:23 srac1DBG_CSSD

srwxrwxrwx 1 grid oinstall 0 Aug  1 17:23 srac1DBG_OHASD

srwxrwxrwx 1 grid oinstall 0 Aug  1 17:23 sEXTPROC1521

srwxrwxrwx 1 grid oinstall 0 Aug  1 17:23 sOCSSD_LL_rac1_

srwxrwxrwx 1 grid oinstall 0 Aug  1 17:23 sOCSSD_LL_rac1_localhost

-rw-r--r-- 1 grid oinstall 0 Aug  1 17:23 sOCSSD_LL_rac1_localhost_lock

-rw-r--r-- 1 grid oinstall 0 Aug  1 17:23 sOCSSD_LL_rac1__lock

srwxrwxrwx 1 grid oinstall 0 Aug  1 17:23 sOHASD_IPC_SOCKET_11

srwxrwxrwx 1 grid oinstall 0 Aug  1 17:23 sOHASD_UI_SOCKET

srwxrwxrwx 1 grid oinstall 0 Aug  1 17:23 sgrid_CSS_LclLstnr_localhost_1

-rw-r--r-- 1 grid oinstall 0 Aug  1 17:23 sgrid_CSS_LclLstnr_localhost_1_lock

srwxrwxrwx 1 grid oinstall 0 Aug  1 17:23 sprocr_local_conn_0_PROL

Diagnostic file collection

If the issue can't be identified with this note, run $GRID_HOME/bin/diagcollection.sh as root on all nodes, and upload all the .gz files it generates in the current directory.
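
A minimal sketch of collecting and staging the diagnostics, run as root on each node; the staging directory below is only an example:

mkdir -p /tmp/diag && cd /tmp/diag
$GRID_HOME/bin/diagcollection.sh       # writes the *.gz archives to the current directory
ls -l *.gz                             # these are the files to upload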
