[20140422]使用dgmgrl管理dataguard(12).txt
参考链接:
http://blog.itpub.net/267265/viewspace-1142649/
http://blog.itpub.net/267265/viewspace-1143027/
http://blog.itpub.net/267265/viewspace-1143058/
http://blog.itpub.net/267265/viewspace-1143126/
http://blog.itpub.net/267265/viewspace-1143480/
http://blog.itpub.net/267265/viewspace-1144742/
http://blog.itpub.net/267265/viewspace-1145573/
http://blog.itpub.net/267265/viewspace-1145697/
http://blog.itpub.net/267265/viewspace-1145727/
http://blog.itpub.net/267265/viewspace-1146558/
http://blog.itpub.net/267265/viewspace-1146575/
-- 我的测试环境:primary的tnsnames别名test,standby主机的tnsnames别名testdg。数据库版本:11GR2.
-- 昨天测试了FSFO,但是我遇到1个问题,如何触发failover,正常shutdown immediate测试无法触发FSFO切换.
--晚上看了一些文档,google一些blog,链接如下:
-- http://www.databasejournal.com/features/oracle/article.php/3849106/Fast-Start-Failover-in-Oracle-11g-Data-Guard.htm
-- 摘要如下:
Table 7-1. Fast-Start Failover: Triggering Events
Event Class Triggering Event
------------------------------------------------------------------------------------------------------------------
Connectivity Loss Network connectivity is lost simultaneously between the primary database and:
the FSFO itself; and
the standby database designated as the FSF target
and the connectivity time lost exceeds the FSF threshold
Database Health Check Failure A database health check detects any of the following optionally-configured failures:
Any datafile has gone offline due to a write error
A critical database object has dictionary corruption
A control file is permanently destroyed because a disk has failed
Log Writer (LGWR) cannot write to any member of a log group due to an I/O failure
Archiver (ARCn) cannot archive a redo log because the destination is full or unavailable
Instance Crash (Single Instance) The primary database's instance has crashed
Instances Crash (RAC) All instances for a RAC primary database have crashed
Shutdown Abort on Primary The primary database is shut down with the SHUTDOWN ABORT command
------------------------------------------------------------------------------------------------------------------
--shutdown abort 才会触发.
Table 7-3. Fast-Start Failover: Advanced Failover Triggers
Trigger Active By Default? Explanation
-----------------------------------------------------------------------------------------------------------------------
Datafile Offline Yes Tells FSFO to perform a failover whenever a datafile is offline on
the primary database
Corrupted Controlfile Yes Tells FSFO to perform a failover whenever a corrupted control file
is detected on the primary database
Corrupted Dictionary Yes Requests FSFO to perform a failover whenever corruption is detected
within the primary database's data dictionary
Inaccessible Logfile No Forces FSFO to perform a failover whenever any one online redo log
member cannot be accessed on the primary database
Stuck Archiver No Tells FSFO to perform a failover whenever archived redo logging
cannot proceed on the primary database because there is no more
space to write out archived redo logs on that server
-----------------------------------------------------------------------------------------------------------------------
--还有一些高级的Failover Triggers.
--看看我的fast_start failover配置:
DGMGRL> show fast_start failover
Fast-Start Failover: ENABLED
Threshold: 30 seconds
Target: testdg
Observer: testdb
Lag Limit: 30 seconds
Shutdown Primary: TRUE
Auto-reinstate: TRUE
Configurable Failover Conditions
Health Conditions:
Corrupted Controlfile YES
Corrupted Dictionary YES
Inaccessible Logfile NO
Stuck Archiver NO
Datafile Offline YES
Oracle Error Conditions:
(none)
--如果想打开Stuck Archiver 触发条件.执行如下:
DGMGRL> enable fast_start failover Condition "Stuck Archiver"
Succeeded.
DGMGRL> show fast_start failover
Fast-Start Failover: ENABLED
Threshold: 30 seconds
Target: testdg
Observer: testdb
Lag Limit: 30 seconds
Shutdown Primary: TRUE
Auto-reinstate: TRUE
Configurable Failover Conditions
Health Conditions:
Corrupted Controlfile YES
Corrupted Dictionary YES
Inaccessible Logfile NO
Stuck Archiver YES
Datafile Offline YES
Oracle Error Conditions:
(none)
DGMGRL> disable fast_start failover Condition "Stuck Archiver"
Succeeded.
--文档中还提到,I could configure FSFO to initiate automatic failover whenever the ORA-55623 error is detected on the
--primary database with thefollowing command:
$ oerr ora 55623
55623, 00000, "Flashback Archive \"%s\" is blocking and tracking on all tables is suspended"
// *Cause: Flashback archive tablespace has run out of space.
// *Action: Add tablespace or increase tablespace quota for the flashback archive.
//
DGMGRL> ENABLE FAST_START FAILOVER CONDITION 55623;
Succeeded.
DGMGRL> show fast_start failover
Fast-Start Failover: ENABLED
Threshold: 30 seconds
Target: testdg
Observer: testdb
Lag Limit: 30 seconds
Shutdown Primary: TRUE
Auto-reinstate: TRUE
Configurable Failover Conditions
Health Conditions:
Corrupted Controlfile YES
Corrupted Dictionary YES
Inaccessible Logfile NO
Stuck Archiver NO
Datafile Offline YES
Oracle Error Conditions:
ORA-55623: Flashback Archive "%s" is blocking and tracking on all tables is suspended
--如果Flashback archive磁盘空间满的话,会触发failover.自己做一个flashrecovery满的情况看看.
SYS@test> show parameter recover
NAME TYPE VALUE
---------------------------- ------------ ---------------------------------------
db_recovery_file_dest string /u01/app/oracle11g/flash_recovery_area
db_recovery_file_dest_size big integer 20G
SYS@test> alter system set db_recovery_file_dest_size=200M;
System altered.
RMAN> backup datafile 1;
Starting backup at 2014-04-23 09:13:01
using target database control file instead of recovery catalog
allocated channel: ORA_DISK_1
channel ORA_DISK_1: SID=12 device type=DISK
channel ORA_DISK_1: starting full datafile backup set
channel ORA_DISK_1: specifying datafile(s) in backup set
input datafile file number=00001 name=/u01/app/oracle11g/oradata/test/system01.dbf
channel ORA_DISK_1: starting piece 1 at 2014-04-23 09:13:07
RMAN-03009: failure of backup command on ORA_DISK_1 channel at 04/23/2014 09:13:10
ORA-19809: limit exceeded for recovery files
ORA-19804: cannot reclaim 81920 bytes disk space from 209715200 limit
continuing other job steps, job failed will not be re-run
channel ORA_DISK_1: starting full datafile backup set
channel ORA_DISK_1: specifying datafile(s) in backup set
including current control file in backup set
including current SPFILE in backup set
channel ORA_DISK_1: starting piece 1 at 2014-04-23 09:13:13
RMAN-00571: ===========================================================
RMAN-00569: =============== ERROR MESSAGE STACK FOLLOWS ===============
RMAN-00571: ===========================================================
RMAN-03009: failure of backup command on ORA_DISK_1 channel at 04/23/2014 09:13:14
ORA-19809: limit exceeded for recovery files
ORA-19804: cannot reclaim 20201472 bytes disk space from 209715200 limit
--可以发现出现的错误是ORA-19809: limit exceeded for recovery files.加入如下触发条件.
DGMGRL> ENABLE FAST_START FAILOVER CONDITION 19809;
Succeeded.
DGMGRL> show fast_start failover
Fast-Start Failover: ENABLED
Threshold: 30 seconds
Target: testdg
Observer: testdb
Lag Limit: 30 seconds
Shutdown Primary: TRUE
Auto-reinstate: TRUE
Configurable Failover Conditions
Health Conditions:
Corrupted Controlfile YES
Corrupted Dictionary YES
Inaccessible Logfile NO
Stuck Archiver NO
Datafile Offline YES
Oracle Error Conditions:
ORA-55623: Flashback Archive "%s" is blocking and tracking on all tables is suspended
ORA-19809: limit exceeded for recovery files
--再次执行backup datafile 1;
DGMGRL> show fast_start failover
ORA-03113: end-of-file on communication channel
Process ID: 28425
Session ID: 209 Serial number: 45
Configuration details cannot be determined by DGMGRL
--可以发现已经触发,查看observer窗口:
09:17:45.93 Wednesday, April 23, 2014
Initiating Fast-Start Failover to database "testdg"...
Performing failover NOW, please wait...
Failover succeeded, new primary is "testdg"
09:17:53.59 Wednesday, April 23, 2014
DGMGRL> connect sys/btbtms@testdg
Connected.
DGMGRL> show fast_start failover
Fast-Start Failover: ENABLED
Threshold: 30 seconds
Target: test
Observer: testdb
Lag Limit: 30 seconds
Shutdown Primary: TRUE
Auto-reinstate: TRUE
Configurable Failover Conditions
Health Conditions:
Corrupted Controlfile YES
Corrupted Dictionary YES
Inaccessible Logfile NO
Stuck Archiver NO
Datafile Offline YES
Oracle Error Conditions:
ORA-55623: Flashback Archive "%s" is blocking and tracking on all tables is suspended
ORA-19809: limit exceeded for recovery files
DGMGRL> show configuration
Configuration - study
Protection Mode: MaxPerformance
Databases:
testdg - Primary database
Warning: ORA-16829: fast-start failover configuration is lagging
test - (*) Physical standby database (disabled)
ORA-16661: the standby database needs to be reinstated
Fast-Start Failover: ENABLED
Configuration Status:
WARNING
--可以发现failover已经到testdg. 还原现场:
SYS@test> startup mount
ORACLE instance started.
Total System Global Area 1603411968 bytes
Fixed Size 2228784 bytes
Variable Size 1006636496 bytes
Database Buffers 587202560 bytes
Redo Buffers 7344128 bytes
Database mounted.
SYS@test> alter system set db_recovery_file_dest_size=20G;
System altered.
DGMGRL> show configuration
Configuration - study
Protection Mode: MaxPerformance
Databases:
testdg - Primary database
Warning: ORA-16829: fast-start failover configuration is lagging
test - (*) Physical standby database (disabled)
ORA-16661: the standby database needs to be reinstated
Fast-Start Failover: ENABLED
Configuration Status:
WARNING
DGMGRL> reinstate database test;
Reinstating database "test", please wait...
Operation requires shutdown of instance "test" on database "test"
Shutting down instance "test"...
ORA-01109: database not open
Database dismounted.
ORACLE instance shut down.
Operation requires startup of instance "test" on database "test"
Starting instance "test"...
ORACLE instance started.
Database mounted.
Continuing to reinstate database "test" ...
Reinstatement of database "test" succeeded
DGMGRL> DGMGRL> show configuration
Configuration - study
Protection Mode: MaxPerformance
Databases:
testdg - Primary database
test - (*) Physical standby database
Fast-Start Failover: ENABLED
Configuration Status:
SUCCESS
DGMGRL> switchover to test;
Performing switchover NOW, please wait...
New primary database "test" is opening...
Operation requires shutdown of instance "testdg" on database "testdg"
Shutting down instance "testdg"...
ORACLE instance shut down.
Operation requires startup of instance "testdg" on database "testdg"
Starting instance "testdg"...
ORACLE instance started.
Database mounted.
Database opened.
Switchover succeeded, new primary is "test"
DGMGRL> disable fast_start failover Condition 55623
Succeeded.
DGMGRL> disable fast_start failover Condition 19809
Succeeded.
总结:
FSFO确实很好东西,如果做的好,能很好提高HA,减少待机时间,问题是我总觉得有多少人敢在生产系统使用.测试学习还行.
在生产系统使用自己总有点担心^_^.