对昨天提出的问题做了一个简单的分析和排查,也算是有了一个交代,上一篇文章在 dg broker校验失败的一个奇怪问题 http://blog.itpub.net/23718752/viewspace-2064499/
我查看了最近的日志,发现在半个月以前有一行日志引起了我的注意。
Thu Mar 03 17:32:12 2016
ALTER SYSTEM SET log_archive_dest_state_2='DEFER' SCOPE=BOTH;
关于这个DEFER的设置,让我想起了之前的一个设置。
原来的主库发生了硬件电源故障,启用备用电源之后,勉强撑了几个小时,因为数据库之前使用的异机逻辑备份,恢复起来还是需要些时间,直接就找了台机器搭建了dataguard,然后做了switchover,把数据库迁移到了新的服务器上,然后在新的备库上又搭建了一套相应的dataguard环境,在搭建新的dataguard之前,原有存在电源故障的机器还是可用,但是因为硬件已经过保,就直接做了服务器退还。为了防止后续的归档检查失败,就设置了log_archive-dest_state_2=DEFER,搭建dataguard成功之后,就把服务器从dg broker里删除了。
所以从这个不起眼的过程来看,log_archive_dest_state有了三个状态的切换,defer,reset,enable
我们能不能简单复现一下这个问题,答案是肯定的。而且可以直接用这套环境来模拟一下。
首先dg broker检查没有任何问题。
DGMGRL> show configuration
Configuration - testdb_dg
Protection Mode: MaxPerformance
Databases:
sactvdb - Primary database
s2actvdb - Physical standby database
Fast-Start Failover: DISABLED
Configuration Status:
SUCCESS
我们在主库设置归档路径状态为defer
SQL> ALTER SYSTEM SET log_archive_dest_state_2=DEFER;
System altered.
这个时候如果再次检查,dg broker就是下面的状态了。
DGMGRL> show configuration;
Configuration - testdb_dg
Protection Mode: MaxPerformance
Databases:
sactvdb - Primary database
Error: ORA-16764: redo transport service to a standby database is not running
s2actvdb - Physical standby database
Fast-Start Failover: DISABLED
Configuration Status:
ERROR
然后我们把备库停掉。来看看主库的变化。
可以从主库日志看到下面的信息,自defer的日志之后,开始有了一些网络的检查,失败几次之后,在日志最后做了一个reset的操作。
ALTER SYSTEM SET log_archive_dest_state_2='DEFER' SCOPE=BOTH;
Sat Mar 26 20:35:54 2016
***********************************************************************
Fatal NI connect error 12528, connecting to:
(DESCRIPTION=(ADDRESS_LIST=(ADDRESS=(PROTOCOL=TCP)(HOST=xxxx)(PORT=1528)))(CONNECT_DATA=(SERVICE_NAME=s2actvdb_DGB)(CID=(PROGRAM=oracle)(HOST=testdb2.test.com)(USER=oracle))))
VERSION INFORMATION:
TNS for Linux: Version 11.2.0.3.0 - Production
TCP/IP NT Protocol Adapter for Linux: Version 11.2.0.3.0 - Production
Time: 26-MAR-2016 20:35:54
Tracing not turned on.
Tns error struct:
ns main err code: 12564
TNS-12564: TNS:connection refused
ns secondary err code: 0
nt main err code: 0
nt secondary err code: 0
nt OS err code: 0
***********************************************************************
Fatal NI connect error 12528, connecting to:
(DESCRIPTION=(ADDRESS_LIST=(ADDRESS=(PROTOCOL=TCP)(HOST=10.127.130.58)(PORT=1528)))(CONNECT_DATA=(SERVICE_NAME=s2actvdb_DGB)(CID=(PROGRAM=oracle)(HOST=testdb2.test.com)(USER=oracle))))
VERSION INFORMATION:
TNS for Linux: Version 11.2.0.3.0 - Production
TCP/IP NT Protocol Adapter for Linux: Version 11.2.0.3.0 - Production
Time: 26-MAR-2016 20:35:54
Tracing not turned on.
Tns error struct:
ns main err code: 12564
TNS-12564: TNS:connection refused
ns secondary err code: 0
nt main err code: 0
nt secondary err code: 0
nt OS err code: 0
***********************************************************************
。。。
LNS: Attempting destination LOG_ARCHIVE_DEST_2 network reconnect (3135)
LNS: Destination LOG_ARCHIVE_DEST_2 network reconnect abandoned
Error 3135 for archive log file 2 to 's2actvdb'
Errors in file /U01/app/oracle/diag/rdbms/sactvdb/actvdb/trace/actvdb_nsa2_20231.trc:
ORA-03135: connection lost contact
LNS: Failed to archive log 2 thread 1 sequence 10137 (3135)
Sat Mar 26 20:36:04 2016
***********************************************************************
。。。
TNS-12564: TNS:connection refused
ns secondary err code: 0
nt main err code: 0
nt secondary err code: 0
nt OS err code: 0
Sat Mar 26 20:36:04 2016
ALTER SYSTEM SET log_archive_dest_state_2='RESET' SCOPE=BOTH;
然后重启备库到open状态,稍作等待,检查dg broker的状态,一切显示就正常了。
DGMGRL> show configuration;
Configuration - testdb_dg
Protection Mode: MaxPerformance
Databases:
sactvdb - Primary database
s2actvdb - Physical standby database
Fast-Start Failover: DISABLED
Configuration Status:
SUCCESS
可以看到这个问题看起来比较清楚了,归档路径从defer变为了reset,然后变为了enable.
那么这个问题不设置defer,是否会出现reset的操作呢,能否复现,可以简单再测试一下。
再次停掉备库,然后查看主库的日志如下:
Fatal NI connect error 12514, connecting to:
(DESCRIPTION=(ADDRESS_LIST=(ADDRESS=(PROTOCOL=TCP)(HOST=10.127.130.58)(PORT=1528)))(CONNECT_DATA=(SERVICE_NAME=s2actvdb_DGB)(CID=(PROGRAM=oracle)(HOST=testdb2.test.com)(USER=oracle))))
VERSION INFORMATION:
TNS for Linux: Version 11.2.0.3.0 - Production
TCP/IP NT Protocol Adapter for Linux: Version 11.2.0.3.0 - Production
Time: 26-MAR-2016 20:41:48
Tracing not turned on.
Tns error struct:
ns main err code: 12564
TNS-12564: TNS:connection refused
ns secondary err code: 0
nt main err code: 0
nt secondary err code: 0
nt OS err code: 0
Sat Mar 26 20:41:48 2016
ALTER SYSTEM SET log_archive_dest_state_2='RESET' SCOPE=BOTH;
可以看到,日志末尾会有reset的字样,所以说和之前的defer是没有直接关系,dg broker的配置下,这是一个状态的自动变化。
那么备库停掉之后,再次重启是否会有dg broker中备库为disable的状态呢。
DGMGRL> show configuration;
Configuration - testdb_dg
Protection Mode: MaxPerformance
Databases:
sactvdb - Primary database
s2actvdb - Physical standby database (disabled)
Fast-Start Failover: DISABLED
Configuration Status:
SUCCESS
可以简单复现问题,那就是备库在nomount状态,11g环境中,mount,open状态下都没有此类问题,也是因为这个时候备库的RFS,MRP可以正常工作。
复现过程如下:
重启备库到nomount状态
DGMGRL> show configuration;
Configuration - testdb_dg
Protection Mode: MaxPerformance
Databases:
sactvdb - Primary database
s2actvdb - Physical standby database (disabled)
Fast-Start Failover: DISABLED
Configuration Status:
SUCCESS
启动备库到mount
DGMGRL> show configuration;
Configuration - testdb_dg
Protection Mode: MaxPerformance
Databases:
sactvdb - Primary database
s2actvdb - Physical standby database
Fast-Start Failover: DISABLED
Configuration Status:
SUCCESS
所以通过以上的测试复现,可以看到这个问题其实不奇怪,备库重启,但是备库在nomount阶段导致了这个奇怪的现象,但是对于dataguard而言,归档路径的状态有defer,reset,enable几种情况,可能会以reset为一个临界点来做转换。