最近在无疑中查看一个数据库的日志的时候,发现里面有这么一段内容。
Sat Feb 06 10:07:25 2016
Deleted Oracle managed file +ARCH/testdb2/archivelog/2016_01_13/thread_1_seq_4566.261.901038877
Archived Log entry 9262 added for thread 1 sequence 4678 ID 0x26b3e123 dest 1:
Sat Feb 06 14:04:52 2016
Thread 1 advanced to log sequence 4680 (LGWR switch)
Current log# 1 seq# 4680 mem# 0:
+DATA/testdb2/onlinelog/group_1.261.815382013
Sat Feb 06 14:04:52 2016
LNS: Standby redo logfile selected for thread 1 sequence 4680 for destination
LOG_ARCHIVE_DEST_2
Sat Feb 06 14:04:52 2016
Deleted Oracle managed file
+ARCH/testdb2/archivelog/2016_01_13/thread_1_seq_4567.258.901058415
Archived Log entry 9264 added for thread 1 sequence 4679 ID 0x26b3e123 dest 1:
这段日志的亮点就在于"Deleted Oracle managed file"
这是一个11gR2的主库,使用了ASM作为数据存储。如果看到这段自动删除归档的内容,很可能和一个特性联系起来,那就是闪回区的自动删除归档的阀值 80%,在11g ADG中会有这样的而一个特性,闪回区使用率达到80%以上的时候,会帮助清理一些过期的归档日志文件。同时出现这个问题和另外一个参数有关,那就是闪回 区的大小设置,如果太小了,使用率马上就会达到80%,所以这个参数db_recovery_file_dest_size也值得关注。
目前的参数配置为:
SQL> show parameter reco
NAME TYPE VALUE
------------------------------------ -----------
---------------
control_file_record_keep_time integer 7
db_recovery_file_dest string +ARCH
db_recovery_file_dest_size big integer 100G
db_unrecoverable_scn_tracking boolean TRUE
recovery_parallelism integer 0
那么这个问题中,先不说主备库是否都使用哪个特性,先看看主库闪回区使用率到底是多少呢?
SQL>select *from v$flash_recovery_area_usage;
FILE_TYPE PERCENT_SPACE_USED PERCENT_SPACE_RECLAIMABLE NUMBER_OF_FILES
-------------------- ------------------
------------------------- ---------------
CONTROL FILE .01 0 1
REDO LOG 0 0 0
ARCHIVED LOG 3.92 3.89 112
BACKUP PIECE 0 0 0
IMAGE COPY 0 0 0
FLASHBACK LOG 0 0 0
FOREIGN ARCHIVED LOG 0 0 0
所以这个问题就比较奇怪了,使用率远远没有达到80%的阀值,怎么会自动删除呢,当然查看了MOS也发现了一些相关的bug,(4911954),有的可能是v$flash_recovery_area_usage和V$RECOVERY_FILE_DEST的数据情况会有较大出入,会有这种情况,不过我的这个环境从版本的日志来看,是没有关系的。
FILE_TYPE PERCENT_SPACE_USED PERCENT_SPACE_RECLAIMABLE NUMBER_OF_FILES
-------------------- ------------------
------------------------- ---------------
CONTROL FILE .01 0 1
REDO LOG 0 0 0
ARCHIVED LOG 3.92 3.89 112
BACKUP PIECE 0 0 0
IMAGE COPY 0 0 0
FLASHBACK LOG 0 0 0
FOREIGN ARCHIVED LOG 0 0 0
然后继续找可能的情况,从日志中发现了一个特别的情况,也是规律,那就是删除的归档是24天以前的,切一次日志,删除一次24天以前的旧归档。
可以通过rman中的操作 crosscheck
archivelog all观察和分析日志得出。
RMAN>crosscheck archivelog
all;
using target database control file instead of recovery catalog
allocated channel: ORA_DISK_1
channel ORA_DISK_1: SID=1668 device type=DISK
validation succeeded for archived log
archived log file name=+ARCH/testdb2/archivelog/2016_01_14/thread_1_seq_4568.369.901073863
RECID=9042 STAMP=901073862
validation succeeded for archived log
archived log file
name=+ARCH/testdb2/archivelog/2016_01_14/thread_1_seq_4569.262.901095119
RECID=9043 STAMP=901095118
validation succeeded for archived log
archived log file
name=+ARCH/testdb2/archivelog/2016_01_14/thread_1_seq_4570.259.901095121
RECID=9045 STAMP=901095121
关于归档的删除策略,其实从rman中也可以做一些配置。如果配置是这样的情况,删除策略就是保留24天。
CONFIGURE RETENTION POLICY TO RECOVERY WINDOW OF 24 DAYS;
不过我这个问题的情况还不是,rman的配置都是取默认,只有一条新增的配置。
CONFIGURE ARCHIVELOG DELETION POLICY TO APPLIED ON ALL STANDBY;
这个时候,怎么通过sql语句来查看rman中的相关设置呢,主要可以通过下面的方式来看。
SELECT NAME,VALUE FROM V$RMAN_CONFIGURATION;
当然里面只有一条自定义的配置
NAME
VALUE
------------------------------ ------------------------------
ARCHIVELOG DELETION POLICY TO APPLIED ON ALL STANDBY
我们这个时候来简单测试一下这个配置的影响。取消自定义的配置。取默认值。
RMAN> CONFIGURE ARCHIVELOG
DELETION POLICY clear;
old RMAN configuration parameters:
CONFIGURE ARCHIVELOG DELETION POLICY TO APPLIED ON ALL STANDBY;
RMAN configuration parameters are successfully reset to default value
然后手工切换一次日志。得到的日志如下,又看到了一些额外的信息。
Sat Feb 06 22:16:30 2016
Thread 1 advanced to log sequence 4683 (LGWR switch)
Current log# 7 seq# 4683 mem# 0:
+DATA/testdb2/onlinelog/group_7.271.817297835
Sat Feb 06 22:16:30 2016
Trying to expand controlfile section 11 for Oracle Managed Files
Expanded controlfile section 11 from 224 to 448 records
Requested to grow by 224 records; added 8 blocks of records
LNS: Standby redo logfile selected for thread 1 sequence 4683 for destination
LOG_ARCHIVE_DEST_2
Sat Feb 06 22:16:30 2016
Archived Log entry 9270 added for thread 1 sequence 4682 ID 0x26b3e123 dest 1:
对于看起来有些陌生的日志,当然也是集各家之言。
有一个老外的建议非常专业,我就直接贴出来了,对于这些日志的说明,他的解释如下:
this is nothing to worry
about.
the following is an extract from metalink
Starting in Oracle8 the
controlfile is automatically expanded.
A new datafile is added beyond the maxdatafiles limited specified
in the control file.
This also happens if new records are added to the "log_history"
section of
the control files and there are no entries that could be replaced.
The routine that performs the expansion writes this message to the alert log.
The messages specifies the section that was expanded and the amount of
the expansion. Please note that this message cannot be turned off.
好了,看到这里,我们还是很有必要去看一下涉及的视图 v$controlfile_record_section,
修改rman配置之后的视图内容如下,只显示了alert日志中说明的部分section 11相关的记录。
TYPE RECORD_SIZE RECORDS_TOTAL RECORDS_USED FIRST_INDEX LAST_INDEX LAST_RECID
------------ ----------- ------------- ------------
----------- ---------- ----------
LOG HISTORY 56 292 292 12 11 4683
所以这个问题又指向了控制文件,对于控制文件,我们有一个数据库参数相关。
SQL> show parameter control_file_record_keep_time
NAME TYPE VALUE
------------------------------------ -----------
---------
control_file_record_keep_time integer 7
所以通过日志中的一条信息,逐步测试最后发现了一些意料之外的情况,还是需要好好琢磨琢磨这些内容。