[20150715]10513事件禁止smon回滚.txt
--如果数据库特别是特别忙的数据库出现异常当机的情况,有大量的数据脏块没有更新到数据文件中。还有许多事务没有提交。
--通过设置10513事件,可以加快数据库的启动,但是禁用smon进行tx recovery(所谓tx recovery就是open后数据文件包含提交和未提交
--数据,数据不一致),不会造成数据库不一致,虽然禁用了smon自动恢复,但是当查询的时候还是会进行回滚从undo中读取回滚数据(等
--同于用到哪个对象回滚哪个对象,这种方式会带来压力,且若undo损坏就十分麻烦了,那么这将是另一个恢复问题select
--segment_name,status,tablespace_name from dba_rollback_segs看那个段损坏使用隐藏参数_offline_rollback_segments 标记,然
--后drop rollback segment 'xxx' ,此时才会造成真正的数据不一致)
--还是通过测试说明情况:
1.建立测试环境:
SCOTT@test> @ver1
PORT_STRING VERSION BANNER
------------------------------ -------------- --------------------------------------------------------------------------------
x86_64/Linux 2.4.xx 11.2.0.3.0 Oracle Database 11g Enterprise Edition Release 11.2.0.3.0 - 64bit Production
SCOTT@test> create table t as select rownum id , cast ( 'test' as varchar2(20)) name from xmltable ('1 to 100000');
Table created.
SCOTT@test> alter system flush buffer_cache ;
System altered.
-- 保证脏块写盘。
SCOTT@test> select rowid,t.* from t where rownum=1;
ROWID ID NAME
------------------ ---------- --------------------
AABLTpAAEAAABKbAAA 1 test
SCOTT@test> @ lookup_rowid AABLTpAAEAAABKbAAA
OBJECT FILE BLOCK ROW DBA TEXT
---------- ---------- ---------- ---------- -------------------- ----------------------------------------
308457 4 4763 0 4,4763 alter system dump datafile 4 block 4763
2.修改数据不提交看看:
SCOTT@test> update t set name='TTTTTEST' ;
...
不等结束...
打开另外的会话:
SYS@test> shutdown abort ;
ORACLE instance shut down.
3.通过bbed观察:
BBED> set dba 4,4763
DBA 0x0100129b (16781979 4,4763)
BBED> x /rnc rowdata
rowdata[0] @2004
----------
flag@2004: 0x2c (KDRHFL, KDRHFF, KDRHFH)
lock@2005: 0x00
cols@2006: 2
col 0[3] @2007: 524
col 1[4] @2011: test
--可以发现修改的数据并没有写盘。使用 dump /v offset 0观察也可以证明。
SYS@test> startup mount;
ORACLE instance started.
Total System Global Area 1603411968 bytes
Fixed Size 2228784 bytes
Variable Size 989859280 bytes
Database Buffers 603979776 bytes
Redo Buffers 7344128 bytes
Database mounted.
SYS@test> alter system set events '10513 trace name context forever,level 2';
System altered.
SYS@test> alter database open ;
Database altered.
--观察alert*.log文件:
Completed: ALTER DATABASE MOUNT
Wed Jul 15 11:03:19 2015
OS Pid: 17985 executed alter system set events '10513 trace name context forever,level 2'
Wed Jul 15 11:04:05 2015
alter database open
Beginning crash recovery of 1 threads
parallel recovery started with 3 processes
Started redo scan
Completed redo scan
read 44694 KB redo, 2776 data blocks need recovery
Started redo application at
Thread 1: logseq 3160, block 32678
Recovery of Online Redo Log: Thread 1 Group 2 Seq 3160 Reading mem 0
Mem# 0: /u01/app/oracle11g/oradata/test/redo02.log
Wed Jul 15 11:04:33 2015
Recovery of Online Redo Log: Thread 1 Group 3 Seq 3161 Reading mem 0
Mem# 0: /u01/app/oracle11g/oradata/test/redo03.log
Wed Jul 15 11:04:56 2015
Completed redo application of 38.43MB
Completed crash recovery at
Thread 1: logseq 3161, block 40136, scn 12698741606
2776 data blocks read, 2776 data blocks written, 44694 redo k-bytes read
Wed Jul 15 11:04:57 2015
LGWR: STARTING ARCH PROCESSES
Wed Jul 15 11:04:57 2015
ARC0 started with pid=23, OS id=17995
ARC0: Archival started
LGWR: STARTING ARCH PROCESSES COMPLETE
ARC0: STARTING ARCH PROCESSES
Wed Jul 15 11:04:58 2015
ARC1 started with pid=24, OS id=17997
Wed Jul 15 11:04:59 2015
ARC2 started with pid=25, OS id=17999
Wed Jul 15 11:04:59 2015
ARC3 started with pid=26, OS id=18001
ARC1: Archival started
ARC2: Archival started
ARC1: Becoming the 'no FAL' ARCH
ARC1: Becoming the 'no SRL' ARCH
ARC2: Becoming the heartbeat ARCH
Thread 1 advanced to log sequence 3162 (thread open)
Thread 1 opened at log sequence 3162
Current log# 1 seq# 3162 mem# 0: /u01/app/oracle11g/oradata/test/redo01.log
Successful open of redo thread 1
MTTR advisory is disabled because FAST_START_MTTR_TARGET is not set
Wed Jul 15 11:04:59 2015
SMON: enabling cache recovery
Archived Log entry 4800 added for thread 1 sequence 3161 ID 0x806ffa4c dest 1:
ARC3: Archival started
ARC0: STARTING ARCH PROCESSES COMPLETE
[17985] Successfully onlined Undo Tablespace 2.
Undo initialization finished serial:0 start:1061401088 end:1061401348 diff:260 (2 seconds)
Verifying file header compatibility for 11g tablespace encryption..
Verifying 11g file header compatibility for tablespace encryption completed
SMON: enabling tx recovery
Database Characterset is ZHS16GBK
Wed Jul 15 11:05:00 2015
Incremental checkpoint up to RBA [0xc5a.3.0], current log tail at RBA [0xc5a.3e.0]
No Resource Manager plan active
replication_dependency_tracking turned off (no async multimaster replication found)
Starting background process QMNC
Wed Jul 15 11:05:02 2015
QMNC started with pid=27, OS id=18003
Completed: alter database open
Wed Jul 15 11:05:07 2015
db_recovery_file_dest_size of 20480 MB is 0.09% used. This is a
user-specified limit on the amount of space that will be used by this
database for recovery-related files, and does not reflect the amount of
space available in the underlying filesystem or ASM diskgroup.
Wed Jul 15 11:05:07 2015
Starting background process CJQ0
Wed Jul 15 11:05:07 2015
CJQ0 started with pid=29, OS id=18017
SYS@test> select usn,UNDOBLOCKSDONE,UNDOBLOCKSDONE from v$fast_start_transactions;
no rows selected
--bbed观察:
BBED> set dba 4,4763
DBA 0x0100129b (16781979 4,4763)
BBED> p *kdbr[0]
rowdata[6982]
-------------
ub1 rowdata[6982] @8173 0x2c
BBED> x /rnc
rowdata[6982] @8173
-------------
flag@8173: 0x2c (KDRHFL, KDRHFF, KDRHFH)
lock@8174: 0x02
cols@8175: 2
col 0[2] @8176: 1
col 1[8] @8179: TTTTTEST
--这个时候说明开机后redo已经应用。也就是前滚。实际上我的数据没有提交。正常的情况应该看到的像前面的情况。
--这个时候做一个查询,如果我执行:
SCOTT@test> select rowid,t.* from t where rownumROWID ID NAME
------------------ ---------- --------------------
AABLTpAAEAAABKbAAA 1 test
--说明已经读取了回滚段信息。
SCOTT@test> alter system checkpoint;
System altered.
--做一次脏块写盘,再使用bbed观察:
BBED> set dba 4,4763
DBA 0x0100129b (16781979 4,4763)
BBED> p *kdbr[0]
rowdata[6982]
-------------
ub1 rowdata[6982] @8173 0x2c
BBED> x /rnc
rowdata[6982] @8173
-------------
flag@8173: 0x2c (KDRHFL, KDRHFF, KDRHFH)
lock@8174: 0x02
cols@8175: 2
col 0[2] @8176: 1
col 1[8] @8179: TTTTTEST
--可以发现这个时候字段name还是看到'TTTTEST',而不是'test'。
SCOTT@test> set autot traceonly
SCOTT@test> select rowid,t.* from t where rownumExecution Plan
--------------------------
Plan hash value: 508354683
---------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
---------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 1 | 37 | 2 (0)| 00:00:01 |
|* 1 | COUNT STOPKEY | | | | | |
| 2 | TABLE ACCESS FULL| T | 94407 | 3411K| 2 (0)| 00:00:01 |
---------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
1 - filter(ROWNUMNote
-----
- dynamic sampling used for this statement (level=2)
Statistics
----------------------------------------------------------
0 recursive calls
0 db block gets
713 consistent gets
0 physical reads
108 redo size
675 bytes sent via SQL*Net to client
520 bytes received via SQL*Net from client
2 SQL*Net roundtrips to/from client
0 sorts (memory)
0 sorts (disk)
1 rows processed
--713次consistent gets,也说明读取了回滚段信息。
4.正常关闭数据库,继续使用bbed观察:
BBED> set dba 4,4763
DBA 0x0100129b (16781979 4,4763)
BBED> p *kdbr[0]
rowdata[6982]
-------------
ub1 rowdata[6982] @8173 0x2c
BBED> x /rnc
rowdata[6982] @8173
-------------
flag@8173: 0x2c (KDRHFL, KDRHFF, KDRHFH)
lock@8174: 0x02
cols@8175: 2
col 0[2] @8176: 1
col 1[8] @8179: TTTTTEST
--可以发现10513事件禁止smon回滚,但是在读取时还是通过回滚段读取正确的数据,但是这些信息并不会写盘。
5.正常启动数据库,没有10513事件。
SCOTT@test> select rowid,t.* from t where rownum
ROWID ID NAME
------------------ ---------- --------------------
AABLTpAAEAAABKbAAA 1 test
SCOTT@test> alter system checkpoint;
System altered.
BBED> set dba 4,4763
DBA 0x0100129b (16781979 4,4763)
BBED> p *kdbr[0]
rowdata[0]
----------
ub1 rowdata[0] @1329 0x2c
BBED> x /rnc
rowdata[0] @1329
----------
flag@1329: 0x2c (KDRHFL, KDRHFF, KDRHFH)
lock@1330: 0x00
cols@1331: 2
col 0[2] @1332: 1
col 1[4] @1335: test
--这回正确了。
SCOTT@test> select rowid,t.* from t where rownumExecution Plan
----------------------------------------------------------
Plan hash value: 508354683
---------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
---------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 1 | 37 | 2 (0)| 00:00:01 |
|* 1 | COUNT STOPKEY | | | | | |
| 2 | TABLE ACCESS FULL| T | 94407 | 3411K| 2 (0)| 00:00:01 |
---------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
1 - filter(ROWNUMNote
-----
- dynamic sampling used for this statement (level=2)
Statistics
----------------------------------------------------------
0 recursive calls
0 db block gets
4 consistent gets
0 physical reads
0 redo size
675 bytes sent via SQL*Net to client
520 bytes received via SQL*Net from client
2 SQL*Net roundtrips to/from client
0 sorts (memory)
0 sorts (disk)
1 rows processed
--4consistent gets,也能说明问题。
--总结:
--1.使用10513打开最好不能给用户正式使用。仅仅作为监测是否打开的情况。
--2.最好还是选择正常的打开方式,总之dba要情况地了解10513事件的作用,解决遇到的数据库问题。
--3.如果再使用参数_corrupted_rollback_segments,这样正确的信息无法显示,如果再drop undo表空间(不知道是否可以,没有测试),情
--况就不妙了。正确的信息再也无法恢复。可以参考我以前写的链接:
-- http://blog.itpub.net/267265/viewspace-1415396/ => [0126]理解_corrupted_rollback_segments
--4.总之一定要认真了解这些参数的含义,选择正确的方式恢复数据库。