[20140507]实例crash恢复.txt
如果发生了实例崩溃,只需要在日志文件中找到检查点位置(low cache rba),从此开始应用所有的重做日志文件, 就完成了前滚操作。
实例崩溃后,再次启动数据库,oracle会到控制文件中读取low cache rba,这就是检查点位置。 从此处开始应用重做日志,应用到on
disk rba的位置。on disk rba是磁盘中重做日志文件的最后一条重做记录的rba。 加快恢复速度,确定恢复日志的起点。
不管redo记录的事务提交还是非提交,都会应用,这个相当于前滚Roll forward (crash recovery),然后再根据undo做事务的恢复,这个叫
后滚Roll backward (transaction recovery)
In the roll backward phase, the data within the undo tablespace will be applied to roll back the uncommitted changes.The
SMON process will initiate the rollback of the uncommitted buffers.
--自己做一个测试,理解这些问题:
1.建立测试环境:
SCOTT@test> @ver
BANNER
--------------------------------------------------------------------------------
Oracle Database 11g Enterprise Edition Release 11.2.0.3.0 - 64bit Production
SCOTT@test> create table t ( id number,name varchar2(10));
Table created.
SCOTT@test> insert into t select rownum,'ABCDEF' from dual connect by level10 rows created.
SCOTT@test> commit ;
Commit complete.
SCOTT@test> select rowid,t.* from t where rownumROWID ID NAME
------------------ ---------- --------------------
AABF02AAEAAAAJ1AAA 1 ABCDEF
SCOTT@test> @lookup_rowid AABF0+AAEAAAAJ1AAA
OBJECT FILE BLOCK ROW DBA TEXT
---------- ---------- ---------- ---------- -------------------- ----------------------------------------
286014 4 629 0 4,629 alter system dump datafile 4 block 629 ;
SCOTT@test> alter system checkpoint ;
System altered.
--将脏块写盘.
SCOTT@test> update t set name='1234' ;
10 rows updated.
--注意name的字符串长度与原来不同,不提交,打开另外的会话.执行
@checkpoint
shutdown abort.
SYS@test> @checkpoint
检查点队列
检查点队列 on disk rba 检查点队列
脏块数量 时间戳 当前时间 on disk rba scn 检查点心跳
CPDRT low_rba on_disk_rba CPODT SYSDATE DIFF_DATE CPODS CPHBT CURRENT_SCN DIFF_SCN INDX
------------ -------------------- -------------------- ------------------- ------------------- ---------- ---------------- ------------ ------------ ------------ ------------
24 143.1242.0 143.1358.0 2014-05-07 15:35:48 2014-05-07 15:35:49 1.00 3270385840 846926007 3270385842 2 0
DATAFILE:
rtckp_rba RTCKP_SCN CURRENT_SCN DIFF_SCN RTCKP_TIM SYSDATE DIFF_DATE
-------------------- ---------------- ------------ ------------ ------------------- ------------------- ----------
143.1242.16 3270385727 3270385843 116 2014-05-07 15:34:13 2014-05-07 15:35:49 96.00
v$instance_recovery:
INST_ID ACTUAL_REDO_BLKS TARGET_REDO_BLKS 90%_blks TIMEOUT_BLKS TARGET_MTTR ESTIMATED_MTTR
------------ ---------------- ---------------- ------------ ------------ ------------ --------------
1 116 165888 165888 0 12
SYS@test> shutdown abort
ORACLE instance shut down.
==使用bbed观察:
BBED> set dba 4,629
DBA 0x01000275 (16777845 4,629)
BBED> p *kdbr[0]
rowdata[0]
----------
ub1 rowdata[0] @8058 0x2c
BBED> x /rnc
rowdata[0] @8058
----------
flag@8058: 0x2c (KDRHFL, KDRHFF, KDRHFH)
lock@8059: 0x01
cols@8060: 2
col 0[2] @8061: 1
col 1[6] @8064: ABCDEF
BBED> dump /v
File: /u01/app/oracle11g/oradata/test/users01.dbf (4)
Block: 629 Offsets: 8058 to 8191 Dba:0x01000275
---------------------------------------------------------------------------------------------------------------------------------------------------------------
2c010202 c1020641 42434445 462c0102 02c10306 41424344 45462c01 0202c104 06414243 4445462c 010202c1 05064142 l ,......ABCDEF,......ABCDEF,......ABCDEF,......AB
43444546 2c010202 c1060641 42434445 462c0102 02c10706 41424344 45462c01 0202c108 06414243 4445462c 010202c1 l CDEF,......ABCDEF,......ABCDEF,......ABCDEF,....
09064142 43444546 2c010202 c10a0641 42434445 462c0102 02c10b06 41424344 45460106 f31f l ..ABCDEF,......ABCDEF,......ABCDEF....
--可以发现里面的信息是第1次insert的记录.
2.启动数据库到mount状态:
SYS@test> startup mount
ORACLE instance started.
Total System Global Area 1603411968 bytes
Fixed Size 2228784 bytes
Variable Size 1006636496 bytes
Database Buffers 587202560 bytes
Redo Buffers 7344128 bytes
Database mounted.
--数据库在mount状态.
SYS@test> select * from v$log ;
GROUP# THREAD# SEQUENCE# BYTES BLOCKSIZE MEMBERS ARC STATUS FIRST_CHANGE# FIRST_TIME NEXT_CHANGE# NEXT_TIME
---------- ---------- ---------- ---------- ---------- ---------- --- ---------------- ------------- ------------------- ------------ -------------------
1 1 142 52428800 512 1 YES INACTIVE 3270384619 2014-05-07 15:24:16 3270384751 2014-05-07 15:24:18
3 1 141 52428800 512 1 YES INACTIVE 3270344812 2014-05-07 11:26:32 3270384619 2014-05-07 15:24:16
2 1 143 52428800 512 1 NO CURRENT 3270384751 2014-05-07 15:24:18 2.8147E+14
SYS@test> @checkpoint
REDO:
检查点队列
检查点队列 on disk rba 检查点队列
脏块数量 时间戳 当前时间 on disk rba scn 检查点心跳
CPDRT low_rba on_disk_rba CPODT SYSDATE DIFF_DATE CPODS CPHBT CURRENT_SCN DIFF_SCN INDX
------------ -------------------- -------------------- ------------------- ------------------- ---------- ---------------- ------------ ------------ ------------ ------------
24 143.1242.0 143.1358.0 2014-05-07 15:35:48 2014-05-07 15:43:08 440.00 3270385840 846965749 0 -3270385840 0
DATAFILE:
rtckp_rba RTCKP_SCN CURRENT_SCN DIFF_SCN RTCKP_TIM SYSDATE DIFF_DATE
-------------------- ---------------- ------------ ------------ ------------------- ------------------- ----------
143.1242.16 3270385727 0 -3270385727 2014-05-07 15:34:13 2014-05-07 15:43:08 535.00
v$instance_recovery:
INST_ID ACTUAL_REDO_BLKS TARGET_REDO_BLKS 90%_blks TIMEOUT_BLKS TARGET_MTTR ESTIMATED_MTTR
------------ ---------------- ---------------- ------------ ------------ ------------ --------------
1 0 0 0
=>恢复就是从low_rba(143.1242.0)=>on_disk_rba(143.1358.0).
$ strace -fo /tmp/xx1.txt rlsql
SYS@test> alter database open ;
Database altered.
3.检查验证:
--首先检查alert*.log
Beginning crash recovery of 1 threads
parallel recovery started with 3 processes
Started redo scan
Completed redo scan
read 58 KB redo, 24 data blocks need recovery
Started redo application at
Thread 1: logseq 143, block 1242, scn 3270385727
Recovery of Online Redo Log: Thread 1 Group 2 Seq 143 Reading mem 0
Mem# 0: /u01/app/oracle11g/oradata/test/redo02.log
Completed redo application of 0.01MB
Completed crash recovery at
Thread 1: logseq 143, block 1359, scn 3270405841
24 data blocks read, 24 data blocks written, 58 redo k-bytes read
Wed May 07 15:44:51 2014
LGWR: STARTING ARCH PROCESSES
--注意看提示:
Started redo application at
Thread 1: logseq 143, block 1242, scn 3270385727
...
Completed crash recovery at
Thread 1: logseq 143, block 1359, scn 3270405841
--这些与前面查询的low_rba=143.1242.0,on_disk_rba=143.1359.0基本对上,仅仅结束多读1块.
--检查/tmp/xx1.txt文件. redo的1块=512byte.
--1242*512=635904
--1359*512=695808
--695808-635904=59904
--59904/1024=58.5
$ grep 635904 /tmp/xx1.txt
11984 pread(269, "\1\"\0\0\332\4\0\0\217\0\0\0\20\200R(`\4\0\0\6\0\0\0? "..., 4194304, 635904) = 4194304
$ grep 695808 /tmp/xx1.txt
11984 pread(269, "\1\"\0\0O\5\0\0\0\0\0\0\0\200N\247\0\0\0\0\0\0\0\0\0\0"..., 4096, 695808) = 4096
--从跟踪文件可以看出这些信息.
4.查看信息:
SCOTT@test> select * from t where rownum ID NAME
---------- --------------------
1 ABCDEF
--使用bbed观察:
BBED> set dba 4,629
DBA 0x01000275 (16777845 4,629)
BBED> p *kdbr[0]
rowdata[0]
----------
ub1 rowdata[0] @7818 0x2c
BBED> x /rnc
rowdata[0] @7818
----------
flag@7818: 0x2c (KDRHFL, KDRHFF, KDRHFH)
lock@7819: 0x00
cols@7820: 2
col 0[2] @7821: 1
col 1[6] @7824: ABCDEF
BBED> dump /v
File: /u01/app/oracle11g/oradata/test/users01.dbf (4)
Block: 629 Offsets: 7818 to 8191 Dba:0x01000275
---------------------------------------------------------------------------------------------------------------------------------------------------------------
2c000202 c1020641 42434445 462c0002 02c10306 41424344 45462c00 0202c104 06414243 4445462c 000202c1 05064142 l ,......ABCDEF,......ABCDEF,......ABCDEF,......AB
43444546 2c000202 c1060641 42434445 462c0002 02c10706 41424344 45462c00 0202c108 06414243 4445462c 000202c1 l CDEF,......ABCDEF,......ABCDEF,......ABCDEF,....
09064142 43444546 2c000202 c10a0641 42434445 462c0002 02c10b06 41424344 45462c02 0202c10b 04313233 342c0202 l ..ABCDEF,......ABCDEF,......ABCDEF,......1234,..
02c10a04 31323334 2c020202 c1090431 3233342c 020202c1 08043132 33342c02 0202c107 04313233 342c0202 02c10604 l ....1234,......1234,......1234,......1234,......
31323334 2c020202 c1050431 3233342c 020202c1 04043132 33342c02 0202c103 04313233 342c0202 02c10204 31323334 l 1234,......1234,......1234,......1234,......1234
2c000202 c1020641 42434445 462c0002 02c10306 41424344 45462c00 0202c104 06414243 4445462c 000202c1 05064142 l ,......ABCDEF,......ABCDEF,......ABCDEF,......AB
43444546 2c000202 c1060641 42434445 462c0002 02c10706 41424344 45462c00 0202c108 06414243 4445462c 000202c1 l CDEF,......ABCDEF,......ABCDEF,......ABCDEF,....
09064142 43444546 2c000202 c10a0641 42434445 462c0002 02c10b06 41424344 45460a06 016f l ..ABCDEF,......ABCDEF,......ABCDEF...o
--看注意看里面1234,如果对比前面的应用,可以发现,update没有提交的信息已经写入磁盘,然后rollback回滚,另外*kdbr[0]的内容,与前面看到的不同,那是回滚后的位置.
BBED> set offset 8058 OFFSET 8058
BBED> x /rnc
rowdata[240] @8058
------------
flag@8058: 0x2c (KDRHFL, KDRHFF, KDRHFH)
lock@8059: 0x00
cols@8060: 2
col 0[2] @8061: 1
col 1[6] @8064: ABCDEF
BBED> set offset 7948
OFFSET 7948
BBED> x /rnc
rowdata[130] @7948
------------
flag@7948: 0x2c (KDRHFL, KDRHFF, KDRHFH)
lock@7949: 0x02
cols@7950: 2
col 0[2] @7951: 10
col 1[4] @7954: 1234
--附录checkpoint.sql脚本:
$ cat sqllaji/checkpoint.sql
column low_rba format a20
column on_disk_rba format a20
column rtckp_rba format a20
column diff_date format 999999.99
column CPOSD_ono_disk_rba_scn format 99999999999999999999999999999999
column cpdrt heading "检查点队列|脏块数量|CPDRT"
column cpodt_on_disk_rba heading "检查点队列|on disk rba|时间戳|CPODT"
column cpods heading "检查点队列|on disk rba scn|CPODS"
column cphbt heading "检查点心跳|CPHBT"
column current_sysdate heading "当前时间|SYSDATE"
set num 12
PROMPT
PROMPT REDO:
PROMPT
SELECT cpdrt ,
cplrba_seq || '.' || cplrba_bno || '.' || cplrba_bof "low_rba",
cpodr_seq || '.' || cpodr_bno || '.' || cpodr_bof "on_disk_rba",
TO_DATE (CPODT, 'MM-DD-YYYY HH24:MI:SS') cpodt_on_disk_rba,
SYSDATE current_sysdate,
ROUND ( (SYSDATE - TO_DATE (CPODT, 'MM-DD-YYYY HH24:MI:SS')) * 86400,
2)
diff_date,
CPODS ,
CPHBT,
current_scn,
current_scn - cpods diff_scn,
indx
FROM x$kcccp, v$database
WHERE CPLRBA_SEQ 0;
PROMPT
PROMPT DATAFILE:
PROMPT
SELECT rtckp_rba_seq || '.' || rtckp_rba_bno || '.' || rtckp_rba_bof
"rtckp_rba",
rtckp_scn,
current_scn,
current_scn - rtckp_scn diff_scn,
TO_DATE (rtckp_tim, 'MM-DD-YYYY HH24:MI:SS') rtckp_tim,
SYSDATE,
ROUND (
(SYSDATE - TO_DATE (rtckp_tim, 'MM-DD-YYYY HH24:MI:SS')) * 86400,
2) diff_date
FROM x$kccrt, v$database;
PROMPT
PROMPT v$instance_recovery:
PROMPT
SELECT inst_id,
actual_redo_blks,
target_redo_blks,
LOG_FILE_SIZE_REDO_BLKS AS "90%_blks",
LOG_CHKPT_TIMEOUT_REDO_BLKS AS timeout_blks,
target_mttr,
estimated_mttr
FROM gv$instance_recovery;