oracle库存储问题导致数据库重启后无法正常启动恢复教程

某客户的核心数据库存储问题导致数据库重启后无法正常启动,根据客户反馈最开始在启动数据库时
报错控制文件IO错误,如下:

Sun Mar 15 11:59:37 2015
Errors in file /oracle/app1/oracle/admin/xxxx/bdump/xxxx2_arc1_630876.trc:
ORA-00204: error in reading (block 1, # blocks 1) of control file
ORA-00202: control file: '/xxx/xxxx/control01.ctl'
ORA-17500: ODM err:ODM ERROR V-41-4-2-43-6 No such device or address
Sun Mar 15 11:59:37 2015
Errors in file /oracle/app1/oracle/admin/xxxx/bdump/xxxx2_arc1_630876.trc:
ORA-00204: error in reading (block 1, # blocks 1) of control file
ORA-00202: control file: '/xxx/xxxx/control01.ctl'
ORA-17500: ODM err:ODM ERROR V-41-4-2-43-6 No such device or address
Sun Mar 15 11:59:37 2015
Errors in file /oracle/app1/oracle/admin/xxxx/bdump/xxxx2_arc1_630876.trc:
ORA-00204: error in reading (block 1, # blocks 1) of control file
ORA-00202: control file: '/xxx/xxxx/control01.ctl'
ORA-17500: ODM err:ODM ERROR V-41-4-2-43-6 No such device or address
Sun Mar 15 11:59:37 2015
Master background archival failure: 204
Sun Mar 15 11:59:49 2015
Termination issued to instance processes. Waiting for the processes to exit
Sun Mar 15 15:40:09 2015
Starting ORACLE instance (normal)
上述的问题本质上都跟控制文件有关系,替换掉损坏的控制文件就行。当替换掉控制文件之后,在open数据库时发现报如下错误:

Sun Mar 15 16:10:48 2015
Errors in file /oracle/app1/oracle/admin/xxxx/udump/xxxx2_ora_626734.trc:
ORA-00600: internal error code, arguments: [kcrfr_update_nab_2], [0x70000038F8C94E0], [2], [], [], [], [], []
Abort recovery for domain 0
Sun Mar 15 16:10:49 2015
Aborting crash recovery due to error 600
Sun Mar 15 16:10:49 2015
Errors in file /oracle/app1/oracle/admin/xxxx/udump/xxxx2_ora_626734.trc:
ORA-00600: internal error code, arguments: [kcrfr_update_nab_2], [0x70000038F8C94E0], [2], [], [], [], [], []
ORA-600 signalled during: ALTER DATABASE OPEN...
Sun Mar 15 16:10:49 2015
Trace dumping is performing id=[cdmp_20150315161049]
Sun Mar 15 16:12:35 2015
Shutting down instance: further logons disabled
Sun Mar 15 16:12:35 2015
该错误本质上是因为redo的问题,即有redo log损坏。通过在RMAN进行recover,发现报如下类似错误:

Sun Mar 15 16:47:59 2015
Beginning crash recovery of 2 threads
 parallel recovery setup failed: using serial mode
Sun Mar 15 16:47:59 2015
Started redo scan
Sun Mar 15 16:47:59 2015
Errors in file /oracle/app1/oracle/admin/xxxx/udump/xxxx2_ora_299470.trc:
ORA-00313: open failed for members of log group 5 of thread 2
ORA-00312: online log 5 thread 2: '/xxx/xxxx/redo05a.log'
ORA-17503: ksfdopn:4 Failed to open file /xxx/xxxx/redo05a.log
ORA-17500: ODM err:File does not exist
Sun Mar 15 16:47:59 2015
Errors in file /oracle/app1/oracle/admin/xxxx/udump/xxxx2_ora_299470.trc:
ORA-00313: open failed for members of log group 4 of thread 1
ORA-00312: online log 4 thread 1: '/xxx/xxxx/redo04a.log'
ORA-17503: ksfdopn:4 Failed to open file /xxx/xxxx/redo04a.log
ORA-17500: ODM err:File does not exist
Sun Mar 15 17:03:03 2015
Errors in file /oracle/app1/oracle/admin/xxxx/udump/xxxx2_ora_299470.trc:
ORA-00354: corrupt redo log block header
ORA-00353: log corruption near block 2009344 change 14160745159583 time 03/15/2015 11:56:29
ORA-00334: archived log: '/xxx/xxxx/redo04b.log'
Sun Mar 15 17:03:03 2015
Abort recovery for domain 0
Sun Mar 15 17:03:03 2015
Aborting crash recovery due to error 354
Sun Mar 15 17:03:03 2015
Errors in file /oracle/app1/oracle/admin/xxxx/udump/xxxx2_ora_299470.trc:
ORA-00354: corrupt redo log block header
ORA-00353: log corruption near block 2009344 change 14160745159583 time 03/15/2015 11:56:29
ORA-00312: online log 4 thread 1: '/xxx/xxxx/redo04b.log'
ORA-354 signalled during: ALTER DATABASE OPEN...
Sun Mar 15 17:08:02 2015
上述过程大致是客户之前的处理过程。我在18点左右介入之后,进行了相关的操作。我最开始尝试在利用RMAN 进行恢复,发现报错:

RMAN> recover database;
 
Starting recover at 15-MAR-15
using target database control file instead of recovery catalog
allocated channel: ORA_DISK_1
channel ORA_DISK_1: sid=3268 instance=xxxx2 devtype=DISK
 
starting media recovery
 
media recovery failed
RMAN-00571: ===========================================================
RMAN-00569: =============== ERROR MESSAGE STACK FOLLOWS ===============
RMAN-00571: ===========================================================
RMAN-03002: failure of recover command at 03/15/2015 18:39:26
ORA-00283: recovery session canceled due to errors
RMAN-11003: failure during parse/execution of SQL statement: alter database recover if needed
 start
ORA-00283: recovery session canceled due to errors
ORA-00354: corrupt redo log block header
ORA-00353: log corruption near block 1788672 change 14160744248478 time 03/15/2015 11:54:46
ORA-00312: online log 4 thread 1: '/xxx/xxxx/redo04a.log'
从上面的错误来看,初步可以判断redo04a.log文件已经损坏,而且是block 1788672的问题。为了验证该block是否损坏,我通过类似如下的dump 命令进行dump,发现报错:

alter system dump logfile 'xxx' scn min xxxx scn max xxxx;
由此判断,该block损坏无疑。 由于客户的需求是尽可能快的将数据库拉起来,因此对应redo损坏的情况之下。
通常只能进程不完全恢复并强制打开,这里我使用了如下的参数:

*._allow_resetlogs_corruption=TRUE
*._allow_error_simulation=TRUE
在open resetlogs之前,我已经将redo备份,resetlogs打开时,发现数据库报错如下:

Sun Mar 15 19:43:36 2015
MTTR advisory is disabled because FAST_START_MTTR_TARGET is not set
Sun Mar 15 19:43:36 2015
SMON: enabling cache recovery
Sun Mar 15 19:43:37 2015
Instance recovery: looking for dead threads
Instance recovery: lock domain invalid but no dead threads
Sun Mar 15 19:43:37 2015
ORA-01555 caused by SQL statement below (SQL ID: 5wc2915k44m38, Query Duration=0 sec, SCN: 0x0ce1.0e2d8971):
Sun Mar 15 19:43:37 2015
select user#,type# from user$ where name=:1
Sun Mar 15 19:43:37 2015
Errors in file /oracle/app1/oracle/admin/xxxx/udump/xxxx2_ora_667814.trc:
ORA-00704: bootstrap process failure
ORA-00604: error occurred at recursive SQL level 1
ORA-01555: snapshot too old: rollback segment number 25 with name "_SYSSMU25$" too small
Error 704 happened during db open, shutting down database
USER: terminating instance due to error 704
Instance terminated by USER, pid = 667814
ORA-1092 signalled during: alter database open resetlogs...
从日志来看,大致判断可能是_SYSSMU25$ 回滚段的问题,因此尝试先通过如下隐含参数屏蔽回滚段:

_corrupted_rollback_segments=_SYSSMU25$
_offline_rollback_segments=_SYSSMU25$
屏蔽回滚段之后,尝试打开数据库,发现错误依旧,通过10046 trace跟踪,发现递归SQL在如下的block上执行失败:

PARSING IN CURSOR #3 len=43 dep=1 uid=0 oct=3 lid=0 tim=37951056727245 hv=1682066536 ad='8cb74a90'
select user#,type# from user$ where name=:1
END OF STMT
PARSE #3:c=0,e=372,p=0,cr=0,cu=0,mis=1,r=0,dep=1,og=4,tim=37951056727243
BINDS #3:
kkscoacd
 Bind#0
 oacdty=01 mxl=32(03) mxlc=00 mal=00 scl=00 pre=00
 oacflg=18 fl2=0001 frm=01 csi=852 siz=32 off=0
 kxsbbbfp=1126d4b70  bln=32  avl=03  flg=05
 value="XDB"
EXEC #3:c=0,e=465,p=0,cr=0,cu=0,mis=1,r=0,dep=1,og=4,tim=37951056727780
WAIT #3: nam='db file sequential read' ela= 1582 file#=1 block#=282 blocks=1 obj#=44 tim=37951056729421
WAIT #3: nam='db file sequential read' ela= 6642 file#=1 block#=91 blocks=1 obj#=22 tim=37951056736126
通过dump file 1 block 91,发现该block上第2个ITL确认存在一个活跃事务。原本计划直接bbed提交该事务,但是当我编译好bbed之后,查看发现该block为一个cluster block.
对于cluster block的事务修改,相对复杂一些,我的博客有文章描述,大家可以参考,这里不多说。考虑到生产库使用bbed有一定的风险,我并没有使用bbed。
接着使用undo_management参数启动数据库,然后强制open数据库,发现错误变成如下:

Errors in file /oracle/app1/oracle/admin/xxxx/udump/xxxx2_ora_778430.trc:
ORA-00600: internal error code, arguments: [kclchkblk_4], [3297], [238525189], [3297], [238091117], [], [], []
Sun Mar 15 20:50:52 2015
Instance recovery: looking for dead threads
Instance recovery: lock domain invalid but no dead threads
Sun Mar 15 20:50:53 2015
Redo thread 1 internally disabled at seq 1 (CKPT)
Sun Mar 15 20:50:53 2015
ARC1: Archiving disabled thread 1 sequence 1
Sun Mar 15 20:50:54 2015
Trace dumping is performing id=[cdmp_20150315205054]
Sun Mar 15 20:50:54 2015
Errors in file /oracle/app1/oracle/admin/xxxx/udump/xxxx2_ora_778430.trc:
ORA-00600: internal error code, arguments: [kclchkblk_4], [3297], [238525189], [3297], [238091117], [], [], []
Sun Mar 15 20:50:54 2015
Error 600 happened during db open, shutting down database
USER: terminating instance due to error 600
Instance terminated by USER, pid = 778430
ORA-1092 signalled during: alter database open  resetlogs...
从错误来看,我们就可以知道,这应该是SCN的问题。如果要手工推进SCN,那么level应该待遇3297*4才行,由于这里的238091117/1024/1024/1024小于1,因此推进scn时,level=3297*4+2 就差不多了。 这里我再次进行了10046 trace,发现了如下信息:

WAIT #5: nam='db file sequential read' ela= 1021 file#=1 block#=400 blocks=1 obj#=0 tim=37953716489772
EXEC #5:c=0,e=2969,p=1,cr=2,cu=3,mis=1,r=1,dep=1,og=4,tim=37953716490098
STAT #5 id=1 cnt=1 pid=0 pos=1 obj=0 op='UPDATE  UNDO$ (cr=2 pr=1 pw=0 time=1542 us)'
STAT #5 id=2 cnt=1 pid=1 pos=1 obj=34 op='INDEX UNIQUE SCAN I_UNDO1 (cr=2 pr=0 pw=0 time=11 us)'
WAIT #1: nam='row cache lock' ela= 71 cache id=3 mode=0 request=5 obj#=0 tim=37953716490369
WAIT #1: nam='db file sequential read' ela= 5783 file#=2 block#=25 blocks=1 obj#=0 tim=37953716496201
........
........
GLOBAL CACHE ELEMENT DUMP (address: 700000011f91498):
 id1: 0x19 id2: 0x20000 obj: US#2 block: (2/25)
 lock: SL rls: 0x0000 acq: 0x0000 latch: 3
 flags: 0x41 fair: 0 recovery: 0 fpin: 'ktuwh02: ktugus'
 bscn: 0x0.0 bctx: 0 write: 0 scan: 0x0 xflg: 0 xid: 0x0.0.0
 lcp: 0 lnk: [NULL] lch: [70000023bf4bc20,70000023bf4bc20]
 seq: 3 hist: 143:0 208 352
 LIST OF BUFFERS LINKED TO THIS GLOBAL CACHE ELEMENT:
 flg: 0x00080000 state: READING mode: EXCL
 pin: 'ktuwh02: ktugus'
 addr: 70000023bf4bb10 obj: INVALID cls: UNDO HEAD bscn: 0xce1.e379b05  ---这里的bscn即scn值
 GCS SHADOW 700000011f91508,1 sq[70000035fb339f8,70000035fb339f8] resp[70000035fb339d0,0x19.20000] pkey 4294950914
 grant 1 cvt 0 mdrole 0x21 st 0x40 GRANTQ rl LOCAL
 master 1 owner 1 sid 0 remote[0,0] hist 0x106
 history 0x6.0x4.0x0.0x0.0x0.0x0. cflag 0x0 sender 0 flags 0x0 replay# 0
 disk: 0x0000.00000000 write request: 0x0000.00000000
 pi scn: 0x0000.00000000
msgseq 0x0 updseq 0x0 reqids[1,0,0] infop 0x0
 GCS RESOURCE 70000035fb339d0 hashq [70000038cbc6658,70000038cbc6658] name[0x19.20000] pkey 4294950914
 grant 700000011f91508 cvt 0 send 0,0 write 0,0@65535
 flag 0x0 mdrole 0x1 mode 1 scan 0 role LOCAL
 disk: 0x0000.00000000 write: 0x0000.00000000 cnt 0x0 hist 0x0
 xid 0x0000.000.00000000 sid 0 pkwait 59s
 pkey 4294950914
 hv 0 [stat 0x0, 1->1, wm 32767, RMno 0, remxxx 0, dom 0]
 kjga st 0x4, step 0.0.0, cxxx 2, rmno 0, flags 0x0
 lb 0, hb 0, myb 6147, drmb 6147, apifrz 0
 GCS SHADOW 700000011f91508,1 sq[70000035fb339f8,70000035fb339f8] resp[70000035fb339d0,0x19.20000] pkey 4294950914
 grant 1 cvt 0 mdrole 0x21 st 0x40 GRANTQ rl LOCAL
 master 1 owner 1 sid 0 remote[0,0] hist 0x106
 history 0x6.0x4.0x0.0x0.0x0.0x0. cflag 0x0 sender 0 flags 0x0 replay# 0
 disk: 0x0000.00000000 write request: 0x0000.00000000
 pi scn: 0x0000.00000000
msgseq 0x0 updseq 0x0 reqids[1,0,0] infop 0x0
kjbmbassert [0x19.20000]
*** 2015-03-15 20:54:54.385
ksedmp: internal or fatal error
ORA-00600: internal error code, arguments: [kclchkblk_4], [3297], [238525189], [3297], [238091117], [], [], []
Current SQL statement for this session:
alter database open resetlogs
bscn: 0xce1.e379b05 将该scn进行转换,我们可以发现:0xce1 为3297,e379b05为238525189. 与上述报错信息一致。同时我发现这里使用了第2号回滚段,如下:

id1: 0x19 id2: 0x20000 obj: US#2 block: (2/25)
因此,尝试继续使用隐含参数屏蔽这第2号回滚段,并尝试打开数据库,但是错误依旧。看来还是需要调整SCN才行,如下:

Sun Mar 15 21:23:20 2015
Errors in file /oracle/app1/oracle/admin/xxxx/udump/xxxx2_ora_774222.trc:
ORA-00600: internal error code, arguments: [kclchkblk_4], [3297], [238958669], [3297], [238091118], [], [], []
Sun Mar 15 21:23:20 2015
Errors in file /oracle/app1/oracle/admin/xxxx/udump/xxxx2_ora_774222.trc:
ORA-00339: archived log does not contain any redo
ORA-00334: archived log: '/xxx/xxxx/redo02a.log'
ORA-00600: internal error code, arguments: [kclchkblk_4], [3297], [238958669], [3297], [238091118], [], [], []
Sun Mar 15 21:23:21 2015
Errors in file /oracle/app1/oracle/admin/xxxx/udump/xxxx2_ora_774222.trc:
ORA-00600: internal error code, arguments: [kclchkblk_4], [3297], [238958669], [3297], [238091118], [], [], []
Sun Mar 15 21:23:21 2015
Error 600 happened during db open, shutting down database
USER: terminating instance due to error 600
Instance terminated by USER, pid = 774222
首先我尝试了在会话级别设置:
?
1
alter session set events '10015 trace name adjust_scn level 13190';
发现alter database open失败,尝试使用*._minimum_giga_scn参数,但是在启动的时候,提示说该参数不支持。从此判断,该环境可能是安装了比较新的PSU,Oracle将该参数废弃掉了,这么说前面的10015 event根本就没起作用。 无奈只能通过oradebug手工修改SCN来启动数据库了,如下:

SQL> startup mount pfile='/tmp/gb.ora';
ORACLE instance started.
 
Total System Global Area 1.5032E+10 bytes
Fixed Size                  2110096 bytes
Variable Size            5704256880 bytes
Database Buffers         9311354880 bytes
Redo Buffers               14663680 bytes
Database mounted.
SQL> oradebug setmypid
Statement processed.
SQL> oradebug DUMPvar SGA kcsgscn_
kcslf kcsgscn_ [7000000100122A8, 7000000100122D8) = 00000000 00000005 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000 07000000 ...
SQL>
SQL> oradebug poke 0x7000000100122A8 4 3300
BEFORE: [7000000100122A8, 7000000100122AC) = 00000000
AFTER:  [7000000100122A8, 7000000100122AC) = 00000CE4
SQL> oradebug DUMPvar SGA kcsgscn_
kcslf kcsgscn_ [7000000100122A8, 7000000100122D8) = 00000CE4 00000005 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000 07000000 ...
SQL>
修改SCN之后,顺利打开了数据库,但是数据库很快就crash掉,如下是日志信息:

Sun Mar 15 21:47:31 2015
Errors in file /oracle/app1/oracle/admin/xxxx/bdump/xxxx2_smon_774386.trc:
ORA-00339: archived log does not contain any redo
ORA-00334: archived log: '/xxx/xxxx/redo02a.log'
......
Errors in file /oracle/app1/oracle/admin/xxxx/bdump/xxxx2_smon_774386.trc:
ORA-00339: archived log does not contain any redo
ORA-00334: archived log: '/xxx/xxxx/redo02a.log'
Sun Mar 15 21:47:33 2015
Errors in file /oracle/app1/oracle/admin/xxxx/bdump/xxxx2_smon_774386.trc:
ORA-00600: internal error code, arguments: [6006], [1], [], [], [], [], [], []
QMNC started with pid=32, OS id=520520
Sun Mar 15 21:47:35 2015
Errors in file /oracle/app1/oracle/admin/xxxx/bdump/xxxx2_smon_774386.trc:
ORA-00339: archived log does not contain any redo
ORA-00334: archived log: '/xxx/xxxx/redo02a.log'
ORA-00600: internal error code, arguments: [6006], [1], [], [], [], [], [], []
Sun Mar 15 21:47:35 2015
ORACLE Instance xxxx2 (pid = 22) - Error 600 encountered while recovering transaction (44, 26) on object 47098.
Sun Mar 15 21:47:35 2015
Errors in file /oracle/app1/oracle/admin/xxxx/bdump/xxxx2_smon_774386.trc:
ORA-00600: internal error code, arguments: [6006], [1], [], [], [], [], [], []
Sun Mar 15 21:47:36 2015
LOGSTDBY: Validating controlfile with logical metadata
Sun Mar 15 21:47:36 2015
LOGSTDBY: Validation complete
Sun Mar 15 21:47:36 2015
Errors in file /oracle/app1/oracle/admin/xxxx/bdump/xxxx2_smon_774386.trc:
ORA-00339: archived log does not contain any redo
ORA-00334: archived log: '/xxx/xxxx/redo02a.log'
Sun Mar 15 21:47:36 2015
Errors in file /oracle/app1/oracle/admin/xxxx/bdump/xxxx2_smon_774386.trc:
ORA-00339: archived log does not contain any redo
ORA-00334: archived log: '/xxx/xxxx/redo02a.log'
Sun Mar 15 21:47:36 2015
Errors in file /oracle/app1/oracle/admin/xxxx/bdump/xxxx2_smon_774386.trc:
ORA-00600: internal error code, arguments: [4137], [], [], [], [], [], [], []
Sun Mar 15 21:47:37 2015
ORACLE Instance xxxx2 (pid = 22) - Error 600 encountered while recovering transaction (48, 25).
Sun Mar 15 21:47:37 2015
Errors in file /oracle/app1/oracle/admin/xxxx/bdump/xxxx2_smon_774386.trc:
ORA-00600: internal error code, arguments: [4137], [], [], [], [], [], [], []
Sun Mar 15 21:47:39 2015
Completed: alter database open
Sun Mar 15 21:47:39 2015
Errors in file /oracle/app1/oracle/admin/xxxx/bdump/xxxx2_smon_774386.trc:
ORA-00339: archived log does not contain any redo
ORA-00334: archived log: '/xxx/xxxx/redo02a.log'
Sun Mar 15 21:47:39 2015
Errors in file /oracle/app1/oracle/admin/xxxx/bdump/xxxx2_smon_774386.trc:
ORA-00339: archived log does not contain any redo
ORA-00334: archived log: '/xxx/xxxx/redo02a.log'
Sun Mar 15 21:47:39 2015
Errors in file /oracle/app1/oracle/admin/xxxx/bdump/xxxx2_smon_774386.trc:
ORA-00600: internal error code, arguments: [4137], [], [], [], [], [], [], []
Sun Mar 15 21:47:40 2015
ORACLE Instance xxxx2 (pid = 22) - Error 600 encountered while recovering transaction (65, 7).
Sun Mar 15 21:47:40 2015
Errors in file /oracle/app1/oracle/admin/xxxx/bdump/xxxx2_smon_774386.trc:
ORA-00600: internal error code, arguments: [4137], [], [], [], [], [], [], []
Sun Mar 15 21:47:40 2015
Trace dumping is performing id=[cdmp_20150315214740]
Sun Mar 15 21:47:41 2015
Errors in file /oracle/app1/oracle/admin/xxxx/bdump/xxxx2_j009_790778.trc:
ORA-12012: error on auto execute of job 524
ORA-01552: cannot use system rollback segment for non-system tablespace 'xxx_ADMIN'
ORA-06512: at "SYS.xxx_LOGINHISTORY", line 3
ORA-06512: at line 1
Sun Mar 15 21:47:41 2015
Errors in file /oracle/app1/oracle/admin/xxxx/bdump/xxxx2_j009_790778.trc:
ORA-12012: error on auto execute of job 524
ORA-01552: cannot use system rollback segment for non-system tablespace 'xxx_ADMIN'
ORA-06512: at "SYS.xxx_LOGINHISTORY", line 3
ORA-06512: at line 1
Sun Mar 15 21:47:41 2015
Errors in file /oracle/app1/oracle/admin/xxxx/bdump/xxxx2_j002_475534.trc:
ORA-00600: internal error code, arguments: [kdsgrp1], [], [], [], [], [], [], []
Sun Mar 15 21:47:41 2015
Errors in file /oracle/app1/oracle/admin/xxxx/bdump/xxxx2_smon_774386.trc:
ORA-00339: archived log does not contain any redo
ORA-00334: archived log: '/xxx/xxxx/redo02a.log'
Sun Mar 15 21:47:41 2015
Errors in file /oracle/app1/oracle/admin/xxxx/bdump/xxxx2_smon_774386.trc:
ORA-00339: archived log does not contain any redo
ORA-00334: archived log: '/xxx/xxxx/redo02a.log'
Sun Mar 15 21:47:41 2015
Errors in file /oracle/app1/oracle/admin/xxxx/bdump/xxxx2_smon_774386.trc:
ORA-00600: internal error code, arguments: [4137], [], [], [], [], [], [], []
Sun Mar 15 21:47:42 2015
Errors in file /oracle/app1/oracle/admin/xxxx/bdump/xxxx2_j008_586068.trc:
ORA-12012: error on auto execute of job 526
ORA-01552: cannot use system rollback segment for non-system tablespace 'xxx_ADMIN'
ORA-06512: at "SYS.xxx_SEG_xxx", line 3
ORA-06512: at line 1
Sun Mar 15 21:47:42 2015
Errors in file /oracle/app1/oracle/admin/xxxx/bdump/xxxx2_j009_790778.trc:
ORA-00339: archived log does not contain any redo
ORA-00334: archived log: '/xxx/xxxx/redo02a.log'
ORA-12012: error on auto execute of job 524
ORA-01552: cannot use system rollback segment for non-system tablespace 'xxx_ADMIN'
ORA-06512: at "SYS.xxx_LOGINHISTORY", line 3
ORA-06512: at line 1
Sun Mar 15 21:47:42 2015
Errors in file /oracle/app1/oracle/admin/xxxx/bdump/xxxx2_j002_475534.trc:
ORA-00339: archived log does not contain any redo
ORA-00334: archived log: '/xxx/xxxx/redo02a.log'
ORA-00600: internal error code, arguments: [kdsgrp1], [], [], [], [], [], [], []
Sun Mar 15 21:47:43 2015
Errors in file /oracle/app1/oracle/admin/xxxx/bdump/xxxx2_j009_790778.trc:
ORA-00339: archived log does not contain any redo
ORA-00334: archived log: '/xxx/xxxx/redo02a.log'
ORA-12012: error on auto execute of job 524
ORA-01552: cannot use system rollback segment for non-system tablespace 'XXXX_ADMIN'
ORA-06512: at "SYS.XXXX_LOGINHISTORY", line 3
ORA-06512: at line 1
Sun Mar 15 21:47:43 2015
Errors in file /oracle/app1/oracle/admin/xxxx/bdump/xxxx2_j009_790778.trc:
ORA-00604: error occurred at recursive SQL level 1
ORA-08102: index key not found, obj# 239, file 1, block 1674 (2)
Sun Mar 15 21:47:43 2015
Errors in file /oracle/app1/oracle/admin/xxxx/bdump/xxxx2_j002_475534.trc:
ORA-00339: archived log does not contain any redo
ORA-00334: archived log: '/xxx/xxxx/redo02a.log'
ORA-00600: internal error code, arguments: [kdsgrp1], [], [], [], [], [], [], []
Sun Mar 15 21:47:43 2015
Errors in file /oracle/app1/oracle/admin/xxxx/bdump/xxxx2_j002_475534.trc:
ORA-00600: internal error code, arguments: [], [], [], [], [], [], [], []
ORA-06512: at "xxxx.PKG_XXXXX", line 126
ORA-06512: at line 3
Sun Mar 15 21:47:43 2015
Errors in file /oracle/app1/oracle/admin/xxxx/bdump/xxxx2_pmon_565700.trc:
ORA-00474: SMON process terminated with error
Sun Mar 15 21:47:43 2015
PMON: terminating instance due to error 474
Sun Mar 15 21:47:47 2015
Dump system state for local instance only
System State dumped to trace file /oracle/app1/oracle/admin/xxxx/bdump/xxxx2_diag_377122.trc
Sun Mar 15 21:47:48 2015
Instance terminated by PMON, pid = 565700
从上述日志信息来看,主要出现了如下几个错误:
ORA-00600 [6006],ORA-00600 [4137],ORA-00600 [kdsgrp1]
对于前面2个错误,很明显是Oracle SMON进程在进行利用回滚段进行事务rollback时失败导致,如下:
ORACLE Instance xxxx2 (pid = 22) – Error 600 encountered while recovering transaction (44, 26) on object 47098.
ORACLE Instance xxxx2 (pid = 22) – Error 600 encountered while recovering transaction (48, 25).
因此,不难看出,数据库中还有部分的回滚段存在活跃事务。
对于ORA-00600 [kdsgrp1]错误,通常是出现在Index上,即Index数据和表的数据不一致,一般来说可以通过重建解决。
其次,对于后面的ORA-08102: index key not found, obj# 239, file 1, block 1674 (2) 错误,主要是job调用出现,因此
我们可以暂时屏蔽job的调度。
对于ORA-08102错误,我的博客几年前也写过相关的文章,本质上也是Index block中的相关键值不存在导致。
与其如此,最后我感觉将数据库的所有回滚段都屏蔽掉,并重建数据库undo 表空间,如下是获取回滚段的命令:
strings system01.dbf | grep _SYSSMU | cut -d $ -f 1 | sort -u
经过整理,发现该库存在大约2600个回滚段,我了个去,先不管这么多,重启实例后,重建undo表空间:

SQL> conn /as sysdba
Connected to an idle instance.
SQL> startup upgrade pfile='/tmp/gb2.ora';
ORACLE instance started.
 
Total System Global Area 1.5032E+10 bytes
Fixed Size                  2110096 bytes
Variable Size            5704256880 bytes
Database Buffers         9311354880 bytes
Redo Buffers               14663680 bytes
Database mounted.
Database opened.
SQL> create undo tablespace undotbs11 datafile '/xxx/xxxx/undotbs11_01.dbf' size 100m;
 
Tablespace created.
 
SQL> create undo tablespace undotbs22 datafile '/xxx/xxxx/undotbs22_01.dbf' size 100m;
 
Tablespace created.
 
SQL> drop tablespace undotbs1 xxxluding contents and datafiles;
 
Tablespace dropped.
 
SQL> drop tablespace undotbs2 xxxluding contents and datafiles;
 
Tablespace dropped.

最后重启数据库实例,让客户将关键核心的配置表导出,先进行业务恢复,如果需要数据,直接从库中抽取。
这里要补充一点,该库约为5TB多一点,虽然有备份,但是恢复时间太长,如果有个dataguard是多么的重要啊

时间: 2024-10-30 16:10:07

oracle库存储问题导致数据库重启后无法正常启动恢复教程的相关文章

oracle数据库重启之后无法open的恢复案例

这是某网友的维护的一套数据库,据说是正常重启之后就无法启动数据库了.那么我们先来看看日志是什么样的: Errors in file /u01/app/oracle/admin/orcl/bdump/orcl1_p012_18165.trc: ORA-27090: Message 27090 not found;  product=RDBMS; facility=ORA Linux-x86_64 Error: 4: Interrupted system call Additional inform

oracle 执行计划改变导致数据库负载过高解决办法

数据库主机负载 这里明显表现系统load 偏高,而且还在上升中:top的进程中,占用cpu都计划100% top - 16:25:39 up 123 days,  1:42,  4 users,  load average: 46.19, 45.08, 43.93 Tasks: 1469 total,  28 running, 1439 sleeping,   0 stopped,   2 zombie Cpu(s): 45.9%us,  1.1%sy,  0.0%ni, 47.1%id,  5

MySQL数据库自动定时备份策略与恢复教程

注意:主从同步不能代替数据库备份 举例说明,如果我们在主数据库上误删了一个table,那么从数据库也会执行相应的操作,如果没有备份,这就会导致删掉的数据就无法找回.(即使回滚二进制日志也不一定能找回,因为二进制日志的保存是有时间限制的,并且回滚二进制日志非常复杂). 因此数据库的完全备份.增量备份是至关重要的. 备份策略 1. 每天进行数据库备份. mysqldump --master-data --single-transaction -R --databases [db1] [db2] [d

Oracle数据库重启后密码失效的问题(r12笔记第91天)

  前几天,我和系统运维的同事处理一个看似诡异的问题,他找到我说应用服务器启动的时候报了DB的Error,但是错误信息有限,他也没法完全定位到错误的原因,所以就希望我来帮忙看看这个问题是怎么回事,怎么解决.    从应用服务启动的日志来看,错误信息是连接池的地方有了问题. Error: 2017-06-09 10:04:59 init connpool:one or more conn open error. Error: 2017-06-09 10:12:50 init connpool:on

Oracle云存储“超售”导致服务能力无法兑现?

CRN本周表示,据多位可靠的消息人士透露,Oracle最近经历了一场云存储能力不足的危机,其基础架构即服务以及Exadata Database即服务的客户受到了波及. 虽然目前尚不清楚有多少客户受到了影响,也不知道这个问题是否已经得以解决,消息人士说有一些大客户依旧不能在五月底到期之前使用他们购买的"cloud credits".据了解,cloud credits是一种针对Oracle云服务的许可证,这种许可证正在越来越多地出现在大型交易之中. 一位不愿意透露姓名的Oracle合伙人表

电脑重装系统后数据丢失怎么免费恢复教程

当电脑出现重大故障无法解决时,机主都会选择重装系统的方式,重装系统时,大多数的人都会事先对电脑的文件进行备份,但是对于一些并不熟练电脑操作的人来说,常常是在电脑重装以后才发现之前的硬盘里的数据丢失,如果您真的遇到了重装系统文件没有备份导致数据丢失的问题,那么一些重要文件只能通过数据恢复软件找回来了. 下面就是电脑重装系统后数据丢失数据恢复图文教程: 1.下载得力数据恢复软件并进行安装. 2.打开得力数据恢复软件并选择恢复误删文件功能. 3.对误删文件的C盘进行扫描,扫描过程中软件会显示扫描出来误

Oracle RAC(4 TB ASM) 数据库恢复详细记录

6月底我们接到某客户的紧急支持请求,其客户数据库在不久前由于机房停电,导致数据库重启后无法启动. 我们通过teamviewer远程初步分析了alert log以及kfed读取了几个disk 发现,数据库无法启动的根本原因在于ASM diskgroup无法mount.而ASM diskgroup 无法mount的根本原因在于,ASM元数据出现损坏,其中表现为ASM 启动时无法进行事务恢复. 这里我们先不去纠结为什么会坏.对于asm的元数据如果出现损坏,那么修复的难度可想而知. 这里我采取了非常简单

oracle pl sql-oracle 自带工具创建数据库成功后 pl/sl 连接数据库选项中没有创建的数据库

问题描述 oracle 自带工具创建数据库成功后 pl/sl 连接数据库选项中没有创建的数据库 求大神解答,小弟刚学oracle 成功创建数据库后,用sqlplus可以连接.如下图: 但是用pl/sql database选项中没有创建的数据库名称 如下图: 解决方案 PLSQL连接64位oracle还需要安装32位的oracle客户端才行 可以按照下面配置http://jingyan.baidu.com/article/fb48e8be4c7c206e622e1491.html

路由器重启后上不了网怎么办?

  路由器重启后连不上原因: 1.更改了路由器设置,比如修改了Wifi无线密码.修改了宽带账号密码等等; 2.IP地址冲突; 3.更改了路由器DNS地址. 路由器重启后上不了网解决办法: 一.首先想一下,重启路由器前是否进行了某些路由器设置更改,比如更改了Wifi无线密码.宽带账号修改错了等等.如果是修改了Wifi无线密码,而电脑依旧记忆的是旧密码的话,那么就会出现电脑连不上无线网络,方法是在电脑中修改Wifi密码即可,具体方法如下: 1)在电脑连接的无线网络名称中,右键然后选择属性,如下图所示