最近测试环境需要做一些变更,把测试环境切分成两套环境,存储空间也需要压缩压缩和整理。
unix组的人已经开始做空间划分了,然后我们需要在此基础上重建一套环境。
有些数据文件使用空间不大,所以准备压缩一下。
用了下面的sql语句,结果跑了十几秒中就抛了下面的错误。
SQL> set linesize 200
SQL> col name for a40
SQL> col resizecmd for a80
SQL> select a.file#,a.name,a.bytes/1024/1024 CurrentMB,
2 ceil(HWM * a.block_size)/1024/1024 ResizeTo,
3 (a.bytes - HWM * a.block_size)/1024/1024 ReleaseMB,
4 'alter database datafile '''||a.name||''' resize '||
5 ceil(HWM * a.block_size/1024/1024) || 'M;' ResizeCMD
6 from v$datafile a,
7 (select file_id,max(block_id+blocks-1) HWM
8 from dba_extents
9 group by file_id) b
1 0 where a.file# = b.file_id(+)
11 and (a.bytes - HWM *block_size)>0
order by 5 12 ;
order by 5
*
ERROR at line 12:
ORA-01116: error in opening database file 4
ORA-01110: data file 4: '/testPT1/oracle/TEST01/oratmp01/temp/ggs_data01.dbf'
ORA-27041: unable to open file
Linux-x86_64 Error: 2: No such file or directory
Additional information: 3
再一查看数据库进程,都没有了,看来数据库是宕了。
我还想这条sql语句真是厉害,看看日志里面怎么说。
Tue Mar 25 22:04:19 2014
End automatic SQL Tuning Advisor run for special tuning task "SYS_AUTO_SQL_TUNING_TASK"
Wed Mar 26 02:00:00 2014
Closing scheduler window
Closing Resource Manager plan via scheduler window
Clearing Resource Manager plan via parameter
Mon Mar 31 10:14:55 2014
USER (ospid: 21846): terminating the instance due to error 472
Instance terminated by USER, pid = 21846
先把库重启了,看有没有什么问题。
SQL> startup nomount
ORA-32004: obsolete or deprecated parameter(s) specified for RDBMS instance
ORACLE instance started.
Total System Global Area 1.2827E+10 bytes
Fixed Size 2253880 bytes
Variable Size 4211084232 bytes
Database Buffers 8589934592 bytes
Redo Buffers 24096768 bytes
SQL> alter database mount;
Database altered.
当准备Open的时候,报了下面的错误。说有个数据文件丢失了。
SQL> alter database open;
alter database open
*
ERROR at line 1:
ORA-01157: cannot identify/lock data file 4 - see DBWR trace file
ORA-01110: data file 4:
'/testPT1/oracle/TEST01/oratmp01/temp/ggs_data01.dbf'
查看备份情况,是否有热备份之类的操作遗留。可以看到这个文件确实是损坏了或者被认为删除了。
SQL> select *from v$backup;
FILE# STATUS CHANGE# TIME
---------- ------------------ ---------- ---------
1 NOT ACTIVE 1.0583E+13 22-JAN-13
2 NOT ACTIVE 1.0583E+13 22-JAN-13
3 NOT ACTIVE 1.0583E+13 22-JAN-13
4 FILE NOT FOUND 0
...
40 NOT ACTIVE 1.0583E+13 22-JAN-13
查看文件的路径。
1* select file#,name from v$datafile
4
/testPT1/oracle/TEST01/oratmp01/temp/ggs_data01.dbf
确认了下这个数据文件goldengate用,现在测试环境上还没有goldengate,可以删除,于是头脑一发热,就准备马上删了,结果报了错。
SQL> drop tablespace GGS_DATA including contents and datafiles cascade constraint;
drop tablespace GGS_DATA including contents and datafiles cascade constraint
*
ERROR at line 1:
ORA-01109: database not open
才反应过来数据库还在mount状态
先把数据文件offline了
SQL> alter database datafile '/testPT1/oracle/TEST01/oratmp01/temp/ggs_data01.dbf' offline;
Database altered.
这时候再查看v$backup,那条记录就不复存在了。
FILE# STATUS CHANGE# TIME
---------- ------------------ ---------- ---------
1 NOT ACTIVE 1.0583E+13 22-JAN-13
2 NOT ACTIVE 1.0583E+13 22-JAN-13
3 NOT ACTIVE 1.0583E+13 22-JAN-13
5 NOT ACTIVE 1.0583E+13 22-JAN-13
6 NOT ACTIVE 1.0583E+13 22-JAN-13
7 NOT ACTIVE 1.0583E+13 22-JAN-13
8 NOT ACTIVE 1.0583E+13 22-JAN-13
9 NOT ACTIVE 1.0583E+13 22-JAN-13
10 NOT ACTIVE 1.0583E+13 22-JAN-13
11 NOT ACTIVE 1.0583E+13 22-JAN-13
12 NOT ACTIVE 1.0583E+13 22-JAN-13
....
39 rows selected.
把数据库open起来。
SQL> alter database open;
Database altered.
然后再删除,其实这个阶段也只是释放了句柄和更新了数据字典。
SQL> drop tablespace ggs_data including contents and datafiles cascade constraint;
Tablespace dropped.
再次查看数据库进程是否正常。可以看到数据库已经正常了。
test01@ccbdbpt4:/opt/app/oracle/TEST01> ps -ef|grep smon
test018420 5954 0 10:39 pts/2 00:00:00 grep smon
test0110295 1 0 10:18 ? 00:00:01 ora_smon_TEST01
test01@ccbdbpt4:/opt/app/oracle/TEST01> sqlplus / as sysdba
SQL*Plus: Release 11.2.0.2.0 Production on Mon Mar 31 10:40:08 2014
Copyright (c) 1982, 2010, Oracle. All rights reserved.
Connected to:
Oracle Database 11g Enterprise Edition Release 11.2.0.2.0 - 64bit Production
With the Partitioning, OLAP, Data Mining and Real Application Testing options
SQL> show user
USER is "SYS"
SQL> select status from v$instance;
STATUS
------------
OPEN
我在另一个环境上碰到了类似的问题。
test01@ccbdbpt4:/opt/app/oracle/TEST01/test> sqlplus / as sysdba
SQL*Plus: Release 11.2.0.2.0 Production on Thu Apr 3 17:26:43 2014
Copyright (c) 1982, 2010, Oracle. All rights reserved.
Connected to:
Oracle Database 11g Enterprise Edition Release 11.2.0.2.0 - 64bit Production
With the Partitioning, OLAP, Data Mining and Real Application Testing options
SQL>
SQL> select dbms_metadata.get_ddl('USER',u.username) from dba_users u WHERE USER NAME in('TEST');
ERROR:
ORA-03113: end-of-file on communication channel
Process ID: 4056
Session ID: 4159 Serial number: 15
no rows selected
查看日志,错误还是类似。
Mon Mar 31 11:47:47 2014
USER (ospid: 26945): terminating the instance due to error 472
Instance terminated by USER, pid = 26945
这个问题的总结如下
导致数据库宕机的原因不是因为sql语句,而是因为unix组做文件的操作中,导致数据库进程问题,数据库其实已经停了,但是我们仍然可以进行简单的操作。因为操作系统中句柄还在。
那个数据文件的丢失和unix组的人确认是认为的失误,他们在做空间切分的时候,没有停库。