Oracle ASM diskgroup failure caused by a thin-provisioned storage volume

A friend had ASM running on one storage array and ran out of space in the DATA diskgroup, so he added an ASM disk to it using a LUN from a second array. About a day later the ASM diskgroup was dismounted and the database crashed; afterwards the diskgroup could not be mounted again, and several other diskgroups on that storage array could not be mounted either.
Database alert log
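
For reference, attaching a LUN from a second array to an existing diskgroup is a single statement on the ASM instance. The sketch below only illustrates what that step typically looks like; the device path /dev/mapper/delldisk2 is borrowed from the multipath output later in this post, and the rebalance power is an arbitrary example value.

export ORACLE_SID=+ASM1
sqlplus / as sysasm <<'EOF'
-- Add the new LUN to the DATA diskgroup; ASM then rebalances extents onto it.
ALTER DISKGROUP DATA ADD DISK '/dev/mapper/delldisk2' REBALANCE POWER 4;
-- Track the rebalance progress.
SELECT group_number, operation, state, est_minutes FROM v$asm_operation;
EOF

The ARB0 messages in the ASM log further down show that exactly such a rebalance onto the new storage was still running when the writes began to fail.
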
Sun Oct 23 08:43:59 2016
SUCCESS: diskgroup DATA was dismounted
SUCCESS: diskgroup DATA was dismounted
Sun Oct 23 08:44:00 2016
Errors in file /oracle/app/oracle/diag/rdbms/orcl/orcl1/trace/orcl1_lmon_79128.trc:
ORA-00202: control file: '+DATA/orcl/controlfile/current.278.892363163'
ORA-15078: ASM diskgroup was forcibly dismounted
Sun Oct 23 08:44:00 2016
Errors in file /oracle/app/oracle/diag/rdbms/orcl/orcl1/trace/orcl1_lgwr_79174.trc:
ORA-00345: redo log write error block 15924 count 2
ORA-00312: online log 2 thread 1: '+DATA/orcl/onlinelog/group_2.274.892363167'
ORA-15078: ASM diskgroup was forcibly dismounted
ORA-15078: ASM diskgroup was forcibly dismounted
Errors in file /oracle/app/oracle/diag/rdbms/orcl/orcl1/trace/orcl1_lgwr_79174.trc:
ORA-00202: control file: '+DATA/orcl/controlfile/current.278.892363163'
ORA-15078: ASM diskgroup was forcibly dismounted
Errors in file /oracle/app/oracle/diag/rdbms/orcl/orcl1/trace/orcl1_lgwr_79174.trc:
ORA-00204: error in reading (block 1, # blocks 1) of control file
ORA-00202: control file: '+DATA/orcl/controlfile/current.278.892363163'
ORA-15078: ASM diskgroup was forcibly dismounted
Sun Oct 23 08:44:00 2016
LGWR (ospid: 79174): terminating the instance due to error 204
Sun Oct 23 08:44:00 2016
opiodr aborting process unknown ospid (79742) as a result of ORA-1092
Sun Oct 23 08:44:01 2016
ORA-1092 : opitsk aborting process
Sun Oct 23 08:44:01 2016
ORA-1092 : opitsk aborting process
System state dump requested by (instance=1, osid=79174 (LGWR)), summary=[abnormal instance termination].
System State dumped to trace file /oracle/app/oracle/diag/rdbms/orcl/orcl1/trace/orcl1_diag_79118.trc
Instance terminated by LGWR, pid = 79174
Clearly the database crash was caused by the ASM diskgroup being dismounted, so the next step is to look at the ASM alert log.

ASM alert log
Sun Oct 23 07:00:31 2016
Time drift detected. Please check VKTM trace file for more details.
Sun Oct 23 08:43:55 2016
Errors in file /oracle/app/oracle/diag/asm/+asm/+ASM1/trace/+ASM1_arb0_8755.trc:
ORA-27061: waiting for async I/Os failed
Linux-x86_64 Error: 5: Input/output error
Additional information: -1
Additional information: 1048576
WARNING: Write Failed. group:1 disk:2 AU:1222738 offset:0 size:1048576
ERROR: failed to copy file +DATA.524, extent 15030
ERROR: ORA-15080 thrown in ARB0 for group number 1
Errors in file /oracle/app/oracle/diag/asm/+asm/+ASM1/trace/+ASM1_arb0_8755.trc:
ORA-15080: synchronous I/O operation to a disk failed
Sun Oct 23 08:43:55 2016
NOTE: stopping process ARB0
NOTE: rebalance interrupted for group 1/0xec689cdd (DATA)
NOTE: ASM did background COD recovery for group 1/0xec689cdd (DATA)
NOTE: starting rebalance of group 1/0xec689cdd (DATA) at power 1
Starting background process ARB0
Sun Oct 23 08:43:56 2016
ARB0 started with pid=24, OS id=103554
NOTE: assigning ARB0 to group 1/0xec689cdd (DATA) with 1 parallel I/O
Errors in file /oracle/app/oracle/diag/asm/+asm/+ASM1/trace/+ASM1_arb0_103554.trc:
ORA-27061: waiting for async I/Os failed
Linux-x86_64 Error: 5: Input/output error
Additional information: -1
Additional information: 1048576
WARNING: Write Failed. group:1 disk:2 AU:1222738 offset:0 size:1048576
ERROR: failed to copy file +DATA.256, extent 6570
ERROR: ORA-15080 thrown in ARB0 for group number 1
Errors in file /oracle/app/oracle/diag/asm/+asm/+ASM1/trace/+ASM1_arb0_103554.trc:
ORA-15080: synchronous I/O operation to a disk failed
NOTE: stopping process ARB0
Sun Oct 23 08:43:58 2016
Errors in file /oracle/app/oracle/diag/asm/+asm/+ASM1/trace/+ASM1_dbw0_8521.trc:
ORA-27061: waiting for async I/Os failed
Linux-x86_64 Error: 5: Input/output error
Additional information: -1
Additional information: 4096
WARNING: Write Failed. group:1 disk:3 AU:6789 offset:24576 size:4096
NOTE: cache initiating offline of disk 3 group DATA
NOTE: process _dbw0_+asm1 (8521) initiating offline of disk 3.3915934787 (DATA_0003) with mask 0x7e in group 1
Sun Oct 23 08:43:58 2016
WARNING: Disk 3 (DATA_0003) in group 1 mode 0x7f is now being offlined
WARNING: Disk 3 (DATA_0003) in group 1 in mode 0x7f is now being taken offline on ASM inst 1
NOTE: initiating PST update: grp = 1, dsk = 3/0xe9686c43, mask = 0x6a, op = clear
GMON updating disk modes for group 1 at 14 for pid 14, osid 8521
ERROR: Disk 3 cannot be offlined, since diskgroup has external redundancy.
ERROR: too many offline disks in PST (grp 1)
Sun Oct 23 08:43:58 2016
NOTE: cache dismounting (not clean) group 1/0xEC689CDD (DATA)
NOTE: messaging CKPT to quiesce pins Unix process pid: 103577, image: oracle@node1 (B000)
WARNING: Disk 3 (DATA_0003) in group 1 mode 0x7f offline is being aborted
WARNING: Offline of disk 3 (DATA_0003) in group 1 and mode 0x7f failed on ASM inst 1
NOTE: halting all I/Os to diskgroup 1 (DATA)
Sun Oct 23 08:43:59 2016
NOTE: LGWR doing non-clean dismount of group 1 (DATA)
NOTE: LGWR sync ABA=160.10145 last written ABA 160.10145
The error trail is clear: writes to disks 2 and 3 of group 1 failed ("WARNING: Write Failed"), ASM tried to take disk 3 offline, and because the diskgroup uses external redundancy the disk could not be offlined, so ASM force-dismounted the DATA diskgroup.
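
Before turning to the system log, it is useful to map the ASM disk numbers in these messages (disk 2 and disk 3 of group 1) back to OS device paths. A minimal sketch against the standard V$ASM_DISK view, run on the ASM instance:

export ORACLE_SID=+ASM1
sqlplus -s / as sysasm <<'EOF'
-- Map the group/disk numbers from the alert log to OS device paths
-- and show how much space ASM believes each disk still has free.
SET LINESIZE 200 PAGESIZE 100
COLUMN name FORMAT A15
COLUMN path FORMAT A35
SELECT group_number, disk_number, name, path, total_mb, free_mb, mode_status
FROM   v$asm_disk
WHERE  group_number = 1
ORDER  BY disk_number;
EOF

Note that on a thin-provisioned LUN such a query can still show plenty of FREE_MB, because ASM only sees the logical capacity of the disk; the real shortage is on the array behind it, which is why the next clue has to come from the OS-level system log.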

System log
Oct 23 08:43:55 node1 kernel: sd 6:0:12:1: [sdd]  Result: hostbyte=DID_OK driverbyte=DRIVER_SENSE
Oct 23 08:43:55 node1 kernel: sd 6:0:12:1: [sdd]  Sense Key : Data Protect [current]
Oct 23 08:43:55 node1 kernel: sd 6:0:12:1: [sdd]  Add. Sense: Space allocation failed write protect
Oct 23 08:43:55 node1 kernel: sd 6:0:12:1: [sdd] CDB: Write(16): 8a 00 00 00 00 02 e7 18 37 f9 00 00 00 07 00 00
Oct 23 08:43:55 node1 kernel: end_request: critical space allocation error, dev sdd, sector 12467058681
Oct 23 08:43:55 node1 kernel: end_request: critical space allocation error, dev dm-3, sector 12467058681
Oct 23 08:43:55 node1 kernel: sd 8:0:6:1: [sdh]  Result: hostbyte=DID_OK driverbyte=DRIVER_SENSE
Oct 23 08:43:55 node1 kernel: sd 8:0:6:1: [sdh]  Sense Key : Data Protect [current]
Oct 23 08:43:55 node1 kernel: sd 8:0:6:1: [sdh]  Add. Sense: Space allocation failed write protect
Oct 23 08:43:55 node1 kernel: sd 8:0:6:1: [sdh] CDB: Write(16): 8a 00 00 00 00 02 e7 18
Oct 23 08:43:55 node1 kernel: sd 6:0:4:1: [sdb]  Result: hostbyte=DID_OK driverbyte=DRIVER_SENSE
Oct 23 08:43:55 node1 kernel: sd 6:0:4:1: [sdb]  Sense Key : Data Protect [current]
Oct 23 08:43:55 node1 kernel: sd 6:0:4:1: [sdb]   33Add. Sense: Space allocation failed write protect
Oct 23 08:43:55 node1 kernel: sd 6:0:4:1: [sdb] CDB: Write(16): 8a 00 00 00 00 02 e7 18 30 00 00 00 03 f9 00 00
Oct 23 08:43:55 node1 kernel: end_request: critical space allocation error, dev sdb, sector 12467056640
Oct 23 08:43:55 node1 kernel: f9 00 00 04 00
Oct 23 08:43:55 node1 kernel: end_request: critical space allocation error, dev dm-3, sector 12467056640
Oct 23 08:43:55 node1 kernel: 00 00
Oct 23 08:43:55 node1 kernel: end_request: critical space allocation error, dev sdh, sector 12467057657
Oct 23 08:43:55 node1 kernel: end_request: critical space allocation error, dev dm-3, sector 12467057657
Oct 23 08:43:57 node1 kernel: sd 8:0:6:1: [sdh]  Result: hostbyte=DID_OK driverbyte=DRIVER_SENSE
Oct 23 08:43:57 node1 kernel: sd 8:0:6:1: [sdh]  Sense Key : Data Protect [current]
Oct 23 08:43:57 node1 kernel: sd 8:0:6:1: [sdh]  Add. Sense: Space allocation failed write protect
Oct 23 08:43:57 node1 kernel: sd 8:0:6:1: [sdh] CDB: Write(16): 8a 00 00 00 00 02 e7 18 37 f9 00 00 00 07 00 00
Oct 23 08:43:57 node1 kernel: end_request: critical space allocation error, dev sdh, sector 12467058681
Oct 23 08:43:57 node1 kernel: end_request: critical space allocation error, dev dm-3, sector 12467058681
Oct 23 08:43:57 node1 kernel: sd 8:0:12:1: [sdj]  Result: hostbyte=DID_OK driverbyte=DRIVER_SENSE
Oct 23 08:43:57 node1 kernel: sd 8:0:12:1: [sdj]  Sense Key : Data Protect [current]
Oct 23 08:43:57 node1 kernel: sd 8:0:12:1: [sdj]  Add. Sense: Space allocation failed write protect
Oct 23 08:43:57 node1 kernel: sd 8:0:12:1: [sdj] CDB: Write(16): 8a 00 00 00 00 02 e7 18 30 00 00 00 03 f9 00 00
Oct 23 08:43:57 node1 kernel: end_request: critical space allocation error, dev sdj, sector 12467056640
Oct 23 08:43:57 node1 kernel: end_request: critical space allocation error, dev dm-3, sector 12467056640
Oct 23 08:43:57 node1 kernel: sd 6:0:4:1: [sdb]  Result: hostbyte=DID_OK driverbyte=DRIVER_SENSE
Oct 23 08:43:57 node1 kernel: sd 6:0:4:1: [sdb]  Sense Key : Data Protect [current]
Oct 23 08:43:57 node1 kernel: sd 6:0:4:1: [sdb]  Add. Sense: Space allocation failed write protect
Oct 23 08:43:57 node1 kernel: sd 6:0:4:1: [sdb] CDB: Write(16): 8a 00 00 00 00 02 e7 18 33 f9 00 00 04 00 00 00
Oct 23 08:43:58 node1 kernel: sd 6:0:4:1: [sdb]  Result: hostbyte=DID_OK driverbyte=DRIVER_SENSE
Oct 23 08:43:58 node1 kernel: sd 6:0:4:1: [sdb]  Sense Key : Data Protect [current]
Oct 23 08:43:58 node1 kernel: sd 6:0:4:1: [sdb]  Add. Sense: Space allocation failed write protect
Oct 23 08:43:58 node1 kernel: sd 6:0:4:1: [sdb] CDB: Write(16): 8a 00 00 00 00 03 3b 7e 78 30 00 00 00 08 00 00
Oct 23 10:50:59 node1 init: oracle-ohasd main process (6150) killed by TERM signal
The kernel error is "critical space allocation error": the write was rejected because space could not be allocated on the device. The SCSI sense data ("Data Protect", "Space allocation failed write protect") shows that it is the array itself refusing the writes for lack of space, not a path or HBA problem. In other words, this space-allocation failure is what made the ASM writes fail and ultimately caused the diskgroup to be dismounted.
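
The Write(16) CDBs in the kernel log can be decoded by hand to confirm exactly which blocks were rejected: bytes 2-9 of the CDB carry the LBA and bytes 10-13 the transfer length in logical blocks. Checking the first failing command above with a quick shell calculation:

# Write(16) CDB: 8a 00 | 00 00 00 02 e7 18 37 f9 | 00 00 00 07 | 00 00
#                opcode | 8-byte LBA              | 4-byte length | group/control
printf 'LBA    = %d\n' 0x00000002e71837f9   # -> 12467058681
printf 'blocks = %d\n' 0x00000007            # -> 7

The decoded LBA matches the sector in the corresponding end_request line (dev sdd, sector 12467058681), confirming that the array is rejecting these specific writes with "Space allocation failed write protect" rather than the I/O being lost somewhere in the path.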

Multipath information
[root@node1 ~]# multipath -ll
36000d31003190c000000000000000003 dm-3 COMPELNT,Compellent Vol
size=80T features='1 queue_if_no_path' hwhandler='0' wp=rw
`-+- policy='round-robin 0' prio=1 status=active
  |- 6:0:9:1  sdd 8:48  active ready running
  `- 8:0:9:1  sdi 8:128 active ready running
delldisk2 (36000d310031908000000000000000003) dm-4 COMPELNT,Compellent Vol
size=8.0T features='1 queue_if_no_path' hwhandler='0' wp=rw
`-+- policy='round-robin 0' prio=1 status=active
  |- 6:0:12:1 sde 8:64  active ready running
  |- 8:0:6:1  sdh 8:112 active ready running
  |- 6:0:4:1  sdb 8:16  active ready running
  `- 8:0:12:1 sdj 8:144 active ready running
delldisk1 (36000d31003190a000000000000000007) dm-2 COMPELNT,Compellent Vol
size=12T features='1 queue_if_no_path' hwhandler='0' wp=rw
`-+- policy='round-robin 0' prio=1 status=active
  |- 6:0:1:1  sda 8:0   active ready running
  |- 8:0:2:1  sdf 8:80  active ready running
  |- 6:0:7:1  sdc 8:32  active ready running
  `- 8:0:3:1  sdg 8:96  active ready running
Clearly all of the failing paths belong to the same LUN, delldisk2: the SCSI IDs in the kernel log (6:0:12:1, 8:0:6:1, 6:0:4:1 and 8:0:12:1) are exactly the four paths of delldisk2, the array whose space had been used up, even though the sdX/dm-X names differ between the kernel log and this multipath listing. In other words, the physical space behind delldisk2 was exhausted, the kernel started hitting space-allocation errors, ASM writes failed, and the database went down as a result. The essence of the problem is that the array presented an 8T volume to the host while it actually had less than 8T of physical space available to back it; the OS used it as a full 8T and eventually hit this error. The technical term for this is a thin-provisioned volume, so take care with it when configuring storage. Because in this situation only write I/O fails while reads continue to work normally, no data is lost.
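
On the host side you can at least confirm that a LUN advertises logical block (thin) provisioning; the physical free space behind it can only be checked on the array itself (here the Compellent management interface), because the OS only ever sees the logical 8T. A hedged sketch, assuming the sg3_utils package is installed and that /dev/sdb is still one path of delldisk2:

# Does the LUN report thin provisioning (UNMAP / WRITE SAME support)?
sg_vpd --page=lbpv /dev/sdb              # Logical Block Provisioning VPD page
# A non-zero discard granularity is another hint that the volume is thin/space-efficient.
lsblk --discard /dev/mapper/delldisk2
cat /sys/block/sdb/queue/discard_granularity

These commands only tell you that the volume is thin provisioned; monitoring the pool's real free space, and alerting well before it fills up, has to happen on the storage array.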
