数据库版本 11.2.0.3.15
操作系统 CentOS 6.5
文件系统 ext4
最近发现我们线上数据库的3台备库,全部出现一样的错误如下:
INCIDENT_ID PROBLEM_KEY CREATE_TIME
-------------------- ----------------------------------------------------------- ----------------------------------------
352249 ORA 445 2016-04-07 22:18:02.350000 +08:00
640297 ORA 600 [3020] 2016-11-08 20:09:44.612000 +08:00
640217 ORA 600 [ORA-00600: internal error code, arguments: [3020], 2016-11-08 20:10:19.945000 +08:00
640298 ORA 600 [3020] 2016-11-14 11:40:52.532000 +08:00
640177 ORA 600 [ORA-00600: internal error code, arguments: [3020], 2016-11-14 11:41:32.540000 +08:00
Dump continued from file: /home/oracle/database/diag/rdbms/yjfhisd/yjfhis/trace/yjfhis_mrp0_27052.trc
ORA-00600: internal error code, arguments: [3020], [3], [33188], [12616100], [], [], [], [], [], [], [], []
ORA-10567: Redo is inconsistent with data block (file# 3, block# 33188, file offset is 271876096 bytes)
ORA-10564: tablespace UNDOTBS1
ORA-01110: da
========= Dump for incident 640177 (ORA 600 [ORA-00600: internal error code, arguments: [3020], [3], [33188], [12616100], [], [], [], [], [], [], [], []
开始查询这个错误发现可能是BUG引起的,只能是重新进行备库配置,但是回头想想为什么只有这一套库的STANDBY 出现问题,
回想一下这台机器因为磁盘I/O不行,我开启了异步I/O,让I/O进入内核态缓存,加快ORACLE I/O的速度
也就是
filesystemio_options=SETALL
初衷是好的,但是经过查询MOS和一些网友的文章 也发现同样的问题。
MOS:
ORA-1578 ORA-353 ORA-19599 Corrupt blocks with zeros when filesystemio_options=SETALL on ext4 file system using Linux (文档 ID 1487957.1)
这个文章指出,不光是备库,主库的各种文件都可能出现问题,卧槽这个问题就大了。只有改参数重启数据库了。
同时我注意到这样一句话:
Database files on ext4 File System on Linux and Database parameter filesystemio_options is set to SETALL.
This is a Linux defect when using O_SYNC|O_DIRECT on ext4 file systems
(filesystemio_options=SETALL open the database files using O_SYNC|O_DIRECT).
也就是说ORACLE调用LINUX API OPEN 函数的时候 由LINUX决定使用使用O_SYNC还是O_DIRECT
那么这也就说明了一点为什么这个参数是静态的,
因为ORACLE进程调用 OPEN函数调用打开文件后返回一个文件描述符给 ORACLE进程,除非数据库
进程终止,一直在read write的情况下 这个不可能动态修改。
再来看
O_SYNC,O_DIRECT
O_DIRECT (Since Linux 2.4.10)
Try to minimize cache effects of the I/O to and from this file. In general this will degrade performance, but it is useful in special
situations, such as when applications do their own caching. File I/O is done directly to/from user space buffers. The I/O is syn-
chronous, that is, at the completion of a read(2) or write(2), data is guaranteed to have been transferred. See NOTES below for further
discussion.
大概意思是说,最小化系统cache,这里应该是值的内核缓冲区,一般情况会导致性能问题,但是如果应用程序有自己的缓存,可以这样干。
O_SYNC The file is opened for synchronous I/O. Any write(2)s on the resulting file descriptor will block the calling process until the data
has been physically written to the underlying hardware. But see NOTES below.
再来看看ORACLE对这个参数的描述:
FILESYTEMIO_OPTIONS can be set to one of the following values:
ASYNCH: enable asynchronous I/O on file system files, which has no timing requirement for transmission.
在文件系统文件上启用异步I/O,在数据传送上没有计时要求。
DIRECTIO: enable direct I/O on file system files, which bypasses the buffer cache.
在文件系统文件上启用直接I/O,绕过buffer cache。
SETALL: enable both asynchronous and direct I/O on file system files.
在文件系统文件上启用异步和直接I/O。
NONE: disable both asynchronous and direct I/O on file system files.
在文件系统文件上禁用异步和直接I/O。
然后看看LINUX为什么把它设置为NONE
Linux none
Oracle only supports native Linux asynchronous I/O,
which requires also using direct I/O.
Older Linux kernels do not support native asynchronous I/O.
LINUX老版本不支持(没有)原生的异步I/O(native asynchronous I/O),其实
回想MYSQL的早期版本的实现是MYSQL层自己模拟的异步的I/O,后来
mysql也采用LINUX的原生异步I/O。
ORACLE出于这种考虑设置为了NONE,其实还应该加一句就是回触发本文的
这个问题。
最后我们看看MYSQL innodb 中一般也有一个差不多的参数:
innodb_flush_method 这个参数一般设置为O_DIRECT,绕开内核缓冲区
http://blog.itpub.net/7728585/viewspace-1980262/
总之在linux 5 6 EXT4文件系统上主备不要设置
filesystemio_options=SETALL
另外设置这种底层的参数,需要参考MOS的建议或查看一下有没有什么bug或者
严重问题,千万不要想当然,除非大家都这样用。。