CentOS 6.5 ext4 filesystemio_options=SETALL 备库archivelog 损坏问题

数据库版本 11.2.0.3.15
操作系统 CentOS 6.5
文件系统 ext4

最近发现我们线上数据库的3台备库，全部出现一样的错误如下：
INCIDENT_ID PROBLEM_KEY CREATE_TIME
-------------------- ----------------------------------------------------------- ----------------------------------------
352249 ORA 445 2016-04-07 22:18:02.350000 +08:00
640297 ORA 600 [3020] 2016-11-08 20:09:44.612000 +08:00
640217 ORA 600 [ORA-00600: internal error code, arguments: [3020], 2016-11-08 20:10:19.945000 +08:00
640298 ORA 600 [3020] 2016-11-14 11:40:52.532000 +08:00
640177 ORA 600 [ORA-00600: internal error code, arguments: [3020], 2016-11-14 11:41:32.540000 +08:00

Dump continued from file: /home/oracle/database/diag/rdbms/yjfhisd/yjfhis/trace/yjfhis_mrp0_27052.trc
ORA-00600: internal error code, arguments: [3020], [3], [33188], [12616100], [], [], [], [], [], [], [], []
ORA-10567: Redo is inconsistent with data block (file# 3, block# 33188, file offset is 271876096 bytes)
ORA-10564: tablespace UNDOTBS1
ORA-01110: da
========= Dump for incident 640177 (ORA 600 [ORA-00600: internal error code, arguments: [3020], [3], [33188], [12616100], [], [], [], [], [], [], [], []

开始查询这个错误发现可能是BUG引起的，只能是重新进行备库配置，但是回头想想为什么只有这一套库的STANDBY 出现问题，
回想一下这台机器因为磁盘I/O不行，我开启了异步I/O，让I/O进入内核态缓存，加快ORACLE I/O的速度
也就是
filesystemio_options=SETALL
初衷是好的，但是经过查询MOS和一些网友的文章也发现同样的问题。
MOS：
ORA-1578 ORA-353 ORA-19599 Corrupt blocks with zeros when filesystemio_options=SETALL on ext4 file system using Linux (文档 ID 1487957.1)
这个文章指出，不光是备库，主库的各种文件都可能出现问题，卧槽这个问题就大了。只有改参数重启数据库了。

同时我注意到这样一句话：
Database files on ext4 File System on Linux and Database parameter filesystemio_options is set to SETALL.
This is a Linux defect when using O_SYNC|O_DIRECT on ext4 file systems
(filesystemio_options=SETALL open the database files using O_SYNC|O_DIRECT).

也就是说ORACLE调用LINUX API OPEN 函数的时候由LINUX决定使用使用O_SYNC还是O_DIRECT
那么这也就说明了一点为什么这个参数是静态的，
因为ORACLE进程调用 OPEN函数调用打开文件后返回一个文件描述符给 ORACLE进程，除非数据库
进程终止，一直在read write的情况下这个不可能动态修改。

再来看
O_SYNC,O_DIRECT
O_DIRECT (Since Linux 2.4.10)
Try to minimize cache effects of the I/O to and from this file. In general this will degrade performance, but it is useful in special
situations, such as when applications do their own caching. File I/O is done directly to/from user space buffers. The I/O is syn-
chronous, that is, at the completion of a read(2) or write(2), data is guaranteed to have been transferred. See NOTES below for further
discussion.
大概意思是说，最小化系统cache，这里应该是值的内核缓冲区，一般情况会导致性能问题，但是如果应用程序有自己的缓存，可以这样干。

O_SYNC The file is opened for synchronous I/O. Any write(2)s on the resulting file descriptor will block the calling process until the data
has been physically written to the underlying hardware. But see NOTES below.

再来看看ORACLE对这个参数的描述：

FILESYTEMIO_OPTIONS can be set to one of the following values:
ASYNCH: enable asynchronous I/O on file system files, which has no timing requirement for transmission.
在文件系统文件上启用异步I/O，在数据传送上没有计时要求。
DIRECTIO: enable direct I/O on file system files, which bypasses the buffer cache.
在文件系统文件上启用直接I/O，绕过buffer cache。
SETALL: enable both asynchronous and direct I/O on file system files.
在文件系统文件上启用异步和直接I/O。
NONE: disable both asynchronous and direct I/O on file system files.
在文件系统文件上禁用异步和直接I/O。

然后看看LINUX为什么把它设置为NONE
Linux none
Oracle only supports native Linux asynchronous I/O,
which requires also using direct I/O.
Older Linux kernels do not support native asynchronous I/O.

LINUX老版本不支持(没有)原生的异步I/O(native asynchronous I/O),其实
回想MYSQL的早期版本的实现是MYSQL层自己模拟的异步的I/O，后来
mysql也采用LINUX的原生异步I/O。
ORACLE出于这种考虑设置为了NONE，其实还应该加一句就是回触发本文的
这个问题。

最后我们看看MYSQL innodb 中一般也有一个差不多的参数：
innodb_flush_method 这个参数一般设置为O_DIRECT，绕开内核缓冲区
http://blog.itpub.net/7728585/viewspace-1980262/

总之在linux 5 6 EXT4文件系统上主备不要设置
filesystemio_options=SETALL
另外设置这种底层的参数，需要参考MOS的建议或查看一下有没有什么bug或者
严重问题,千万不要想当然，除非大家都这样用。。

时间： 2024-12-25 08:32:32

CentOS 6.5 ext4 filesystemio_options=SETALL 备库archivelog 损坏问题

CentOS 6.5 ext4 filesystemio_options=SETALL 备库archivelog 损坏问题的相关文章

oracle中11gR2 dataguard 备库文件损坏解决案例

如何搭建阿里云RDS PostgreSQL数据库的物理备库

你的备库做好准备了吗

逻辑备库的Swichover和Failover

MySQL复制（2）主备库都为空的情况下创建主备复制

Oracle中DG备库报错ORA-00313、00312、27037

oracle 10g物理备库也可以read/write

Oracle 11g DataGuard 物理备库配置及Active DataGuard测试

Oracle 11g Dataguard物理备库配置(六) broker fastfailover测试