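This post uses two My Oracle Support (MOS) notes to discuss how assigning a PVID to a disk on AIX can destroy an ASM disk.
Note 1:
This note explains that assigning a PVID to an existing ASM disk destroys the ASM disk header, so the ASM disk group can no longer be mounted.
Assigning a Physical Volume ID (PVID) To An Existing ASM Disk Corrupts the ASM Disk Header (Doc ID 353761.1)
Last modified: 2013-4-19  Type: ALERT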
APPLIES TO:
Oracle Database - Enterprise Edition - Version 10.1.0.2 to 11.2.0.3 [Release 10.1 to 11.2]
IBM AIX on POWER Systems (64-bit)
***Checked for relevance on 30-Apr-2010***
AIX5L Based Systems (64-bit)
DESCRIPTION
Assigning a Physical Volume ID (PVID) to an existing ASM disk will destroy the ASM disk header rendering the
ASM disk unusable.
Various documents, including the 10gR1 and 10gR2 installation instructions for AIX platforms, suggest assigning
a PVID to disks to be used for ASM using the following command:
# /usr/sbin/chdev -l hdiskn -a pv=yes
These documents furthermore suggest that this command is to be run on ALL nodes of a RAC cluster. This
does not present a problem as long as the disks have not yet been used by ASM. If, however, the disks are
already in use and the above command is issued against an ASM disk, the disk header will be destroyed.
This is likely to happen when a new node is added to an existing RAC cluster, as the documentation seems
to imply that this has to be done on all nodes.
To check if a device has an associated PVID, use lspv:
EXAMPLE:
# lspv
hdisk0 0003286f04bc73ee rootvg active
hdisk1 0003286f867d77e1 rootvg active
hdisk2 0003286fb3470dae vg01 active
hdisk3 0003286fb3474190 vg01 active
hdisk4 0003286fb34747d1 vg01 active
hdisk5 0003286fb3474dff vg01 active
hdisk6 0003286fb3475428 vg01 active
hdisk7 0003286fb347607d vg01 active
hdisk8 0003286fb34766f3 vg01 active
hdisk9 0003286fb3476d70 vg01 active
hdisk10 0003286fb34773d5 vg01 active
hdisk11 0003286fb34780b8 vg01 active
hdisk12 0003286fb347872f vg01 active
hdisk13 0003286fb347940c vg01 active
hdisk14 0003286fb3479a7b vg01 active
The second column is the PVID.
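To pick out only the disks that carry a PVID, the lspv listing can be filtered on its second column. The following is a minimal sketch, assuming that disks without a PVID show `none` in that column, as lspv normally prints. The function name `disks_with_pvid` and the sample listing are illustrative and not part of the note.

```shell
#!/bin/sh
# Print the names of disks whose PVID column (column 2) is not "none".
# Reads lspv-style text on stdin so it can be tried without a real AIX host.
disks_with_pvid() {
    awk 'NF >= 2 && $2 != "none" { print $1 }'
}

# Illustrative sample; on a real AIX host use:  lspv | disks_with_pvid
sample_lspv='hdisk0 0003286f04bc73ee rootvg active
hdisk1 0003286f867d77e1 rootvg active
hdisk5 none None
hdisk6 none None'

printf '%s\n' "$sample_lspv" | disks_with_pvid
```

Any disk intended for ASM that appears in this output is a candidate for the corrective steps described in the second note.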
OCCURRENCE
This is more likely to happen in a RAC environment, specifically if a new node is added to an existing
cluster.
SYMPTOMS
If the 'chdev' command is run while ASM instances have the disk mounted, nothing will be noticed immediately,
as the disk header is only read when the disk is mounted. If, however, the diskgroup is unmounted and re-mounted
(e.g. ASM instance restart), the disk is no longer recognized as an ASM disk and the diskgroup mount will fail
with
ORA-15063: diskgroup "%s" lacks quorum of %s PST disks; %s found
or
ORA-15063: ASM discovered an insufficient number of disks for diskgroup "%s"
ORA-15040: diskgroup is incomplete
ORA-15042: ASM disk "%s" is missing
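One quick, read-only way to see whether a disk still looks like an ASM disk is to search the start of the device for the header marker. The sketch below rests on an assumption that is not stated in the note: that an intact ASM disk header contains the string `ORCLDISK` within its first 4 KB. The function name `has_asm_marker` and the device name are hypothetical.

```shell
#!/bin/sh
# Return 0 if the first 4 KB of the given device or file contains the
# string "ORCLDISK" (assumption: the marker ASM writes in its disk header).
# This only reads from the device; it never writes to it.
has_asm_marker() {
    dd if="$1" bs=4096 count=1 2>/dev/null | LC_ALL=C grep -q 'ORCLDISK'
}

# Example usage on AIX (hdisk5 is a placeholder):
#   has_asm_marker /dev/rhdisk5 && echo "ASM header marker present"
```

A present marker does not prove that the rest of the header is intact, so this is only a first check before involving Oracle Support.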
WORKAROUND
Do not assign a PVID to ASM disks. Contrary to the documentation, PVIDs are not required for ASM disks,
as ASM uses the ASM disk header to discover its disks.
This has been addressed in (Documentation) Bug 3636335 which states:
"This is a doc. bug and we are going to clearly document not to put PVIDs on disks given to ASM. The idea here is that ASM is the one which manages the disk and not any OS / vendor volume managers etc., PVIDs are needed for volume groups to work. For ASM to work, PVIDs are not needed. ASM has its own headers to identify the disk which is what is getting written here. "
As long as there is still an ASM instance which has the disk(group) mounted, the
contents should be backed up via RMAN as soon as possible.
The action plan from Document 750016.1 can also be applied. It is also recommended to raise an SR with Oracle Support.
HISTORY
Checked for relevance on 18-APR-2013
REFERENCES
BUG:3636335 - PVID IN DISK HEADER IS OVERWRITTEN AFTER ADDING A NEW DISK TO ASM DISKGROUP
NOTE:750016.1 - Corrective Action for ASM Diskgroup with Disks Having PVIDs on AIX
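Note 2:
This note covers two things. First, if a disk already has a PVID before the ASM disk group is created on it, creating the disk group succeeds and overwrites the PVID in the disk header. But because the PVID is stored both in the disk header and in the ODM, once the server reboots AIX will try to write the PVID from the ODM back onto the disk header, destroying the ASM disk header. Second, it describes how to clear the PVID from such disks before the operating system is rebooted.
Corrective Action for ASM Diskgroup with Disks Having PVIDs on AIX (Doc ID 750016.1)
Last modified: 2013-4-7  Type: HOWTO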
APPLIES TO:
Oracle Database - Enterprise Edition - Version 10.1.0.2 to 11.2.0.2 [Release 10.1 to 11.2]
IBM AIX on POWER Systems (64-bit)
IBM AIX Based Systems (64-bit)
GOAL
You have created a diskgroup with disks that have PVIDs, and the diskgroup is in use. No diskgroup metadata corruption has been reported yet. You now know that ASM disks should not have a PVID, as alerted in MetaLink Note 353761.1.
This note will give the steps to clear the PVID of these ASM Disks.
SOLUTION
When a PVID is set on a disk in a volume group, the PVID is stored in two locations: in the physical disk header (within the first 4K) and in AIX's system object database, called the ODM (Object Data Manager).
When the diskgroup is created, the PVID information in the disk header is overwritten. However, when the OS is rebooted, AIX might try to restore the PVID information from the ODM onto the disk header,
thereby destroying the ASM metadata.
If the ASM disk header metadata has not yet been overwritten by the PVID from the ODM (that is, before a reboot), you can use the following steps to update the ODM so that it no longer holds a PVID for the disks:
1] Do not reboot any node.
1.1] Drop one disk at a time from the diskgroup.
1.2] Clear the PVID of the dropped disk
# chdev -l hdisk5 -a pv=clear
Run this on ALL the nodes in case of RAC.
1.3] Check the disk does not have the PVID from ALL the nodes
# lspv
1.4] Add the disk back to the diskgroup
1.5] Do this for all the disks having PVID in the diskgroup, one by one. Take care that the rebalance is complete from the drop/add disk command before going for the next disk.
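The per-disk sequence in option 1 can be written out as a checklist before it is run by hand. The following sketch only prints the commands and executes nothing. The disk group name `DATA`, the ASM disk name `DATA_0005`, the raw device path `/dev/rhdisk5`, and the function name are placeholders chosen for illustration; the SQL shown is the ordinary `ALTER DISKGROUP` drop/add syntax, with `v$asm_operation` used to watch the rebalance.

```shell
#!/bin/sh
# Print (do not execute) the per-disk sequence from option 1.
# Arguments are placeholders chosen for illustration:
#   $1 = disk group name, $2 = ASM disk name, $3 = AIX hdisk name
print_clear_pvid_plan() {
    dg=$1; asm_disk=$2; hdisk=$3
    cat <<EOF
-- In the ASM instance: drop the disk and wait for the rebalance to finish
ALTER DISKGROUP $dg DROP DISK $asm_disk;
SELECT * FROM v\$asm_operation;   -- repeat until no rows are returned
# On every node: clear the PVID, then confirm lspv shows none
chdev -l $hdisk -a pv=clear
lspv | grep -w $hdisk
-- In the ASM instance: add the disk back and wait for the rebalance again
ALTER DISKGROUP $dg ADD DISK '/dev/r$hdisk';
SELECT * FROM v\$asm_operation;
EOF
}

print_clear_pvid_plan DATA DATA_0005 hdisk5
```

Printing the plan first makes it easier to confirm that the hdisk name and the ASM disk name really refer to the same device before anything is changed.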
OR
2] This needs downtime:
2.1] Take 'dd' backup of the disk headers
# dd if=/dev/hdisk5 of=/tmp/d5.txt bs=1024 count=1024
2.2] Shutdown ASM instance ( on ALL the nodes in RAC setup ).
2.3] Clear the PVID
# chdev -l hdisk5 -a pv=clear
Run this on ALL the nodes in case of RAC.
2.4] Check the disk does not have the PVID from ALL the nodes
# lspv
2.5] Start the ASM Instance(s) and mount the diskgroup on ALL the nodes
WARNING:
The commands in point 2 overwrite the contents of the disk header and so can be destructive if not used correctly. If you have any doubt, raise an SR with Oracle Support before taking any action.
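Because option 2 depends on the dd backup being usable, it can help to confirm that the backup file has the expected size before going further. This is a minimal sketch that wraps the same dd invocation as step 2.1 (1024 blocks of 1024 bytes); the function name `backup_disk_header` and the output path are hypothetical.

```shell
#!/bin/sh
# Copy the first 1 MB of a device to a backup file, then check that the
# backup really is 1 MB (1024 blocks of 1024 bytes, as in step 2.1).
backup_disk_header() {
    src=$1; dest=$2
    dd if="$src" of="$dest" bs=1024 count=1024 2>/dev/null || return 1
    size=$(wc -c < "$dest" | tr -d ' ')
    [ "$size" -eq 1048576 ]
}

# Example usage (paths are placeholders):
#   backup_disk_header /dev/hdisk5 /tmp/hdisk5_header.dd || echo "backup not verified"
```

The size check only catches a short or failed copy; it does not validate the contents of the header.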
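Summary:
Whether a PVID is assigned manually or automatically by AIX, it destroys the ASM disk header and the ASM disk group can no longer be mounted. To avoid this, follow these rules:
1). Before creating an ASM disk group, make sure the PVID has been cleared on every node for every disk that ASM will use.
2). Once the disk group has been created, do not manually assign a PVID to any ASM disk.
3). Once the disk group has been created, run the md_backup command in ASMCMD manually to back up the disk group metadata.
4). When planning, it is advisable for each disk group to consist of at least two ASM disks.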
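For rule 3, the backup command can be built with a dated file name so that successive backups do not overwrite each other. The sketch below only prints the command line. It assumes the 11.2 form of the command, `md_backup backup_file [-G 'diskgroup[,diskgroup,...]']` (earlier releases use different options); the directory, disk group name, `.mdb` suffix and function name are placeholders.

```shell
#!/bin/sh
# Build (and print) an asmcmd md_backup command line for one disk group.
# Assumes the 11.2 syntax: md_backup backup_file [-G 'diskgroup[,diskgroup,...]'].
# $1 = backup directory, $2 = disk group name, $3 = date stamp (defaults to today)
md_backup_cmd() {
    dir=$1; dg=$2; stamp=${3:-$(date +%Y%m%d)}
    printf 'asmcmd md_backup %s/%s_%s.mdb -G %s\n' "$dir" "$dg" "$stamp" "$dg"
}

md_backup_cmd /backup/asm DATA 20130419
```

Note that md_backup saves the disk group metadata so the group can be recreated with md_restore; it does not back up the data stored in the disk group, which still needs RMAN.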
--end--