ORA-01591故障处理

早晨到办公室听同事说表被锁了,一试,发现表中某字段为1111111的行都被锁了,SELECT都不行。报错误ORA-01591,打开TOAD的Knowledge eXpert,描述很少,只是说由于分布式事务错误而造成锁定。询问同事,昨天通过一个存储过程调用另一个存储过程出了错误,而后者通过透明网关insert一些数据到SQl Server数据库。
立即想到打开OEM,谁知道大失所望,进入锁,根本没发现相关的对象被锁定,开始有点郁闷。转而检查会话,该用户有5个会话,都是INACTIVE,不管三七二十一,全部杀掉。结果依旧,并且锁也没有出现。远程登陆上主机,发现CPU和进程都正常,也没有发现透明网关进程挂死(之前曾发现TG4SQL在无业务量时也会出现25%左右的CPU,挂死)。
突然想到看看alert.log,经过仔细搜索,终于发现:

Wed Nov 17 00:00:04 2004
Errors in file d:\oracle\admin\xdcj\udump\xdcj_j006_3020.trc:
ORA-12012: 自动执行作业 82 出错
ORA-01591: 锁定已被有问题的分配事务处理6.5.887985挂起
ORA-06512: 在line 6

这正是出错的地方,往前追溯:

Tue Nov 16 17:35:04 2004
Error 28500 trapped in 2PC on transaction 6.5.887985. Cleaning up.
Error stack returned to user:
ORA-02054: 事务处理6.5.887985有问题
ORA-28500: 连接 ORACLE 到非 Oracle 系统时返回此信息:
[Transparent gateway for MSSQL]
ORA-02063: 紧接着2 lines(源于ZSMOS_CRM)
Tue Nov 16 17:35:04 2004
DISTRIB TRAN QDCJ.US.ORACLE.COM.5ae32328.6.5.887985
  is local tran 6.5.887985 (hex=06.05.d8cb1)
  insert pending prepared tran, scn=6606197672830 (hex=602.2010cb7e)
Tue Nov 16 17:35:07 2004
Errors in file d:\oracle\admin\xdcj\bdump\xdcj_reco_3024.trc:
ORA-28500: connection from ORACLE to a non-Oracle system returned this message:
[Transparent gateway for MSSQL][Microsoft][ODBC SQL Server Driver][SQL Server]用户 'RECOVER' 登录失败。 (SQL State: 28000; SQL Code: 18456)
ORA-02063: preceding 2 lines from ZSMOS_CRM

Tue Nov 16 17:35:12 2004
Errors in file d:\oracle\admin\xdcj\bdump\xdcj_reco_3024.trc:
ORA-28500: connection from ORACLE to a non-Oracle system returned this message:
[Transparent gateway for MSSQL][Microsoft][ODBC SQL Server Driver][SQL Server]用户 'RECOVER' 登录失败。 (SQL State: 28000; SQL Code: 18456)
ORA-02063: preceding 2 lines from ZSMOS_CRM

这就是事发地点了。看来是昨天下午远程事务失败,但是又没有返回造成分布式事务挂死,从而锁定了行。终于找到了详细的错误ORA-02054,进入TOAD一查,说是要等待或者提交该事务,可是怎么操作呢。还是打开官方文档搜索相关内容,在Adminstrator Guide中发现如下内容:
Discovering Problems with a Two-Phase Commit
The user application that commits a distributed transaction is informed of a problem by one of the following error messages:

ORA-02050: transaction ID rolled back,
           some remote dbs may be in-doubt
ORA-02051: transaction ID committed,
           some remote dbs may be in-doubt
ORA-02054: transaction ID in-doubt

A robust application should save information about a transaction if it receives any of the above errors. This information can be used later if manual distributed transaction recovery is desired.

No action is required by the administrator of any node that has one or more in-doubt distributed transactions due to a network or system failure. The automatic recovery features of Oracle transparently complete any in-doubt transaction so that the same outcome occurs on all nodes of a session tree (that is, all commit or all roll back) after the network or system failure is resolved.

In extended outages, however, you can force the commit or rollback of a transaction to release any locked data. Applications must account for such possibilities.

Determining Whether to Perform a Manual Override
Override a specific in-doubt transaction manually only when one of the following situations exists:

The in-doubt transaction locks data that is required by other transactions. This situation occurs when the ORA-01591 error message interferes with user transactions.
An in-doubt transaction prevents the extents of a rollback segment from being used by other transactions. The first portion of an in-doubt distributed transaction's local transaction ID corresponds to the ID of the rollback segment, as listed by the data dictionary views DBA_2PC_PENDING and DBA_ROLLBACK_SEGS.
The failure preventing the two-phase commit phases to complete cannot be corrected in an acceptable time period. Examples of such cases include a telecommunication network that has been damaged or a damaged database that requires a long recovery time.
Normally, you should make a decision to locally force an in-doubt distributed transaction in consultation with administrators at other locations. A wrong decision can lead to database inconsistencies that can be difficult to trace and that you must manually correct.

If the conditions above do not apply, always allow the automatic recovery features of Oracle to complete the transaction. If any of the above criteria are met, however, consider a local override of the in-doubt transaction.

看来是建议差不多,后面Oracle总是试图登录SQl Server就是要自动恢复,可是总不成功。察看视图DBA_2PC_PENDING确实发现了该事务的痕迹。要怎样操作呢?

Manually Committing an In-Doubt Transaction
Before attempting to commit the transaction, ensure that you have the proper privileges. Note the following requirements:

If the transaction was committed by... Then you must have this privilege...
You
 FORCE TRANSACTION
 
Another user
 FORCE ANY TRANSACTION
 

Committing Using Only the Transaction ID
The following SQL statement commits an in-doubt transaction:

COMMIT FORCE 'transaction_id';

The variable transaction_id is the identifier of the transaction as specified in either the LOCAL_TRAN_ID or GLOBAL_TRAN_ID columns of the DBA_2PC_PENDING data dictionary view.

For example, assume that you query DBA_2PC_PENDING and determine that LOCAL_TRAN_ID for a distributed transaction is 1:45.13.

You then issue the following SQL statement to force the commit of this in-doubt transaction:

COMMIT FORCE '1.45.13';

Committing Using an SCN
Optionally, you can specify the SCN for the transaction when forcing a transaction to commit. This feature allows you to commit an in-doubt transaction with the SCN assigned when it was committed at other nodes.

Consequently, you maintain the synchronized commit time of the distributed transaction even if there is a failure. Specify an SCN only when you can determine the SCN of the same transaction already committed at another node.

For example, assume you want to manually commit a transaction with the following global transaction ID:

SALES.ACME.COM.55d1c563.1.93.29

First, query the DBA_2PC_PENDING view of a remote database also involved with the transaction in question. Note the SCN used for the commit of the transaction at that node. Specify the SCN when committing the transaction at the local node. For example, if the SCN is 829381993, issue:

COMMIT FORCE 'SALES.ACME.COM.55d1c563.1.93.29', 829381993;

See Also:
Oracle9i SQL Reference for more information about using the COMMIT statement
 

Manually Rolling Back an In-Doubt Transaction
Before attempting to roll back the in-doubt distributed transaction, ensure that you have the proper privileges. Note the following requirements:

If the transaction was committed by... Then you must have this privilege...
You
 FORCE TRANSACTION
 
Another user
 FORCE ANY TRANSACTION
 

The following SQL statement rolls back an in-doubt transaction:

ROLLBACK FORCE 'transaction_id';

The variable transaction_id is the identifier of the transaction as specified in either the LOCAL_TRAN_ID or GLOBAL_TRAN_ID columns of the DBA_2PC_PENDING data dictionary view.

For example, to roll back the in-doubt transaction with the local transaction ID of 2.9.4, use the following statement:

ROLLBACK FORCE '2.9.4';

于是登陆数据库
COMMIT FORCE '6.5.887985';
然后查看DBA_2PC_PENDING发现状态已经改为'COMMIT FORCE',SELECT该表相关行,一切正常。至此,故障解决。
总体来看,直接INSERT ... TABLENAME@SQLDBLK还是很危险的,遇上不能正常返回就出问题了。Oracle的文档是推荐使用包或者存储过程来解决,此后建议同事改用此方法,目前已经测试通过。

时间: 2024-11-08 22:16:50

ORA-01591故障处理的相关文章

【故障处理】分布式事务ORA-01591错误解决

[故障处理]分布式事务ORA-01591错误解决 1  BLOG文档结构图       2  前言部分 2.1  导读和注意事项 各位技术爱好者,看完本文后,你可以掌握如下的技能,也可以学到一些其它你所不知道的知识,~O(∩_∩)O~: ① 分布式事务的简单概念         ② ORA-01591错误解决   Tips: ① 本文在ITpub(http://blog.itpub.net/26736162).博客园(http://www.cnblogs.com/lhrbest)和微信公众号(x

Oracle数据库系统紧急故障处理方法_oracle

正在看的ORACLE教程是:Oracle数据库系统紧急故障处理方法.Oracle物理结构故障是指构成数据库的各个物理文件损坏而导致的各种数据库故障.这些故障可能是由于硬件故障造成的,也可能是人为误操作而引起.所以我们首先要判断问题的起因,如果是硬件故障则首先要解决硬件问题.在无硬件问题的前提下我们才能按照下面的处理方发来进一步处理.  控制文件损坏:  控制文件记录了关于oracle的重要配置信息,如数据库名.字符集名字.各个数据文件.日志文件的位置等等信息.控制文件的损坏,会导致数据库异常关闭

【故障处理】ORA-28040: No matching authentication protocol

[故障处理]ORA-28040: No matching authentication protocol 1.1  BLOG文档结构图   1.2  前言部分 1.2.1  导读和注意事项 各位技术爱好者,看完本文后,你可以掌握如下的技能,也可以学到一些其它你所不知道的知识,~O(∩_∩)O~: ① 告警日志中频繁出现Using deprecated SQLNET.ALLOWED_LOGON_VERSION parameter.ORA-28040: No matching authenticat

【故障处理】ORA-30012的解决过程

[故障处理]ORA-30012的解决过程   1  BLOG文档结构图   2  前言部分 2.1  导读和注意事项 各位技术爱好者,看完本文后,你可以掌握如下的技能,也可以学到一些其它你所不知道的知识,~O(∩_∩)O~: ① ORA-30012的解决过程 Tips: ① 本文在itpub(http://blog.itpub.net/26736162).博客园(http://www.cnblogs.com/lhrbest)和微信公众号(xiaomaimiaolhr)上有同步更新. ② 文章中用

【故障处理】DBCA建库诡异问题处理--rac环境不能创建rac库

[故障处理]DBCA建库诡异问题处理--rac环境不能创建rac库 BLOG文档结构图 前言部分 导读和注意事项 各位技术爱好者,看完本文后,你可以掌握如下的技能,也可以学到一些其它你所不知道的知识,~O(∩_∩)O~: ① dbca静默创建rac库 ② Inventory目录作用及其2种重建方法(重点) ③ rac环境dbca工具不能创建rac库的解决办法 ④ dbca静默建库常见问题处理 ⑤ 重建CRS集群环境执行root.sh脚本 Tips: ① 本文在ITpub(http://blog.

【故障处理】DG环境主库丢失归档情况下数据文件的恢复

[故障处理]DG环境主库丢失归档情况下数据文件的恢复 1  BLOG文档结构图     2  前言部分   2.1  导读和注意事项 各位技术爱好者,看完本文后,你可以掌握如下的技能,也可以学到一些其它你所不知道的知识,~O(∩_∩)O~: ① BBED的编译 ② BBED修改文件头让其跳过归档从而可以ONLINE(重点) ③ OS命名格式转换为ASM的命名格式 ④ DG环境中备库丢失数据文件的情况下的处理过程(重点) ⑤ 数据文件OFFLINE后应立即做一次RECOVER操作 ⑥ BBED环境

【故障处理】DG归档丢失的恢复

[故障处理]DG归档丢失的恢复 一.1  BLOG文档结构图       一.2  前言部分   一.2.1  导读和注意事项 各位技术爱好者,看完本文后,你可以掌握如下的技能,也可以学到一些其它你所不知道的知识,~O(∩_∩)O~: ① 主库丢失归档,物理DG的恢复 ②  其他常用SQL语句     Tips:         ① 若文章代码格式有错乱,推荐使用QQ.搜狗或360浏览器,也可以下载pdf格式的文档来查看,pdf文档下载地址:http://yunpan.cn/cdEQedhCs2

【故障处理】 DBCA建库报错CRS-2566

[故障处理] DBCA建库报错CRS-2566 PRCR-1071 PRCR-1006 一.1  BLOG文档结构图       一.2  前言部分   一.2.1  导读和注意事项 各位技术爱好者,看完本文后,你可以掌握如下的技能,也可以学到一些其它你所不知道的知识,~O(∩_∩)O~: ① dbca静默建库 ② 将监听加入CRS中     Tips:        ① 若文章代码格式有错乱,推荐使用搜狗.QQ或360浏览器,也可以下载pdf格式的文档来查看,pdf文档下载地址:http://

【故障处理】ORA-19809错误处理

[故障处理]ORA-19809错误处理 1  BLOG文档结构图       2  前言部分 2.1  导读和注意事项 各位技术爱好者,看完本文后,你可以掌握如下的技能,也可以学到一些其它你所不知道的知识,~O(∩_∩)O~: ① ORA-19809: limit exceeded for recovery files错误的处理方法 RMAN-03009: failure of backup command on ORA_DISK_1 channel at 07/26/2016 17:35:35

【故障处理】修改SPFILE无权限

[故障处理]修改SPFILE无权限 一.1  BLOG文档结构图       一.2  前言部分   一.2.1  导读和注意事项 各位技术爱好者,看完本文后,你可以掌握如下的技能,也可以学到一些其它你所不知道的知识,~O(∩_∩)O~: ① 修改spfile文件报错: ORA-01565: error in identifying file '+DATA/bocprm/spfilebocprm.ora' ORA-17503: ksfdopn:2 Failed to open file +DAT