[20160624]使用10035事件跟踪无法执行的sql语句.txt
--昨天看一份awr报表,链接如下:
http://www.itpub.net/thread-2061952-1-1.html
--摘要如下:
Top 10 Foreground Events by Total Wait Time
Event Waits Total Wait Time (sec) Wait Avg(ms) % DB time Wait Class
library cache lock 840,813 19.9K 24 68.6 Concurrency
library cache: mutex X 1,098,698 5199.6 5 17.9 Concurrency
DB CPU 3138.3 10.8
cursor: pin S wait on X 32,646 302.1 9 1.0 Concurrency
direct path write temp 127,841 270.1 2 .9 User I/O
SQL*Net break/reset to client 827,306 208.6 0 .7 Application
...
Time Model Statistics
Total time in database user-calls (DB Time): 29052.3s
Statistics including the word "background" measure background process time, and so do not contribute to the DB time statistic
Ordered by % or DB time desc, Statistic name
Statistic Name Time (s) % of DB Time
parse time elapsed 25,768.03 88.70
failed parse elapsed time 25,750.45 88.63
DB CPU 3,138.32 10.80
sql execute elapsed time 2,729.15 9.39
hard parse elapsed time 167.67 0.58
PL/SQL execution elapsed time 16.13 0.06
--如果结合两个看可以发现failed parse elapsed time时间异常高,说明有sql语句无法执行或者执行报错.而且这个有问题的语句sql语句执行很频繁,导致还出现了
--library cache lock ,library cache: mutex X.
--如何定位有问题的sql语句是解决这个问题的关键,实际上查询v$active_session_history的event='SQL*Net break/reset to client'能够知道sql_id,但是查询
--v$sql以及v$sqlarea是无法找到有问题的sql语句的.我以前提高使用x$kglob的kglobt03 = '&sql_id' 可以定位问题,作者也提出通过10035事件可以定位sql语句.
--自己也做一些测试:
1.环境:
SCOTT@book> @ &r/ver1
PORT_STRING VERSION BANNER
------------------------------ -------------- --------------------------------------------------------------------------------
x86_64/Linux 2.4.xx 11.2.0.4.0 Oracle Database 11g Enterprise Edition Release 11.2.0.4.0 - 64bit Production
2.测试1:
SCOTT@book> select sysdate from xxxxxxxxxxxxxx;
select sysdate from xxxxxxxxxxxxxx
*
ERROR at line 1:
ORA-00942: table or view does not exist
--我的系统不存在这个表,sql语句分析没有错误.
--如果调用很频繁,event='SQL*Net break/reset to client'会有记录.我以前遇到一例就是共享池设太小,在1个小时的周期里面已经从共享池清除.链接:
http://blog.itpub.net/267265/viewspace-1749265/
http://blog.itpub.net/267265/viewspace-1760052/
http://blog.itpub.net/267265/viewspace-1760158/
--这3个链接记录3种可能性:1.共享池设置太小 2. lob字段 3.语句执行有问题.语义正确,但是分析错误无法执行.
--也就是知道sql_id是无法找到sql语句,但是查询基表x$kglob都可以查询到对应的sql语句.
--获取sql_id,我这里执行次数太小,在视图v$active_session_history的event='SQL*Net break/reset to client'无法找到.
SYS@book> select sys.dbms_sqltune_util0.sqltext_to_sqlid('select sysdate from xxxxxxxxxxxxxx'||chr(0)) c20 from dual;
C20
--------------------
bb98w6dj7wz8p
SYS@book> @ &r/sharepool/shp4 bb98w6dj7wz8p 0
old 17: WHERE kglobt03 = '&1' or kglhdpar='&1' or kglhdadr='&1' or KGLNAHSH= &2
new 17: WHERE kglobt03 = 'bb98w6dj7wz8p' or kglhdpar='bb98w6dj7wz8p' or kglhdadr='bb98w6dj7wz8p' or KGLNAHSH= 0
TEXT KGLHDADR KGLHDPAR C40 KGLOBHD0 KGLOBHD6 KGLOBHS0 KGLOBHS6 KGLOBT16 N0_6_16 N20 KGLNAHSH KGLOBT03 KGLOBT09
-------------- ---------------- ---------------- ---------------------------------------- ---------------- ---------------- ---------- ---------- ---------- --------- ---------- ---------- ------------- ----------
子游标句柄地址 000000007CA54F80 000000007C96DE08 select sysdate from xxxxxxxxxxxxxx 00 00 0 0 3067 3067 3067 1652456725 bb98w6dj7wz8p 0
父游标句柄地址 000000007C96DE08 000000007C96DE08 select sysdate from xxxxxxxxxxxxxx 000000007B9B1A88 00 4752 0 0 4752 4752 1652456725 bb98w6dj7wz8p 65535
--作者提到kglnaobj字段仅仅看到一部分,实际上1000个字节,实际上查询KGLFNOBJ 字段是可以看到全部的.至少我的测试可以.
SYS@book> select kglnaobj c60 from x$kglob where kglobt03='bb98w6dj7wz8p' and rownum=1;
C60
------------------------------------------------------------
select sysdate from xxxxxxxxxxxxxx
3.测试2:使用10035事件:
$ oerr ora 10035
10035, 00000, "Write parse failures to alert log file"
// *Cause:
SCOTT@book> alter session set events '10035 trace name context forever, level 12' ;
Session altered.
SCOTT@book> select sysdate from xxxxxxxxxxxxxx;
select sysdate from xxxxxxxxxxxxxx
*
ERROR at line 1:
ORA-00942: table or view does not exist
--检查alert文件发现:
Fri Jun 24 10:14:19 2016
PARSE ERROR: ospid=51356, error=942 for statement:
select sysdate from xxxxxxxxxxxxxx^@
Additional information: hd=0x7ca54f80 phd=0x7c96de08 flg=0x20 cisid=83 sid=83 ciuid=83 uid=83
--如果你仔细看里面的信息,可以发现hd=0x7ca54f80 对应的就是子游标句柄地址 000000007CA54F80
--而phd=0x7c96de08 对应的就是 父游标句柄地址 000000007C96DE08.
--注意一个细节: ^@ 表示 chr(0).
--后面的cisid=83 sid=83 ciuid=83 uid=83,对应执行用户的id.
SYS@book> select user# from sys.user$ where name='SCOTT';
USER#
----------
83
4.测试3:使用oradebug看看.
--一般应用的执行是另外的进程.使用oradebug看看.
--session 1:
SCOTT@book> @ &r/spid
SID SERIAL# SPID PID P_SERIAL# C50
---------- ---------- ------ ------- ---------- --------------------------------------------------
46 2503 51552 28 118 alter system kill session '46,2503' immediate;
--登录sys用户执行如下:
--session 2:
SYS@book> oradebug setospid 51552
Oracle pid: 28, Unix process pid: 51552, image: oracle@gxqyydg4 (TNS V1-V3)
SYS@book> oradebug event 10035 trace name context forever,level 12
Statement processed.
----session 1,注意修改分号前的字符y,与前面执行的不同:
SCOTT@book> select sysdate from xxxxxxxxxxxxxy;
select sysdate from xxxxxxxxxxxxxy
*
ERROR at line 1:
ORA-00942: table or view does not exist
--//再次检查alert.log文件,可以发现:
Fri Jun 24 10:22:00 2016
PARSE ERROR: ospid=51552, error=942 for statement:
select sysdate from xxxxxxxxxxxxxy^@
Additional information: hd=0x7cbefb30 phd=0x7b65aa28 flg=0x20 cisid=83 sid=83 ciuid=83 uid=83
SYS@book> select sys.dbms_sqltune_util0.sqltext_to_sqlid('select sysdate from xxxxxxxxxxxxxy'||chr(0)) c20 from dual;
C20
--------------------
3vqsf0t328a8x
SYS@book> @ &r/sharepool/shp4 3vqsf0t328a8x 0
old 17: WHERE kglobt03 = '&1' or kglhdpar='&1' or kglhdadr='&1' or KGLNAHSH= &2
new 17: WHERE kglobt03 = '3vqsf0t328a8x' or kglhdpar='3vqsf0t328a8x' or kglhdadr='3vqsf0t328a8x' or KGLNAHSH= 0
TEXT KGLHDADR KGLHDPAR C40 KGLOBHD0 KGLOBHD6 KGLOBHS0 KGLOBHS6 KGLOBT16 N0_6_16 N20 KGLNAHSH KGLOBT03 KGLOBT09
-------------- ---------------- ---------------- ---------------------------------------- ---------------- ---------------- ---------- ---------- ---------- --------- ---------- ---------- ------------- ----------
子游标句柄地址 000000007CBEFB30 000000007B65AA28 select sysdate from xxxxxxxxxxxxxy 00 00 0 0 3067 3067 3067 1176774941 3vqsf0t328a8x 0
父游标句柄地址 000000007B65AA28 000000007B65AA28 select sysdate from xxxxxxxxxxxxxy 000000007C53F148 00 4720 0 0 4720 4720 1176774941 3vqsf0t328a8x 65535
--跟踪获取的信息一样.这样就能获得一个定位问题的方法,通过10035事件查询失败的sql语句.
--另外如果程序大量使用非绑定变量,这些无法执行sql语句一样耗费大量的共享池内存.
--附上shp4.sql的执行脚本:
$ cat shp4.sql
column N0_6_16 format 99999999
SELECT DECODE (kglhdadr,
kglhdpar, '父游标句柄地址',
'子游标句柄地址')
text,
kglhdadr,
kglhdpar,
substr(kglnaobj,1,40) c40,
kglobhd0,
kglobhd6,
kglobhs0,kglobhs6,kglobt16,
kglobhs0+kglobhs6+kglobt16 N0_6_16,
kglobhs0+kglobhs1+kglobhs2+kglobhs3+kglobhs4+kglobhs5+kglobhs6+kglobt16 N20,
kglnahsh,
kglobt03 ,
kglobt09
FROM x$kglob
WHERE kglobt03 = '&1' or kglhdpar='&1' or kglhdadr='&1' or KGLNAHSH= &2;
5.另外还有一个方法,通过触发器记录sql语句的执行错误:
--建立一张表SYS.ERROR_LOG.
CREATE TABLE SYS.ERROR_LOG
(
IP_ADDRESS VARCHAR2(30 BYTE),
USERNAME VARCHAR2(30 BYTE),
INSTANCE_NUMBER NUMBER(2),
THEDATE DATE,
ERRORMSG VARCHAR2(4000 BYTE),
OTHERMSG VARCHAR2(4000 BYTE),
SQLSTMT VARCHAR2(4000 BYTE)
)
TABLESPACE TOOLS;
CREATE OR REPLACE TRIGGER SYS.log_errors
AFTER SERVERERROR ON DATABASE
DECLARE
sqltext ora_name_list_t;
errmsg VARCHAR2 (4000) := NULL;
errstmt VARCHAR2 (4000) := NULL;
errothermsg VARCHAR2 (4000) := NULL;
BEGIN
IF ( -- NOT is_servererror (904)
-- AND NOT is_servererror (942)
-- AND NOT is_servererror (937)
-- AND NOT is_servererror (1002)
-- AND NOT is_servererror (1003)
-- AND
NOT is_servererror (942) AND NOT is_servererror (1407)
-- AND NOT is_servererror (1430)
-- AND NOT is_servererror (54)
-- AND NOT is_servererror (6550)
-- AND NOT is_servererror (936)
--AND NOT (is_servererror (1) AND ora_dict_obj_owner = 'EQUIPMENT')
)
THEN
FOR DEPTH IN 1 .. ora_server_error_depth
LOOP
errmsg := errmsg || ora_server_error_msg (DEPTH);
END LOOP;
FOR cntr IN 1 .. ora_sql_txt (sqltext)
LOOP
errstmt := errstmt || sqltext (cntr);
END LOOP;
SELECT 'sid:' || SID || ' SERIAL#: ' || serial# || ' terminal:' || terminal || ' program:' || program || ' module:'
|| module
INTO errothermsg
FROM v$session
WHERE SID = (SELECT SID
FROM v$mystat
WHERE ROWNUM = 1);
INSERT INTO error_log
(ip_address, username, instance_number, thedate, errormsg, othermsg,
sqlstmt
)
VALUES (SYS_CONTEXT ('USERENV', 'IP_ADDRESS'), ora_login_user, ora_instance_num, SYSDATE, errmsg, errothermsg,
errstmt
);
END IF;
END;
/
--ALTER TRIGGER SYS.LOG_ERRORS DISABLE;
--注我的脚本错误条件排除了ora-942,ora-1407,你可以根据自己需要修改脚本.