Snap Id Snap Time Sessions Cursors/Session
Begin Snap: 5980 11-Nov-13 16:00:37 1403 39.6
End Snap: 5981 11-Nov-13 16:31:29 1422 42.8
Elapsed: 30.86 (mins)
DB Time: 3,660.60 (mins)
Per Second Per Transaction
Redo size: 1,185,514.75 6,235.28
Logical reads: 683,361.22 3,594.17
Block changes: 2,972.67 15.63
Physical reads: 6,201.95 32.62
Physical writes: 621.67 3.27
User calls: 8,220.82 43.24
Parses: 2,793.81 14.69
Hard parses: 319.52 1.68
Sorts: 1,180.36 6.21
Logons: 12.61 0.07
Executes: 31,893.98 167.75
Transactions: 190.13
Top 5 Timed Events
Event Waits Time(s) Avg Wait(ms) % Total Call Time Wait Class
latch: row cache objects 6,500,460 56,331 9 25.6 Concurrency
CPU time 50,195 22.9
cursor: pin S wait on X 2,073,123 40,372 19 18.4 Concurrency
db file sequential read 5,758,293 17,702 3 8.1 User I/O
enq: TX - row lock contention 6,570 12,384 1,885 5.6 Application
We can see that DB Time is very high: 3,660.60 minutes of DB time in a 30.86-minute window, roughly 118 average active sessions, so the system load is heavy. The waits are concentrated in the top three events.
Let's start with the first one:
latch: row cache objects
From v$latch_misses, or from the AWR report, we can see that the misses are concentrated in the following locations:
Latch Name Where NoWait Misses Sleeps Waiter Sleeps
row cache objects kqrpre: find obj 0 2,922,441 2,564,411
row cache objects kqreqd: reget 0 1,873,773 1,905,652
row cache objects kqreqd 0 1,682,121 2,008,663
We can also see that the dictionary cache object being operated on is dc_object_ids.
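To confirm this on a live instance, here is a minimal sketch against v$latch_misses and v$rowcache (assuming the 10g/11g column names location, sleep_count, wtr_slp_count, parameter, gets, getmisses):

-- Where the 'row cache objects' latch is missed most often
SELECT location, sleep_count, wtr_slp_count
  FROM v$latch_misses
 WHERE parent_name = 'row cache objects'
 ORDER BY sleep_count DESC;

-- Which dictionary cache (dc_*) entries are hit most heavily
SELECT parameter, gets, getmisses
  FROM v$rowcache
 ORDER BY gets DESC;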
For how to interpret these locations, refer to the note:
WAITEVENT: "latch: row cache objects" Reference Note (Doc ID 1550722.1)
A colleague kept puzzling over exactly when DC_OBJECT_IDS is accessed, but in my view, whatever the trigger, the data dictionary is clearly being accessed far too often. We can start from the latch locations themselves:
row cache objects kqrpre: find obj, row cache objects kqreqd: reget, and row cache objects kqreqd.
MetaLink has a related note on this:
Source: Contention on 'row cache objects' Latch After Upgrade from 10.2 to 11.2.0.3 (Doc ID 1485410.1)
The 'row cache objects' latch is called most frequently from 'kqrpre: find obj'. This module tries to find details of an object being parsed in the row cache. During parse, the row cache is searched. The process searches through a linked list protected by the 'row cache objects' latch to find the object in the cache that it needs. When other processes are also parsing and looking through the row cache this may cause contention; especially if the parse activity is excessive or unnecessary.
Top 5 Timed Foreground Events
Event Waits Time(s) Avg wait (ms) % DB time Wait Class
latch: row cache objects 58,291,873 130,700 2 63.18 Concurrency
DB CPU 60,563 29.27
db file sequential read 2,994,299 14,461 5 6.99 User I/O
library cache lock 3,179 1,662 523 0.80 Concurrency
gc cr grant 2-way 1,233,503 981 1 0.47 Cluster
Of course, we have not gone through that upgrade, but the note is still worth borrowing from. Its recommended solutions are:
1) Rewrite the application to use bind variables to lower the hard parse rate from 350/second
2) Enable the cursor_sharing = force
3) Do not use first_rows_x optimization and use all_rows instead
4) Set optimizer_features_enable = 10.2.0.4
5) Set event 10089 level 1 to disable index sorting.
It seems the first two items clearly attribute the problem to hard parsing, and they even give a quantitative threshold of 350 per second, whereas ours is about 319. Incidentally, the example in that note is as follows:
Latch Name Where NoWait Misses Sleeps Waiter Sleeps
row cache objects kqrpre: find obj 0 10,332,326 8,906,124
row cache objects kqreqd: reget 0 7,888,165 8,503,625
row cache objects kqreqd 0 7,677,074 8,489,832
row cache objects kqrso 0 4,127 3,622
row cache objects kqrpre: init complete 0 143 88
The contention in their case is even more pronounced. Still, since we never exceeded 350 hard parses per second, our situation is a bit better.
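To sanity-check the hard parse rate outside AWR, a rough sketch (this is an instance-lifetime average; a delta between two snapshots is more accurate):

-- Average hard parses per second since instance startup
SELECT s.value / ((SYSDATE - i.startup_time) * 86400) AS hard_parses_per_sec
  FROM v$sysstat s, v$instance i
 WHERE s.name = 'parse count (hard)';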
Next, let's look at
cursor: pin S wait on X
For this I referenced a document found online, which I believe also originates from MetaLink; it reads as follows:
--Overview of cursor: pin S
cursor: pin S A session waits on this event when it wants to update a shared mutex pin and another session
is currently in the process of updating a shared mutex pin for the same cursor object. This wait event should
rarely be seen because a shared mutex pin update is very fast.(Wait Time: Microseconds)
--Parameter descriptions
P1 Hash value of cursor
P2 Mutex value
64 bit platforms
8 bytes are used.
Top 4 bytes hold the session id (if the mutex is held X)
Bottom 4 bytes hold the ref count (if the mutex is held S).
32 bit platforms
4 bytes are used.
Top 2 bytes hold the session id (if the mutex is held X)
Bottom 2 bytes hold the ref count (if the mutex is held S).
P3 Mutex where (an internal code locator) OR'd with Mutex Sleeps
--Diagnostic SQL
-- For sessions waiting on cursor:% events, decode P2RAW: on 64-bit
-- platforms the top 4 bytes (first 8 hex digits) hold the SID of the
-- session holding the mutex in exclusive mode.
SELECT a.*, s.sql_text
  FROM v$sql s,
       (SELECT sid,
               event,
               wait_class,
               p1 cursor_hash_value,
               p2raw mutex_value,
               TO_NUMBER (SUBSTR (p2raw, 1, 8), 'xxxxxxxx') hold_mutex_x_sid
          FROM v$session_wait
         WHERE event LIKE 'cursor%') a
 WHERE s.hash_value = a.cursor_hash_value;
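Besides decoding the blocker per session as above, the aggregate mutex sleep statistics can also be checked; a minimal sketch, assuming 10.2 or later where v$mutex_sleep is available:

-- Which mutex types/locations sleep the most
SELECT mutex_type, location, sleeps, wait_time
  FROM v$mutex_sleep
 ORDER BY sleeps DESC;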
--Description of cursor: pin S wait on X
- In previous versions of Oracle, library cache pin is protected by “library cache pin latch”.
- But in recent versions of Oracle (I believe since 10.2.0.2),
the library cache pin for the cursor LCO is protected by a mutex.
- Mutex is allocated per LCO, so it enables fine-grained access control.
“cursor: pin S wait on X” wait event is mostly related to mutex and hard parse.
- When a process hard parses the SQL statement, it should acquire exclusive
library cache pin for the corresponding LCO.
- This means that the process acquires the mutex in exclusive mode.
- Another process which also executes the same query needs to acquire the mutex
but it’s being blocked by preceding process. The wait event is “cursor: pin S wait on X”.
--Causes of cursor: pin S wait on X
Frequent Hard Parses
If the frequency of Hard Parsing is extremely high, then contention can occur on this pin.
High Version Counts
When Version counts become excessive, a long chain of versions needs to
be examined and this can lead to contention on this event
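For the "High Version Counts" cause, a quick sketch against v$sqlarea (the threshold of 100 is only an illustrative cutoff):

-- Cursors with long child-version chains are candidates for mutex contention
SELECT sql_id, version_count, executions
  FROM v$sqlarea
 WHERE version_count > 100
 ORDER BY version_count DESC;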
From this we can see that in some cases a mutex replaces the library cache pin to protect the LCO (library cache object). A hard parse or an ALTER must hold the LCO in exclusive (X) mode in order to create or modify it; if another session needs to access that same LCO at that moment, i.e., a soft parse touching the LCO, then this wait event appears. The LCO is essentially where our cursor information is stored: what soft parsing shares is the cursor itself, so a new LCO does not have to be built for every statement.
So in my view the way to solve this problem is, again, to find a way to reduce hard parsing:
1) Rewrite the application to use bind variables to lower the hard parse rate from 350/second
2) Enable the cursor_sharing = force
Here we can see that 350 hard parses per second represents a level Oracle considers intolerable, while ours is 319. Although we did not reach 350, that is still a very high rate, so it is entirely plausible for this problem to occur; a small illustration of the fix follows.
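As an illustration of item 1, a minimal before/after sketch (the orders table and order_id column are hypothetical; the bind syntax shown is SQL*Plus):

-- Literal SQL: every distinct value forces a new hard parse
--   SELECT * FROM orders WHERE order_id = 1001;
-- Bind-variable SQL: one shared cursor serves all values
VARIABLE oid NUMBER
EXEC :oid := 1001
SELECT * FROM orders WHERE order_id = :oid;

-- Item 2, if the application cannot be changed quickly:
ALTER SYSTEM SET cursor_sharing = FORCE;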