[20170824]SQL/Net message from client与网络丢包模拟.txt
--//以前做的测试:http://blog.itpub.net/267265/viewspace-2130032/
--//我本来的意思是测试网络很慢的情况下,会发生什么情况?
--//我的测试SQL/Net message from client等待事件很长,但是SQL*Net message to client 的ela确很小.
--//也就是给人的感觉当网络很慢的情况下,要注意SQL/Net message from client等待事件.
--//我发现我当时做了流量限制,当时并不会丢包,仅仅慢一些.想起以前模拟参考链接,模拟出现丢包的情况,会发生什么
--//www.linuxfoundation.org/collaborate/workgroups/networking/netem
--//先看看官方手册:docs.oracle.com/cd/E11882_01/server.112/e40402/waitevents003.htm#REFRN00634
SQL*Net message from client
The server process (foreground process) waits for a message from the client process to arrive.
Wait Time: The time it took for a message to arrive from the client since the last message was sent to the client
Parameter Description
driver id See "driver id"
#bytes The number of bytes received by the server (foreground process) from the client.
SQL*Net message to client
The server (foreground process) is sending a message to the client.
Wait Time: The actual time the send takes
Parameter Description
driver id See "driver id"
#bytes The number of bytes sent by the server process to the client
1.环境:
SYS@book> @ &r/ver1
PORT_STRING VERSION BANNER
------------------------------ -------------- --------------------------------------------------------------------------------
x86_64/Linux 2.4.xx 11.2.0.4.0 Oracle Database 11g Enterprise Edition Release 11.2.0.4.0 - 64bit Production
--//在服务端执行:
# tc qdisc show dev eth0
qdisc mq 0: root
# tc qdisc add dev eth0 root netem loss 40% 25%
--//# tc qdisc delete dev eth0 root netem loss 40% 25%
--//# 这样丢包很厉害,远程ssh连接都很慢,事先要写好取消命令,到时候copy & paste就ok了.(打命令很困难)
# tc qdisc show dev eth0
qdisc netem 8001: root refcnt 6 limit 1000 loss 40% 25%
--// This will cause 40 of packets to be lost, and each successive probability depends by a quarter on the last one.
--//Probn = .25 * Probn-1 + .75 * Random
--//从client ping服务器,发现丢包.
# ping -s 8192 192.168.100.78 -c 20
PING 192.168.100.78 (192.168.100.78) 8192(8220) bytes of data.
8200 bytes from 192.168.100.78: icmp_seq=3 ttl=64 time=0.393 ms
8200 bytes from 192.168.100.78: icmp_seq=4 ttl=64 time=0.400 ms
8200 bytes from 192.168.100.78: icmp_seq=6 ttl=64 time=0.351 ms
8200 bytes from 192.168.100.78: icmp_seq=7 ttl=64 time=0.396 ms
8200 bytes from 192.168.100.78: icmp_seq=9 ttl=64 time=0.426 ms
8200 bytes from 192.168.100.78: icmp_seq=10 ttl=64 time=0.332 ms
8200 bytes from 192.168.100.78: icmp_seq=12 ttl=64 time=0.334 ms
8200 bytes from 192.168.100.78: icmp_seq=13 ttl=64 time=0.395 ms
8200 bytes from 192.168.100.78: icmp_seq=15 ttl=64 time=0.399 ms
8200 bytes from 192.168.100.78: icmp_seq=17 ttl=64 time=0.404 ms
8200 bytes from 192.168.100.78: icmp_seq=18 ttl=64 time=0.344 ms
8200 bytes from 192.168.100.78: icmp_seq=19 ttl=64 time=0.385 ms
--- 192.168.100.78 ping statistics ---
20 packets transmitted, 12 received, 40% packet loss, time 19014ms
rtt min/avg/max/mdev = 0.332/0.379/0.426/0.039 ms, pipe 2
2.测试:
$ cat aa.sql
set array 7
alter session set events '10046 trace name context forever, level 12';
host sleep 10
select * from emp;
host sleep 12
alter session set events '10046 trace name context off';
--//在client端执行以上脚本:
--//注先执行1次select * from emp;.避免递归语句.
--//跟踪文件内容如下:
*** 2017-08-24 11:06:48.940
*** SESSION ID:(16.17) 2017-08-24 11:06:48.940
*** CLIENT ID:() 2017-08-24 11:06:48.940
*** SERVICE NAME:(book) 2017-08-24 11:06:48.940
*** MODULE NAME:(SQL*Plus) 2017-08-24 11:06:48.940
*** ACTION NAME:() 2017-08-24 11:06:48.940
WAIT #140568569128584: nam='SQL*Net message to client' ela= 3 driver id=1413697536 #bytes=1 p3=0 obj#=-1 tim=1503544008940478
*** 2017-08-24 11:06:59.156
WAIT #140568569128584: nam='SQL*Net message from client1' ela= 10215533 driver id=1413697536 #bytes=1 p3=0 obj#=-1 tim=1503544019156541
CLOSE #140568569128584:c=0,e=8,dep=0,type=1,tim=1503544019156653
=====================
PARSING IN CURSOR #140568569243528 len=17 dep=0 uid=83 oct=3 lid=83 tim=1503544019156841 hv=1745700775 ad='61c8f7a0' sqlid='a2dk8bdn0ujx7'
select * from emp
END OF STMT
PARSE #140568569243528:c=0,e=133,p=0,cr=0,cu=0,mis=0,r=0,dep=0,og=1,plh=3956160932,tim=1503544019156835
EXEC #140568569243528:c=0,e=59,p=0,cr=0,cu=0,mis=0,r=0,dep=0,og=1,plh=3956160932,tim=1503544019157002
WAIT #140568569243528: nam='SQL*Net message to client' ela= 2 driver id=1413697536 #bytes=1 p3=0 obj#=-1 tim=1503544019157076
FETCH #140568569243528:c=0,e=99,p=0,cr=6,cu=0,mis=0,r=1,dep=0,og=1,plh=3956160932,tim=1503544019157226
WAIT #140568569243528: nam='SQL*Net message from client' ela= 827 driver id=1413697536 #bytes=1 p3=0 obj#=-1 tim=1503544019158106
WAIT #140568569243528: nam='SQL*Net message to client' ela= 2 driver id=1413697536 #bytes=1 p3=0 obj#=-1 tim=1503544019158198
FETCH #140568569243528:c=0,e=61,p=0,cr=1,cu=0,mis=0,r=7,dep=0,og=1,plh=3956160932,tim=1503544019158238
WAIT #140568569243528: nam='SQL*Net message from client' ela= 619339 driver id=1413697536 #bytes=1 p3=0 obj#=-1 tim=1503544019777623
WAIT #140568569243528: nam='SQL*Net message to client' ela= 3 driver id=1413697536 #bytes=1 p3=0 obj#=-1 tim=1503544019777726
FETCH #140568569243528:c=0,e=86,p=0,cr=1,cu=0,mis=0,r=6,dep=0,og=1,plh=3956160932,tim=1503544019777787
STAT #140568569243528 id=1 cnt=14 pid=0 pos=1 obj=87108 op='TABLE ACCESS FULL EMP (cr=8 pr=0 pw=0 time=89 us cost=3 size=532 card=14)'
*** 2017-08-24 11:07:11.786
WAIT #140568569243528: nam='SQL*Net message from client' ela= 12008336 driver id=1413697536 #bytes=1 p3=0 obj#=-1 tim=1503544031786313
CLOSE #140568569243528:c=0,e=11,dep=0,type=0,tim=1503544031786439
=====================
PARSING IN CURSOR #140568569243528 len=55 dep=0 uid=83 oct=42 lid=83 tim=1503544031786533 hv=2217940283 ad='0' sqlid='06nvwn223659v'
alter session set events '10046 trace name context off'
END OF STMT
PARSE #140568569243528:c=1000,e=31,p=0,cr=0,cu=0,mis=0,r=0,dep=0,og=0,plh=0,tim=1503544031786532
EXEC #140568569243528:c=0,e=419,p=0,cr=0,cu=0,mis=0,r=0,dep=0,og=0,plh=0,tim=1503544031787033
SYS@book> @ &r/ev_name 'SQL*Net message%client%'
EVENT# EVENT_ID NAME PARAMETER1 PARAMETER2 PARAMETER3 WAIT_CLASS_ID WAIT_CLASS# WAIT_CLASS
------ ---------- --------------------------- ---------- ---------- ---------- ------------- ----------- --------------------
350 2067390145 SQL*Net message to client driver id #bytes 2000153315 7 Network
354 1421975091 SQL*Net message from client driver id #bytes 2723168908 6 Idle
--//oracle在分类上SQL*Net message to client是network类型事件.SQL*Net message from client是Idle类型事件.
--//我以前的例子说明如果网络很慢,要特别注意SQL*Net message from client ,不是没有必要不关注这个事件.
--//官方文档: Wait Time: The actual time the send takes
--//简单分析:
$ grep "SQL\*Net message to client" /u01/app/oracle/diag/rdbms/book/book/trace/book_ora_23505.trc
WAIT #140568569128584: nam='SQL*Net message to client' ela= 3 driver id=1413697536 #bytes=1 p3=0 obj#=-1 tim=1503544008940478
WAIT #140568569243528: nam='SQL*Net message to client' ela= 2 driver id=1413697536 #bytes=1 p3=0 obj#=-1 tim=1503544019157076
WAIT #140568569243528: nam='SQL*Net message to client' ela= 2 driver id=1413697536 #bytes=1 p3=0 obj#=-1 tim=1503544019158198
WAIT #140568569243528: nam='SQL*Net message to client' ela= 3 driver id=1413697536 #bytes=1 p3=0 obj#=-1 tim=1503544019777726
--//即使在这样的丢包率下,ela也是很小,也就是根本不能通过这个事件定位网络问题.而且 #bytes=1很小.
$ grep "SQL\*Net message from client" /u01/app/oracle/diag/rdbms/book/book/trace/book_ora_23505.trc
WAIT #140568569128584: nam='SQL*Net message from client' ela= 10215533 driver id=1413697536 #bytes=1 p3=0 obj#=-1 tim=1503544019156541
WAIT #140568569243528: nam='SQL*Net message from client' ela= 827 driver id=1413697536 #bytes=1 p3=0 obj#=-1 tim=1503544019158106
WAIT #140568569243528: nam='SQL*Net message from client' ela= 619339 driver id=1413697536 #bytes=1 p3=0 obj#=-1 tim=1503544019777623
WAIT #140568569243528: nam='SQL*Net message from client' ela= 12008336 driver id=1413697536 #bytes=1 p3=0 obj#=-1 tim=1503544031786313
--//注意第1,4次出现的ela= 10215533,ela= 12008336,正好与我脚本host sleep命令有关,也就是出现这个等待事件大部分时间是等待用
--//户输入命令的事件.
--//而中间2个就非常值得关注.
--//官方手册: Wait Time: The time it took for a message to arrive from the client since the last message was sent to the client
--//这个实际上前一个消息与后一个消息的时间差.
总结:
1.出现网络缓慢或者丢包的情况,应该关注的是SQL*Net message from client.而SQL*Net message to client应该不考虑.
2.另外千万不要在生产系统做这样的测试,我当时差点不能远程操作服务器.很慢....