接着昨天的那个问题来说。有个sql语句在做了统计信息收集之后,速度有了一定的提升,从5秒的响应降低到了2秒。但是和预期还是有一定
的差距,按照80条查询请求在短时间内响应。2秒*80000次/60/60=44.4小时,本来感觉可以接受的一下子就成了大问题。
|
Elapsed Time (s) |
Executions |
Elapsed Time per Exec (s) |
%Total |
%CPU |
%IO |
SQL Id |
SQL Module |
SQL Text |
31st May |
305.44 |
145 |
2.11 |
25.56 |
99.69 |
0.26 |
2fjzq67jbztwv |
gext1GenericEx@ccbdbpr1 (TNS V1-V3) |
/* */ SELECT DISTINCT 'K', AR.... |
和那个同事又确认了下,他说在其他项目也用的这个sql语句数据量还要大的多,就是没有问题。
有了昨天的一些数据,我自己也基本心里有数了,我表示怀疑,在此基础上能做的就是仔细看看这个sql语句了,看看到底实现细节有没有问题。
可以看到加入了一些hint,嵌入了子查询。
SELECT DISTINCT 'K',
AR.RESOURCE_VALUE,
AR.RESOURCE_TYPE,
GREATEST(TO_CHAR(AR.EFFECTIVE_DATE, 'YYYYMMDDHH24MISS'),
TO_CHAR(SB.EFFECTIVE_DATE, 'YYYYMMDDHH24MISS')),
LEAST(NVL(TO_CHAR(AR.EXPIRATION_DATE, 'YYYYMMDDHH24MISS'),
'47001231000000'),
NVL(TO_CHAR(SB.EXPIRATION_DATE, 'YYYYMMDDHH24MISS'),
'47001231000000')),
AR.AGR_NO,
SB.MEDIUM_CUS_ID,
SB.SUB_STATUS,
SB.BUSINESS_ENTITY_ID,
SB.LANGUAGE,
SB.ROUTING_POLICY_ID,
SB.L9_PORT_IND,
SB.L9_SPLIT_PERIOD
FROM HUGE_RESOURCE AR, --这个表有大约2000多万条数据,做了分区
MEDIUM_SUB SB, --这个表有50万左右的数据
(select /*+ RESULT_CACHE */
DISTINCT PARAM_NAME as PARAM_NAME
from SMALL_PARAM --这个表比较小,有不到2万条记录,如果加过滤条件,能过滤掉一半多的数据,因为那个字段不在索引字段里,所以加了result_cache
where GUIDING_IND = 'Y') OP,
MEDIUM_CUS CS
WHERE AR.AGR_NO = 1056851
AND AR.AGREEMENT_KEY = MOD(1056851, 100)
AND (AR.RESOURCE_STATE != 'F' OR AR.RESOURCE_STATE IS NULL)
AND AR.RANGE_IND = 'N'
AND SB.MEDIUM_SUB_NO = AR.AGR_NO
AND EXISTS
(select /*+ INDEX(OP SMALL_PARAM_1IX) */
1
from SMALL_PARAM OP
where OP.PARAM_NAME in (AR.RESOURCE_PRM_CD, AR.BASE_PARAM_NAME) )
AND SB.MEDIUM_CUS_ID = CS.MEDIUM_CUS_ID
AND (SB.EXPIRATION_DATE IS NULL OR SB.EXPIRATION_DATE >= to_date('19000101000000','YYYYMMDDHH24MISS'))
AND (AR.EXPIRATION_DATE IS NULL OR AR.EXPIRATION_DATE >= to_date('19000101000000','YYYYMMDDHH24MISS'))
AND (AR.EFFECTIVE_DATE
SB.EXPIRATION_DATE IS NULL)
AND (AR.EXPIRATION_DATE IS NULL OR
AR.EXPIRATION_DATE > SB.EFFECTIVE_DATE)
AND SB.SUB_STATUS != 'T'
UNION ALL
SELECT DISTINCT 'K',
AR.RESOURCE_VALUE,
AR.RESOURCE_TYPE,
GREATEST(TO_CHAR(AR.EFFECTIVE_DATE, 'YYYYMMDDHH24MISS'),
TO_CHAR(SH.EFFECTIVE_DATE, 'YYYYMMDDHH24MISS')),
LEAST(NVL(TO_CHAR(AR.EXPIRATION_DATE, 'YYYYMMDDHH24MISS'),
'47001231000000'),
NVL(TO_CHAR(SH.EXPIRATION_DATE, 'YYYYMMDDHH24MISS'),
'47001231000000')),
AR.AGR_NO,
SH.MEDIUM_CUS_ID,
SH.SUB_STATUS,
SH.BUSINESS_ENTITY_ID,
SH.LANGUAGE,
SH.ROUTING_POLICY_ID,
SH.L9_PORT_IND,
SH.L9_SPLIT_PERIOD
FROM HUGE_RESOURCE AR,
MEDIUM_SUB_HISTORY SH,
(select /*+ RESULT_CACHE */
DISTINCT PARAM_NAME as PARAM_NAME
from SMALL_PARAM
where GUIDING_IND = 'Y') OP,
MEDIUM_CUS CS
WHERE AR.AGR_NO = 1056851
AND AR.AGREEMENT_KEY = MOD(1056851, 100)
AND (AR.RESOURCE_STATE != 'F' OR AR.RESOURCE_STATE IS NULL)
AND AR.RANGE_IND = 'N'
AND SH.MEDIUM_SUB_NO = AR.AGR_NO
AND EXISTS
(select /*+ INDEX(OP SMALL_PARAM_1IX) */
1
from SMALL_PARAM OP
where OP.PARAM_NAME in (AR.RESOURCE_PRM_CD, AR.BASE_PARAM_NAME))
AND SH.MEDIUM_CUS_ID = CS.MEDIUM_CUS_ID
AND (SH.EXPIRATION_DATE IS NULL OR SH.EXPIRATION_DATE >= to_date('19000101000000','YYYYMMDDHH24MISS'))
AND (AR.EXPIRATION_DATE IS NULL OR AR.EXPIRATION_DATE >= to_date('19000101000000','YYYYMMDDHH24MISS'))
AND (AR.EFFECTIVE_DATE
SH.EXPIRATION_DATE IS NULL)
AND (AR.EXPIRATION_DATE IS NULL OR
AR.EXPIRATION_DATE > SH.EFFECTIVE_DATE)
AND SH.SUB_STATUS NOT IN ('C', 'T')
试着在生产上抓了一个执行计划,又看了一下统计信息。
Statistics
----------------------------------------------------------
113 recursive calls
8 db block gets
31581 consistent gets
0 physical reads
0 redo size
1997 bytes sent via SQL*Net to client
520 bytes received via SQL*Net from client
2 SQL*Net roundtrips to/from client
20 sorts (memory)
0 sorts (disk)
6 rows processed
Elapsed: 00:02:44.14
---------------------------------------------------------------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time | Pstart| Pstop |
---------------------------------------------------------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 13 | 2905 | 519 (59)| 00:00:07 | | |
在备份库上查看,结果同样的查询一下子没有了反应。执行了将近3分钟还是没有反应。备份库的资源要比生产差一些。好了感觉有问题了。开始看看这个sql语句的第一部分。
SELECT DISTINCT 'K',
AR.RESOURCE_VALUE,
AR.RESOURCE_TYPE,
GREATEST(TO_CHAR(AR.EFFECTIVE_DATE, 'YYYYMMDDHH24MISS'),
TO_CHAR(SB.EFFECTIVE_DATE, 'YYYYMMDDHH24MISS')),
LEAST(NVL(TO_CHAR(AR.EXPIRATION_DATE, 'YYYYMMDDHH24MISS'),
'47001231000000'),
NVL(TO_CHAR(SB.EXPIRATION_DATE, 'YYYYMMDDHH24MISS'),
'47001231000000')),
AR.AGR_NO,
SB.MEDIUM_CUS_ID,
SB.SUB_STATUS,
SB.BUSINESS_ENTITY_ID,
SB.LANGUAGE,
SB.ROUTING_POLICY_ID,
SB.L9_PORT_IND,
SB.L9_SPLIT_PERIOD
FROM HUGE_RESOURCE AR, --这个表有大约2000多万条数据,做了分区
MEDIUM_SUB SB, --这个表有50万左右的数据
(select /*+ RESULT_CACHE */
DISTINCT PARAM_NAME as PARAM_NAME
from SMALL_PARAM --这个表比较小,有不到2万条记录,如果加过滤条件,能过滤掉一半多的数据,因为那个字段不在索引字段里,所以加了result_cache
where GUIDING_IND = 'Y') OP,
MEDIUM_CUS CS
WHERE AR.AGR_NO = 1056851
AND AR.AGREEMENT_KEY = MOD(1056851, 100)
AND (AR.RESOURCE_STATE != 'F' OR AR.RESOURCE_STATE IS NULL)
AND AR.RANGE_IND = 'N'
AND SB.MEDIUM_SUB_NO = AR.AGR_NO
AND EXISTS
(select /*+ INDEX(OP SMALL_PARAM_1IX) */
1
from SMALL_PARAM OP
where OP.PARAM_NAME in (AR.RESOURCE_PRM_CD, AR.BASE_PARAM_NAME) )
AND SB.MEDIUM_CUS_ID = CS.MEDIUM_CUS_ID
AND (SB.EXPIRATION_DATE IS NULL OR SB.EXPIRATION_DATE >= to_date('19000101000000','YYYYMMDDHH24MISS'))
AND (AR.EXPIRATION_DATE IS NULL OR AR.EXPIRATION_DATE >= to_date('19000101000000','YYYYMMDDHH24MISS'))
AND (AR.EFFECTIVE_DATE
SB.EXPIRATION_DATE IS NULL)
AND (AR.EXPIRATION_DATE IS NULL OR
AR.EXPIRATION_DATE > SB.EFFECTIVE_DATE)
AND SB.SUB_STATUS != 'T'
因为结果集的输出中没有op这个表的列,而且在where子句中存在exists语句,在exists里面也没有做关联,那个同事坚持说想在做关联的时候把op的数据先做了result cache,在子查询中就能做关联了,避免重复的表扫描。听起来好像有道理,我觉得语句有问题,尽管说是产品部分提供的方案。op在from 后,但是和后面的流程都没有关联,但也没有做笛卡尔积。在他的强烈反对中我把以下的部分从from中删除。
(select /*+ RESULT_CACHE */
DISTINCT PARAM_NAME as PARAM_NAME
from SMALL_PARAM --这个表比较小,有不到2万条记录,如果加过滤条件,能过滤掉一半多的数据,因为那个字段不在索引字段里,所以加了result_cache
where GUIDING_IND = 'Y') OP,
然后在备份库上重新跑一次,没想到一下子就有了反应。存在一定的物理读,第二次运行就没有了。逻辑度有了近6倍的提升,执行时间在0.02-0.07之间。
Statistics
----------------------------------------------------------
1 recursive calls
0 db block gets
5854 consistent gets
605 physical reads
72 redo size
1900 bytes sent via SQL*Net to client
520 bytes received via SQL*Net from client
2 SQL*Net roundtrips to/from client
0 sorts (memory)
0 sorts (disk)
4 rows processed
按照这个速度,0.07秒*80000次/60/60=1.55小时,剩下的事情就是和他们确认一些具体的细节了。下午晚些时候产品部分也确认这确实是个问题。