[20170601]distinct的优化.txt

[20170601]distinct的优化.txt

1.环境:
SCOTT@book> @ &r/ver1
PORT_STRING          VERSION    BANNER
-------------------- ---------- --------------------------------------------------------------------------------
x86_64/Linux 2.4.xx  11.2.0.4.0 Oracle Database 11g Enterprise Edition Release 11.2.0.4.0 - 64bit Production

create table t as select * from all_objects;
create index t_idx on t(owner,object_id);

--//分析表略.method_opt=>'for all columns size 1'.
 
select index_name,blevel,leaf_blocks,
    distinct_keys,num_rows,
    avg_leaf_blocks_per_key lf_per_key,
    avg_data_blocks_per_key blks_per_key
from user_indexes
where index_name ='T_IDX';

INDEX_NAME BLEVEL LEAF_BLOCKS DISTINCT_KEYS   NUM_ROWS LF_PER_KEY BLKS_PER_KEY
---------- ------ ----------- ------------- ---------- ---------- ------------
T_IDX           1         256         84774      84774          1            1

--//注意T表 owner is not null.
--//当我们执行select distinct owner from t;时正常情况下选择的执行计划如下:

alter session set statistics_level=all;

SCOTT@book> column owner format a30
SCOTT@book> select distinct owner from t;
OWNER
------------------------------
OWBSYS_AUDIT
MDSYS
CTXSYS
FLOWS_FILES
HR
OLAPSYS
OUTLN
OWBSYS
PUBLIC
APEX_030200
EXFSYS
ORACLE_OCM
SCOTT
SYSTEM
DBSNMP
OE
ORDPLUGINS
ORDSYS
PM
SH
SYSMAN
APPQOSSYS
BI
IX
ORDDATA
XDB
SI_INFORMTN_SCHEMA
SYS
WMSYS
29 rows selected.

SCOTT@book> @ &r/dpc '' ''
PLAN_TABLE_OUTPUT
-------------------------------------
SQL_ID  g3ywa5u5raj7d, child number 0
-------------------------------------
select distinct owner from t
Plan hash value: 1741570181
----------------------------------------------------------------------------------------------------------------------------------------------------
| Id  | Operation             | Name  | Starts | E-Rows |E-Bytes| Cost (%CPU)| E-Time   | A-Rows |   A-Time   | Buffers |  OMem |  1Mem | Used-Mem |
----------------------------------------------------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT      |       |      1 |        |       |    74 (100)|          |     29 |00:00:00.04 |     263 |       |       |          |
|   1 |  HASH UNIQUE          |       |      1 |     29 |   174 |    74   (5)| 00:00:01 |     29 |00:00:00.04 |     263 |  5686K|  1858K| 2222K (0)|
|   2 |   INDEX FAST FULL SCAN| T_IDX |      1 |  84774 |   496K|    71   (0)| 00:00:01 |  84774 |00:00:00.01 |     263 |       |       |          |
----------------------------------------------------------------------------------------------------------------------------------------------------
Query Block Name / Object Alias (identified by operation id):
-------------------------------------------------------------
   1 - SEL$1
   2 - SEL$1 / T@SEL$1

--//这样相当于把索引当作表做快速扫描,然后使用HASH UNIQUE过滤需要的结果.主要成本消耗就是扫描整个索引的成本.
--//而我们可以想象还有1个更佳的算法,就是通过递规每次查询owner的最小值,而查询最小值的逻辑读是很小的.

with ownerlist(x) as
(  
   select min(owner) from t
   union all
   select (select min(owner) from t where t.owner > ownerlist.x ) from ownerlist where x is not null
) select * from ownerlist where x is not null ;

--//注意一定要加一个条件where x is not null,不然变成死循环.后面也要加where x is not null ;,不然记录会多1条null值.

SCOTT@book> @ &r/dpc '' ''
PLAN_TABLE_OUTPUT
-------------------------------------
SQL_ID  frvat47pq8f2x, child number 0
-------------------------------------
with ownerlist(x) as (    select min(owner) from t    union all
select (select min(owner) from t where t.owner > ownerlist.x ) from
ownerlist where x is not null ) select * from ownerlist where x is not
null

Plan hash value: 946542369
---------------------------------------------------------------------------------------------------------------------------------------------
| Id  | Operation                                 | Name  | Starts | E-Rows |E-Bytes| Cost (%CPU)| E-Time   | A-Rows |   A-Time   | Buffers |
---------------------------------------------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT                          |       |      1 |        |       |     4 (100)|          |     29 |00:00:00.01 |      33 |
|*  1 |  VIEW                                     |       |      1 |      2 |    34 |     4   (0)| 00:00:01 |     29 |00:00:00.01 |      33 |
|   2 |   UNION ALL (RECURSIVE WITH) BREADTH FIRST|       |      1 |        |       |            |          |     30 |00:00:00.01 |      33 |
|   3 |    SORT AGGREGATE                         |       |      1 |      1 |     6 |            |          |      1 |00:00:00.01 |       2 |
|   4 |     INDEX FULL SCAN (MIN/MAX)             | T_IDX |      1 |      1 |     6 |     2   (0)| 00:00:01 |      1 |00:00:00.01 |       2 |
|   5 |    SORT AGGREGATE                         |       |     29 |      1 |     6 |            |          |     29 |00:00:00.01 |      31 |
|   6 |     FIRST ROW                             |       |     29 |      1 |     6 |     2   (0)| 00:00:01 |     28 |00:00:00.01 |      31 |
|*  7 |      INDEX RANGE SCAN (MIN/MAX)           | T_IDX |     29 |      1 |     6 |     2   (0)| 00:00:01 |     28 |00:00:00.01 |      31 |
|   8 |    RECURSIVE WITH PUMP                    |       |     30 |        |       |            |          |     29 |00:00:00.01 |       0 |
---------------------------------------------------------------------------------------------------------------------------------------------
Query Block Name / Object Alias (identified by operation id):
-------------------------------------------------------------
   1 - SET$1 / OWNERLIST@SEL$4
   2 - SET$1
   3 - SEL$1
   4 - SEL$1 / T@SEL$1
   5 - SEL$3
   7 - SEL$3 / T@SEL$3
   8 - SEL$2

Predicate Information (identified by operation id):
---------------------------------------------------

   1 - filter("X" IS NOT NULL)
   7 - access("T"."OWNER">:B1)

--//执行计划看起来很复杂,实际上理解了逻辑还是很简单的.逻辑读=33,明显比前面少许多,比较适合重复值很多的distinct.

--//想要说明一点,实际上算法很重要,许多开发正是丢掉这些最基本的东西,写出来的sql语句就像中学生写的家庭作业.

时间: 2024-09-20 23:38:45

[20170601]distinct的优化.txt的相关文章

[20170104]一条sql优化.txt

[20170104]一条sql优化.txt --生产系统不明原因重启,看了1下,顺便看了前后的awr报表,发现一条语句,其实问题没什么,只不过这种现象在开发很普遍,做一点点记录. 1.环境: xxxxxx> @ &r/ver1 PORT_STRING                    VERSION        BANNER ------------------------------ -------------- -----------------------------------

[20151221]sql语句优化.txt

[20151221]sql语句优化.txt --自从发现开发乱用distinct以后,链接http://blog.itpub.net/267265/viewspace-1871989/ --我看sql语句特别注意连接多个表,但是显示仅仅一个表的情况,上个星期五,发现一条: sql_id=dpdk3xfd6cvky SELECT EMR_DJMX.ZSFL     FROM MS_YJ01, L_LIS_SQDMX, EMR_DJMX    WHERE     MS_YJ01.YJXH IN (

[20150715]一条sql语句的优化.txt

[20150715]一条sql语句的优化.txt --生产系统发现一条语句. update presc_check t set t.diagnosis=replace(t.diagnosis,',慢性病drugs','') --第1眼看到的感觉真的很想骂人,什么能没有where条件呢? --我把这个表拷贝过来.这个表占用1G多1点,在测试环境执行看看: -- copy from system/xxxx@ip:1521/tyt create presc_check using select * f

[20130319]一条sql语句的优化.txt

[20130319]一条sql语句的优化.txt 生产系统,遇到这样一条语句:SELECT MAX (LENGTH (pe_id)) FROM pe_master_index WHERE SUBSTR (pe_id, 1, 2) = 'TJ'; --真不知道开发人员如何想的,写出这样的语句.字段pe_id是主键.--数据库版本 SQL> select * from v$version where rownum BANNER                                     

[20120830]11G SPM的学习6.txt--第3方优化.txt

[20120830]11G SPM的学习6.txt--第3方优化.txt 继续前面的学习: SQL> select * from v$version ; BANNER -------------------------------------------------------------------------------- Oracle Database 11g Enterprise Edition Release 11.2.0.1.0 - 64bit Production PL/SQL R

[20131204]sql语句优化.txt

[20131204]sql语句优化.txt 昨天优化sql语句,遇到一些细节问题,做一个记录: SCOTT@test> @verBANNER--------------------------------------------------------------------------------Oracle Database 11g Enterprise Edition Release 11.2.0.3.0 - 64bit Production create table t (id numb

[20131025]一条sql语句的优化.txt

[20131025]一条sql语句的优化.txt 最近在优化一条sql语句,做一个测试例子测试看看.遇到一些问题记录一下: 1.建立环境: SCOTT@test> @ver BANNER -------------------------------------------------------------------------------- Oracle Database 11g Enterprise Edition Release 11.2.0.3.0 - 64bit Productio

[20120319]一条sql语句的优化.txt

前天检查数据库,发现一天sql语句执行如下:SELECT MAX (undrug_code)   FROM undrug_info  WHERE SUBSTR (undrug_code, 1, 1) = 'F'; undrug_code是表undrug_info的主键.开始看见这个语句,感觉这样写不好,我想像的执行计划,全扫描索引,然后sort aggregate,找到最大值. 不如这样写:SELECT MAX (undrug_code)   FROM undrug_info  WHERE un

PostgreSQL distinct 与 Greenplum distinct 的实现与优化

标签 PostgreSQL , distinct , 多distinct , groupagg , hashagg , sort , hyperloglog , 估值 背景 求distinct是业务的一个普遍需求,例如每天有多少用户,每个省份有多少用户,每天有多少类目的用户等. select date,count(dinstinct user) from tbl group by date; select date, province, count(distinct user) from tbl