有索引却走全表扫描的实验分析

最近在查询某范围的数据时，返回结果仅占整个表数据的0.05%，在相应字段上有对应索引，然而这部分数据并没有走索引，
将根据如下实验证明此种情况的原因
一构建环境
1建立测试表有两个字段，主键ID和手机号，手机号上有B树索引
drop table TEST_HIS
-- Create table
create table TEST_HIS
(
id    number,
phone varchar2(32)
)
;
-- Add comments to the table
comment on table TEST_HIS
is '测试直方图';
-- Add comments to the columns
comment on column TEST_HIS.phone
is '手机号，用于测试上面的直方图是否影响走索引';
2 建立主键和索引
-- Create/Recreate indexes
create index IDX_PHONE on TEST_HIS (phone);
-- Create/Recreate primary, unique and foreign key constraints
alter table TEST_HIS
add constraint PK_ID primary key (ID);
3 构造测试数据，根据如下脚本，可以手工修改循环次数，向表中插入13开头的电话号码11000个，15开头的电话号码10个，
以及18开头的电话号码10个，null的空号码11010个
declare
v_phone varchar2(32);
begin
v_phone :='136818978';
for i in 1..10000
loop
insert into test_his
    (id, phone)
values
    (SEQ_TEST_HIST.NEXTVAL, v_phone||i);
    --(SEQ_TEST_HIST.NEXTVAL,null);
end loop;
end;
4 收集统计信息
SELECT t.NUM_ROWS,t.AVG_ROW_LEN,t.BLOCKS,t.TABLE_NAME,T.LAST_ANALYZED,t.EMPTY_BLOCKS FROM USER_TABLES T WHERE TABLE_NAME ='TEST_HIS';

begin
dbms_stats.gather_table_stats(ownname => 'TEST',tabname => 'TEST_HIS',cascade => TRUE);
end;

SELECT t.NUM_ROWS,t.AVG_ROW_LEN,t.BLOCKS,t.TABLE_NAME,T.LAST_ANALYZED,t.EMPTY_BLOCKS FROM USER_TABLES T WHERE TABLE_NAME ='TEST_HIS';

二查找13开头和15开头的电话号码的sql执行计划

select *　from test.test_his H where h.phone like '13%';
执行计划
----------------------------------------------------------
Plan hash value: 2828532586

select *　from test.test_his H where h.phone like '15%';
执行计划
----------------------------------------------------------
Plan hash value: 2828532586

------------------------------------------------------------------------------
| Id | Operation         | Name     | Rows | Bytes | Cost (%CPU)| Time     |
------------------------------------------------------------------------------
|   0 | SELECT STATEMENT |          | 2227 | 22270 |    14   (8)| 00:00:01 |
|* 1 | TABLE ACCESS FULL| TEST_HIS | 2227 | 22270 |    14   (8)| 00:00:01 |
------------------------------------------------------------------------------
根据执行计划我们看到查询15开头的数据返回的rows是2227，这和实际数据相差较大，于是打开10053查看相关信息
三查看10053
alter system flush buffer_cache;
ALTER SESSION SET EVENTS '10053 trace name context forever, level 1';
explain plan for select *　from test.test_his where phone like '15%';
alter session set events '10053 trace name context off';
观察10053中针对表的访问路径信息
SINGLE TABLE ACCESS PATH
Column (#2): PHONE(VARCHAR2)
    AvgLen: 6.00 NDV: 10867 Nulls: 11107 Density: 9.2022e-005
Table: TEST_HIS Alias: TEST_HIS
    Card: Original: 22235 Rounded: 2227 Computed: 2226.62 Non Adjusted: 2226.62
Access Path: TableScan
    Cost: 14.09 Resp: 14.09 Degree: 0
      Cost_io: 13.00 Cost_cpu: 6359522
      Resp_io: 13.00 Resp_cpu: 6359522
kkofmx: index filter:"TEST_HIS"."PHONE" LIKE '15%'
Access Path: index (RangeScan)
    Index: IDX_PHONE
    resc_io: 604.00 resc_cpu: 5125340
    ix_sel: 0.20203 ix_sel_with_filters: 0.20203
    Cost: 604.88 Resp: 604.88 Degree: 1
Best:: AccessPath: TableScan
         Cost: 14.09 Degree: 1 Resp: 14.09 Card: 2226.62 Bytes: 0
TEST_HIS中 NDV: 高达10867个，然而sql中要查询的是以15开头的电话号码，15开头，13开头，18开头的数据只有三种
因此统计信息和实际sql要查询的语义也不相符，因此虽然phone上有索引，但是sql的查询条件是查询某号码开头的数据时，已有索引
就不能准确描述15开头的数据分布情况，因此即便15开头的数据只有10个，优化器也会按照普通索引上的统计信息NDV的值计算
返回rows，而这个返回rows远超过实际数据的分布情况，因此建立直方图收集正确的分布信息
四建立直方图
analyze table test.test_his compute statistics for columns phone;

alter system flush buffer_cache;
ALTER SESSION SET EVENTS '10053 trace name context forever, level 1';
explain plan for select *　from test.test_his where phone like '15%';
alter session set events '10053 trace name context off';

SINGLE TABLE ACCESS PATH
Column (#2): PHONE(VARCHAR2)
    AvgLen: 5.00 NDV: 10021 Nulls: 11010 Density: 1.0720e-004
    Histogram: HtBal #Bkts: 75 UncompBkts: 75 EndPtVals: 76
Table: TEST_HIS Alias: TEST_HIS
    Card: Original: 22235 Rounded: 30 Computed: 29.93 Non Adjusted: 29.93
Access Path: TableScan
    Cost: 14.09 Resp: 14.09 Degree: 0
      Cost_io: 13.00 Cost_cpu: 6359522
      Resp_io: 13.00 Resp_cpu: 6359522
kkofmx: index filter:"TEST_HIS"."PHONE" LIKE '15%'
Access Path: index (RangeScan)
    Index: IDX_PHONE
    resc_io: 10.00 resc_cpu: 82314
    ix_sel: 0.002716 ix_sel_with_filters: 0.002716
    Cost: 10.01 Resp: 10.01 Degree: 1
******** Begin index join costing ********
****** trying bitmap/domain indexes ******
Access Path: index (IndexOnly)
    Index: IDX_PHONE
    resc_io: 2.00 resc_cpu: 20243
    ix_sel: 0.002716 ix_sel_with_filters: 0.002716
    Cost: 2.00 Resp: 2.00 Degree: 0
    SORT resource      Sort statistics
      Sort width:         477 Area size:      417792 Max Area size:    83676160
      Degree:               1
      Blocks to Sort:       1 Row size:           21 Total Rows:             30
      Initial runs:         1 Merge passes:        0 IO Cost / pass:          0
      Total IO sort cost: 0      Total CPU sort cost: 5826332
      Total Temp space used: 0
Access Path: index (FullScan)
    Index: PK_ID
    resc_io: 43.00 resc_cpu: 4712422
    ix_sel: 1 ix_sel_with_filters: 1
    Cost: 43.81 Resp: 43.81 Degree: 0
******** Cost index join ********
Index join: Considering index join to index IDX_PHONE
Index join: Joining index PK_ID
Ix HA Join
Outer table:
    resc: 2.00 card 29.93 bytes: 15 deg: 1 resp: 2.00
Inner table:
    resc: 54.76 card: 22235.00 bytes: 15 deg: 1 resp: 54.76
    using dmeth: 2 #groups: 1
    Cost per ptn: 0.88 #ptns: 1
    hash_area: 0 (max=0)   Hash join: Resc: 57.65 Resp: 57.65 [multiMatchCost=0.00]
******** Index join cost ********
Cost: 57.65
******** End index join costing ********
Best:: AccessPath: IndexRange Index: IDX_PHONE
         Cost: 10.01 Degree: 1 Resp: 10.01 Card: 29.93 Bytes: 0

查看对应执行计划
---------

--------------------------------------------------------------------------------
---------

| 0 | SELECT STATEMENT | | 30 | 300 | 10 (0)| 0
0:00:01 |

| 1 | TABLE ACCESS BY INDEX ROWID| TEST_HIS | 30 | 300 | 10 (0)| 0
0:00:01 |

|* 2 | INDEX RANGE SCAN | IDX_PHONE | 30 | | 2 (0)| 0
0:00:01 |

--------------------------------------------------------------------------------

时间： 2024-09-30 01:15:00

有索引却走全表扫描的实验分析

有索引却走全表扫描的实验分析的相关文章

一条全表扫描sql语句的分析

SQL SERVER中关于OR会导致索引扫描或全表扫描的浅析

@dbsnake-用合适的函数索引来避免看似无法避免的全表扫描

Oracle 全表扫描及其执行计划(full table scan)

使用索引快速全扫描(Index FFS)避免全表扫描的若干场景

MongoDB Primary 为何持续出现 oplog 全表扫描？

高水位线和全表扫描

SQL中WHERE变量IS NULL条件导致全表扫描问题的解决方法_MsSql

大幅提升MySQL中InnoDB的全表扫描速度的方法_Mysql