[20130809]12c Clustering Factor.txt
以前在11G以前,如果使用assm,表的CF经常会很大,即使你插入的像顺序号这样的字段,由于多个会话同时操作,
插入的数据分布的不同的块中,以顺序号为索引的CF也会变得很大,甚至接近记录的数量。这个在《基于成本的优化》里面也有介绍。
但是在12g可以设置一个参数改善这种情况,做一些测试看看。 参考了Richard Foote大师的blog:
http://richardfoote.wordpress.com/2013/05/08/important-clustering-factor-calculation-improvement-fix-you/
1.测试环境:
SQL> @ver
BANNER
--------------------------------------------------------------------------------
Oracle Database 12c Enterprise Edition Release 12.1.0.1.0 - 64bit Production
SQL> create table bowie (id number, name varchar2(30));
Table created.
SQL> create sequence bowie_seq order;
Sequence created.
CREATE OR REPLACE PROCEDURE bowie_proc AS
BEGIN
FOR i IN 1..100000 LOOP
INSERT INTO bowie VALUES (bowie_seq.NEXTVAL, 'ZIGGY STARDUST');
COMMIT;
END LOOP;
END;
/
select table_name, i.tablespace_name, segment_space_managementement
from dba_tables i, dba_tablespaces t where i.tablespace_name = t.tablespace_name and table_name='BOWIE';
TABLE_NAME TABLESPACE_NAME SEGMEN
---------- ------------------------------ ------
BOWIE USERS AUTO
2.建立过程,调用job,注意bowie_proc;后面有分号。
CREATE OR REPLACE PROCEDURE do_bowie_proc
AS
v_jobno NUMBER := 0;
BEGIN
FOR i IN 1 .. 4
LOOP
DBMS_JOB.submit (v_jobno, 'bowie_proc;', SYSDATE);
END LOOP;
COMMIT;
END;
/
SQL> exec do_bowie_proc
PL/SQL procedure successfully completed.
SQL> SELECT COUNT (*) FROM dba_jobs_running WHERE ROWNUM
COUNT(*)
----------
0
--等待job结束。
SQL> select count(*) from bowie ;
COUNT(*)
----------
400000
3.建立索引:
SQL> create index bowie_id_i on bowie(id);
Index created.
SQL> EXEC dbms_stats.gather_table_stats(ownname=>user, tabname=>'BOWIE', estimate_percent=> null, cascade=> true, method_opt=>'FOR ALL COLUMNS SIZE 1');
PL/SQL procedure successfully completed.
SELECT t.table_name, i.index_name, t.blocks, t.num_rows, i.clustering_factor FROM user_tables t, user_indexes i
WHERE t.table_name = i.table_name AND i.index_name='BOWIE_ID_I';
TABLE_NAME INDEX_NAME BLOCKS NUM_ROWS CLUSTERING_FACTOR
---------- ------------------------------ ---------- ---------- -----------------
BOWIE BOWIE_ID_I 1630 400000 352954
--可以发现CLUSTERING_FACTOR=352954,非常接近NUM_ROWS=40000.按照这样讲数据非常离散。
SQL> set autot traceonly
SQL> select * from bowie where id between 42 and 540;
499 rows selected.
Execution Plan
----------------------------------------------------------
Plan hash value: 1845943507
---------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
---------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 500 | 10000 | 445 (1)| 00:00:01 |
|* 1 | TABLE ACCESS FULL| BOWIE | 500 | 10000 | 445 (1)| 00:00:01 |
---------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
1 - filter("ID"=42)
Statistics
----------------------------------------------------------
0 recursive calls
0 db block gets
1425 consistent gets
0 physical reads
0 redo size
5480 bytes sent via SQL*Net to client
566 bytes received via SQL*Net from client
4 SQL*Net roundtrips to/from client
0 sorts (memory)
0 sorts (disk)
499 rows processed
-- 我们查询返回499条,占500/400000=0.00125,很少的一部分oracle选择的是全表扫描而非使用索引。
-- 注:我测试执行计划改变的边界select * from bowie where id between 42 and 539;使用索引。
-- 也许你可能情况有点不同,但是误差不会太大。
SQL> select * from bowie where id between 42 and 539;
498 rows selected.
Execution Plan
----------------------------------------------------------
Plan hash value: 1405654398
--------------------------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
--------------------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 499 | 9980 | 445 (0)| 00:00:01 |
| 1 | TABLE ACCESS BY INDEX ROWID BATCHED| BOWIE | 499 | 9980 | 445 (0)| 00:00:01 |
|* 2 | INDEX RANGE SCAN | BOWIE_ID_I | 499 | | 4 (0)| 00:00:01 |
--------------------------------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
2 - access("ID">=42 AND "ID"
Statistics
----------------------------------------------------------
0 recursive calls
0 db block gets
435 consistent gets
0 physical reads
0 redo size
12925 bytes sent via SQL*Net to client
566 bytes received via SQL*Net from client
4 SQL*Net roundtrips to/from client
0 sorts (memory)
0 sorts (disk)
498 rows processed
--逻辑读435,明显比全表扫描少,全表扫描逻辑读1425.另外注意执行计划12c叫 TABLE ACCESS BY INDEX ROWID BATCHED,
--这里的BATCHED表示什么? 自己不是很清楚。
http://richardfoote.wordpress.com/2013/05/08/important-clustering-factor-calculation-improvement-fix-you/
Once applied (the following demo is on a patched 11.2.0.3 database), there is a new statistics collection preference
that can be defined, called TABLE_CACHED_BLOCKS. This basically sets the number of table blocks we can assume would
already be cached when performing an index scan and can be ignored when incrementing the CF during statistics gathering.
The default is 1 (i.e. as performed presently) but can be set up to be a value between 1 and 255, meaning during the
collection of index statistics, it will not increment the CF if the table block being referenced by the current index
entry has already been referenced by any of the prior 255 index entries (if set to 255). It basically sets the
appropriate parameter in the sys_op_countchg function used to calculate the CF value during statistic gathering to not
increment the CF if the current table block has already been accessed "x" index entries previously.
--我不翻译,避免错误!大概意思是可以定义一个参数TABLE_CACHED_BLOCKS,值是cache的数量,缺省是1.范围是1-255.
--文章的作者执行的过程是3次。而我是4次。
--我的感觉oracle应该采用一个百分比更合适一些,或者2者都支持,
--也许理解有误差,英文实在太差!!
column cascade format a30
column degree format a10
column estimate_percent format a30
column method_opt format a30
column no_invalidate format a30
column granularity format a10
column publish format a10
column INCREMENTAL format a10
column STALE_PERCENT format a10
column AUTOSTATS_TARGET format a10
column TABLE_CACHED_BLOCKS format a10
SELECT DBMS_STATS.get_param ('CASCADE') CASCADE,
DBMS_STATS.get_param ('DEGREE') DEGREE,
DBMS_STATS.get_param ('ESTIMATE_PERCENT') estimate_percent,
DBMS_STATS.get_param ('METHOD_OPT') method_opt,
DBMS_STATS.get_param ('NO_INVALIDATE') no_invalidate,
DBMS_STATS.get_param ('GRANULARITY') granularity,
DBMS_STATS.get_param ('PUBLISH') publish,
DBMS_STATS.get_param ('INCREMENTAL') incremental,
DBMS_STATS.get_param ('STALE_PERCENT') stale_percent,
DBMS_STATS.get_param ('AUTOSTATS_TARGET') autostats_target,
DBMS_STATS.get_param ('TABLE_CACHED_BLOCKS') TABLE_CACHED_BLOCKS
FROM DUAL;
CASCADE DEGREE ESTIMATE_PERCENT METHOD_OPT NO_INVALIDATE GRANULARIT PUBLISH INCREMENTA STALE_PERC AUTOSTATS_ TABLE_CACH
------------------------ ------- ---------------------------- -------------------------- --------------------------- ---------- ---------- ---------- ---------- ---------- ----------
DBMS_STATS.AUTO_CASCADE NULL DBMS_STATS.AUTO_SAMPLE_SIZE FOR ALL COLUMNS SIZE AUTO DBMS_STATS.AUTO_INVALIDATE AUTO TRUE FALSE 10 AUTO 1
--TABLE_CACHED_BLOCKS缺省=1.我的测试表bowie占用的块1630,估计要缓存255块才行,而不是作者的42.
--exec dbms_stats.set_table_prefs(ownname=>user, tabname=>'BOWIE', pname=>'TABLE_CACHED_BLOCKS', pvalue=>42);
SQL> exec dbms_stats.set_table_prefs(ownname=>user, tabname=>'BOWIE', pname=>'TABLE_CACHED_BLOCKS', pvalue=>255);
PL/SQL procedure successfully completed.
--EXEC dbms_stats.gather_index_stats(ownname=>user, indname=>'BOWIE_ID_I', estimate_percent=> null);
SQL> EXEC dbms_stats.gather_table_stats(ownname=>user, tabname=>'BOWIE', estimate_percent=> null, cascade=> true, method_opt=>'FOR ALL COLUMNS SIZE 1');
PL/SQL procedure successfully completed.
SELECT t.table_name,
i.index_name,
t.blocks,
t.num_rows,
i.clustering_factor
FROM user_tables t, user_indexes i
WHERE t.table_name = i.table_name AND i.index_name = 'BOWIE_ID_I';
TABLE_NAME INDEX_NAME BLOCKS NUM_ROWS CLUSTERING_FACTOR
---------- ------------------------------ ---------- ---------- -----------------
BOWIE BOWIE_ID_I 1630 400000 1383
--CLUSTERING_FACTOR=1383,结果比blocks还要小。说明数据很集中。
--btw,如果设置TABLE_CACHED_BLOCKS=42,结果也一样,不知道为什么。
--清除以前的执行计划。
SQL> alter system flush shared_pool;
System altered.
SQL> set autot traceonly;
SQL> select * from bowie where id between 42 and 540;
499 rows selected.
Execution Plan
----------------------------------------------------------
Plan hash value: 1405654398
--------------------------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
--------------------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 500 | 10000 | 6 (0)| 00:00:01 |
| 1 | TABLE ACCESS BY INDEX ROWID BATCHED| BOWIE | 500 | 10000 | 6 (0)| 00:00:01 |
|* 2 | INDEX RANGE SCAN | BOWIE_ID_I | 500 | | 4 (0)| 00:00:01 |
--------------------------------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
2 - access("ID">=42 AND "ID"
Statistics
----------------------------------------------------------
0 recursive calls
0 db block gets
436 consistent gets
0 physical reads
0 redo size
12949 bytes sent via SQL*Net to client
566 bytes received via SQL*Net from client
4 SQL*Net roundtrips to/from client
0 sorts (memory)
0 sorts (disk)
499 rows processed
--取1%的记录看看。
SQL> select * from bowie where id between 42 and 40041;
40000 rows selected.
Execution Plan
----------------------------------------------------------
Plan hash value: 1405654398
--------------------------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
--------------------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 40001 | 781K| 231 (0)| 00:00:01 |
| 1 | TABLE ACCESS BY INDEX ROWID BATCHED| BOWIE | 40001 | 781K| 231 (0)| 00:00:01 |
|* 2 | INDEX RANGE SCAN | BOWIE_ID_I | 40001 | | 92 (0)| 00:00:01 |
--------------------------------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
2 - access("ID">=42 AND "ID"
Statistics
----------------------------------------------------------
1 recursive calls
0 db block gets
35438 consistent gets
0 physical reads
0 redo size
1028834 bytes sent via SQL*Net to client
2732 bytes received via SQL*Net from client
201 SQL*Net roundtrips to/from client
0 sorts (memory)
0 sorts (disk)
40000 rows processed
--很明显这样CF设置的太小了。
4.处于好奇,测试不同的TABLE_CACHED_BLOCKS的情况:
我发现
TABLE_CACHED_BLOCKS=2,CF=275559
TABLE_CACHED_BLOCKS=3,CF=1561
--TABLE_CACHED_BLOCKS=3后下降也太快一点,不知道是否存在什么bug。
我以前一般是使用设置参数命令,根据业务并行的情况,使用原CF/并发执行的用户,一般是4-6.
例子:
EXECUTE SYS.DBMS_STATS.set_index_stats (OWNNAME=>user, INDNAME=>'BOWIE_ID_I', clstfct => 352954/4);
SQL> EXECUTE SYS.DBMS_STATS.set_index_stats (OWNNAME=>user, INDNAME=>'BOWIE_ID_I', clstfct => 352954/4);
PL/SQL procedure successfully completed.
SQL> alter system flush shared_pool;
System altered.
SQL> select * from bowie where id between 42 and 2025;
1984 rows selected.
Execution Plan
----------------------------------------------------------
Plan hash value: 1405654398
--------------------------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
--------------------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 1985 | 39700 | 445 (0)| 00:00:01 |
| 1 | TABLE ACCESS BY INDEX ROWID BATCHED| BOWIE | 1985 | 39700 | 445 (0)| 00:00:01 |
|* 2 | INDEX RANGE SCAN | BOWIE_ID_I | 1985 | | 7 (0)| 00:00:01 |
--------------------------------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
2 - access("ID">=42 AND "ID"
Statistics
----------------------------------------------------------
0 recursive calls
0 db block gets
1574 consistent gets
0 physical reads
0 redo size
49932 bytes sent via SQL*Net to client
642 bytes received via SQL*Net from client
11 SQL*Net roundtrips to/from client
0 sorts (memory)
0 sorts (disk)
1984 rows processed
--感觉这样合理一些。