[20130809]12c Clustering Factor.txt

以前在11G以前，如果使用assm，表的CF经常会很大，即使你插入的像顺序号这样的字段，由于多个会话同时操作，
插入的数据分布的不同的块中，以顺序号为索引的CF也会变得很大，甚至接近记录的数量。这个在《基于成本的优化》里面也有介绍。

但是在12g可以设置一个参数改善这种情况，做一些测试看看。参考了Richard Foote大师的blog:
http://richardfoote.wordpress.com/2013/05/08/important-clustering-factor-calculation-improvement-fix-you/

1.测试环境:

SQL> @ver

BANNER

--------------------------------------------------------------------------------

Oracle Database 12c Enterprise Edition Release 12.1.0.1.0 - 64bit Production

SQL> create table bowie (id number, name varchar2(30));

Table created.

SQL> create sequence bowie_seq order;

Sequence created.

CREATE OR REPLACE PROCEDURE bowie_proc AS

BEGIN

   FOR i IN 1..100000 LOOP

       INSERT INTO bowie VALUES (bowie_seq.NEXTVAL, 'ZIGGY STARDUST');

       COMMIT;

   END LOOP;

END;

select table_name, i.tablespace_name, segment_space_managementement

from dba_tables i, dba_tablespaces t   where i.tablespace_name = t.tablespace_name and table_name='BOWIE';

TABLE_NAME TABLESPACE_NAME                SEGMEN

---------- ------------------------------ ------

BOWIE      USERS                          AUTO

2.建立过程，调用job，注意bowie_proc;后面有分号。

CREATE OR REPLACE PROCEDURE do_bowie_proc

AS

   v_jobno   NUMBER := 0;

BEGIN

   FOR i IN 1 .. 4

   LOOP

      DBMS_JOB.submit (v_jobno, 'bowie_proc;', SYSDATE);

   END LOOP;

   COMMIT;

END;

SQL> exec do_bowie_proc

PL/SQL procedure successfully completed.

SQL> SELECT COUNT (*)  FROM dba_jobs_running WHERE ROWNUM

  COUNT(*)

----------

--等待job结束。

SQL> select count(*) from bowie ;

  COUNT(*)

----------

3.建立索引:
SQL> create index bowie_id_i on bowie(id);
Index created.

SQL> EXEC dbms_stats.gather_table_stats(ownname=>user, tabname=>'BOWIE', estimate_percent=> null, cascade=> true, method_opt=>'FOR ALL COLUMNS SIZE 1');

PL/SQL procedure successfully completed.

SELECT t.table_name, i.index_name, t.blocks, t.num_rows, i.clustering_factor FROM user_tables t, user_indexes i

WHERE t.table_name = i.table_name AND i.index_name='BOWIE_ID_I';

TABLE_NAME INDEX_NAME                         BLOCKS   NUM_ROWS CLUSTERING_FACTOR

---------- ------------------------------ ---------- ---------- -----------------

BOWIE      BOWIE_ID_I                           1630     400000            352954

--可以发现CLUSTERING_FACTOR=352954,非常接近NUM_ROWS=40000.按照这样讲数据非常离散。

SQL> set autot traceonly

SQL> select * from bowie where id between 42 and 540;

499 rows selected.

Execution Plan

----------------------------------------------------------

Plan hash value: 1845943507

---------------------------------------------------------------------------

| Id  | Operation         | Name  | Rows  | Bytes | Cost (%CPU)| Time     |

---------------------------------------------------------------------------

|   0 | SELECT STATEMENT  |       |   500 | 10000 |   445   (1)| 00:00:01 |

|*  1 |  TABLE ACCESS FULL| BOWIE |   500 | 10000 |   445   (1)| 00:00:01 |

---------------------------------------------------------------------------

Predicate Information (identified by operation id):

---------------------------------------------------

   1 - filter("ID"=42)

Statistics

----------------------------------------------------------

          0  recursive calls

          0  db block gets

       1425  consistent gets

          0  physical reads

          0  redo size

       5480  bytes sent via SQL*Net to client

        566  bytes received via SQL*Net from client

          4  SQL*Net roundtrips to/from client

          0  sorts (memory)

          0  sorts (disk)

        499  rows processed

-- 我们查询返回499条，占500/400000=0.00125,很少的一部分oracle选择的是全表扫描而非使用索引。
-- 注：我测试执行计划改变的边界select * from bowie where id between 42 and 539;使用索引。
-- 也许你可能情况有点不同，但是误差不会太大。

SQL> select * from bowie where id between 42 and 539;

498 rows selected.

Execution Plan

----------------------------------------------------------

Plan hash value: 1405654398

--------------------------------------------------------------------------------------------------

| Id  | Operation                           | Name       | Rows  | Bytes | Cost (%CPU)| Time     |

--------------------------------------------------------------------------------------------------

|   0 | SELECT STATEMENT                    |            |   499 |  9980 |   445   (0)| 00:00:01 |

|   1 |  TABLE ACCESS BY INDEX ROWID BATCHED| BOWIE      |   499 |  9980 |   445   (0)| 00:00:01 |

|*  2 |   INDEX RANGE SCAN                  | BOWIE_ID_I |   499 |       |     4   (0)| 00:00:01 |

--------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):

---------------------------------------------------

   2 - access("ID">=42 AND "ID"

Statistics

----------------------------------------------------------

          0  recursive calls

          0  db block gets

        435  consistent gets

          0  physical reads

          0  redo size

      12925  bytes sent via SQL*Net to client

        566  bytes received via SQL*Net from client

          4  SQL*Net roundtrips to/from client

          0  sorts (memory)

          0  sorts (disk)

        498  rows processed

--逻辑读435，明显比全表扫描少，全表扫描逻辑读1425.另外注意执行计划12c叫 TABLE ACCESS BY INDEX ROWID BATCHED,
--这里的BATCHED表示什么? 自己不是很清楚。

http://richardfoote.wordpress.com/2013/05/08/important-clustering-factor-calculation-improvement-fix-you/

    Once applied (the following demo is on a patched 11.2.0.3 database), there is a new statistics collection preference

that can be defined, called TABLE_CACHED_BLOCKS. This basically sets the number of table blocks we can assume would

already be cached when performing an index scan and can be ignored when incrementing the CF during statistics gathering.

The default is 1 (i.e. as performed presently) but can be set up to be a value between 1 and 255, meaning during the

collection of index statistics, it will not increment the CF if the table block being referenced by the current index

entry has already been referenced by any of the prior 255 index entries (if set to 255). It basically sets the

appropriate parameter in the sys_op_countchg function used to calculate the CF value during statistic gathering to not

increment the CF if the current table block has already been accessed "x" index entries previously.

--我不翻译，避免错误！大概意思是可以定义一个参数TABLE_CACHED_BLOCKS,值是cache的数量，缺省是1.范围是1-255.
--文章的作者执行的过程是3次。而我是4次。
--我的感觉oracle应该采用一个百分比更合适一些，或者2者都支持，
--也许理解有误差，英文实在太差!!

column cascade format a30
column degree format a10
column estimate_percent format a30
column method_opt format a30
column no_invalidate format a30
column granularity format a10
column publish format a10
column INCREMENTAL format a10
column STALE_PERCENT format a10
column AUTOSTATS_TARGET format a10
column TABLE_CACHED_BLOCKS format a10

SELECT DBMS_STATS.get_param ('CASCADE') CASCADE,
       DBMS_STATS.get_param ('DEGREE') DEGREE,
       DBMS_STATS.get_param ('ESTIMATE_PERCENT') estimate_percent,
       DBMS_STATS.get_param ('METHOD_OPT') method_opt,
       DBMS_STATS.get_param ('NO_INVALIDATE') no_invalidate,
       DBMS_STATS.get_param ('GRANULARITY') granularity,
       DBMS_STATS.get_param ('PUBLISH') publish,
       DBMS_STATS.get_param ('INCREMENTAL') incremental,
       DBMS_STATS.get_param ('STALE_PERCENT') stale_percent,
       DBMS_STATS.get_param ('AUTOSTATS_TARGET') autostats_target,
       DBMS_STATS.get_param ('TABLE_CACHED_BLOCKS') TABLE_CACHED_BLOCKS
FROM DUAL;

CASCADE                  DEGREE  ESTIMATE_PERCENT             METHOD_OPT                 NO_INVALIDATE               GRANULARIT PUBLISH    INCREMENTA STALE_PERC AUTOSTATS_ TABLE_CACH

------------------------ ------- ---------------------------- -------------------------- --------------------------- ---------- ---------- ---------- ---------- ---------- ----------

DBMS_STATS.AUTO_CASCADE  NULL    DBMS_STATS.AUTO_SAMPLE_SIZE  FOR ALL COLUMNS SIZE AUTO  DBMS_STATS.AUTO_INVALIDATE  AUTO       TRUE       FALSE      10         AUTO       1

--TABLE_CACHED_BLOCKS缺省=1.我的测试表bowie占用的块1630,估计要缓存255块才行，而不是作者的42.
--exec dbms_stats.set_table_prefs(ownname=>user, tabname=>'BOWIE', pname=>'TABLE_CACHED_BLOCKS', pvalue=>42);

SQL> exec dbms_stats.set_table_prefs(ownname=>user, tabname=>'BOWIE', pname=>'TABLE_CACHED_BLOCKS', pvalue=>255);

PL/SQL procedure successfully completed.

--EXEC dbms_stats.gather_index_stats(ownname=>user, indname=>'BOWIE_ID_I', estimate_percent=> null);

SQL> EXEC dbms_stats.gather_table_stats(ownname=>user, tabname=>'BOWIE', estimate_percent=> null, cascade=> true, method_opt=>'FOR ALL COLUMNS SIZE 1');

PL/SQL procedure successfully completed.

SELECT t.table_name,

       i.index_name,

       t.blocks,

       t.num_rows,

       i.clustering_factor

  FROM user_tables t, user_indexes i

 WHERE t.table_name = i.table_name AND i.index_name = 'BOWIE_ID_I';

TABLE_NAME INDEX_NAME                         BLOCKS   NUM_ROWS CLUSTERING_FACTOR

---------- ------------------------------ ---------- ---------- -----------------

BOWIE      BOWIE_ID_I                           1630     400000              1383

--CLUSTERING_FACTOR=1383,结果比blocks还要小。说明数据很集中。
--btw，如果设置TABLE_CACHED_BLOCKS=42,结果也一样，不知道为什么。

--清除以前的执行计划。

SQL> alter system flush shared_pool;

System altered.

SQL> set autot traceonly;

SQL> select * from bowie where id between 42 and 540;

499 rows selected.

Execution Plan

----------------------------------------------------------

Plan hash value: 1405654398

--------------------------------------------------------------------------------------------------

| Id  | Operation                           | Name       | Rows  | Bytes | Cost (%CPU)| Time     |

--------------------------------------------------------------------------------------------------

|   0 | SELECT STATEMENT                    |            |   500 | 10000 |     6   (0)| 00:00:01 |

|   1 |  TABLE ACCESS BY INDEX ROWID BATCHED| BOWIE      |   500 | 10000 |     6   (0)| 00:00:01 |

|*  2 |   INDEX RANGE SCAN                  | BOWIE_ID_I |   500 |       |     4   (0)| 00:00:01 |

--------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):

---------------------------------------------------

   2 - access("ID">=42 AND "ID"

Statistics

----------------------------------------------------------

          0  recursive calls

          0  db block gets

        436  consistent gets

          0  physical reads

          0  redo size

      12949  bytes sent via SQL*Net to client

        566  bytes received via SQL*Net from client

          4  SQL*Net roundtrips to/from client

          0  sorts (memory)

          0  sorts (disk)

        499  rows processed

--取1%的记录看看。

SQL> select * from bowie where id between 42 and 40041;

40000 rows selected.

Execution Plan

----------------------------------------------------------

Plan hash value: 1405654398

--------------------------------------------------------------------------------------------------

| Id  | Operation                           | Name       | Rows  | Bytes | Cost (%CPU)| Time     |

--------------------------------------------------------------------------------------------------

|   0 | SELECT STATEMENT                    |            | 40001 |   781K|   231   (0)| 00:00:01 |

|   1 |  TABLE ACCESS BY INDEX ROWID BATCHED| BOWIE      | 40001 |   781K|   231   (0)| 00:00:01 |

|*  2 |   INDEX RANGE SCAN                  | BOWIE_ID_I | 40001 |       |    92   (0)| 00:00:01 |

--------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):

---------------------------------------------------

   2 - access("ID">=42 AND "ID"

Statistics

----------------------------------------------------------

          1  recursive calls

          0  db block gets

      35438  consistent gets

          0  physical reads

          0  redo size

    1028834  bytes sent via SQL*Net to client

       2732  bytes received via SQL*Net from client

        201  SQL*Net roundtrips to/from client

          0  sorts (memory)

          0  sorts (disk)

      40000  rows processed

--很明显这样CF设置的太小了。

4.处于好奇，测试不同的TABLE_CACHED_BLOCKS的情况：
我发现

TABLE_CACHED_BLOCKS＝2，CF=275559

TABLE_CACHED_BLOCKS＝3，CF=1561

--TABLE_CACHED_BLOCKS＝3后下降也太快一点，不知道是否存在什么bug。

我以前一般是使用设置参数命令，根据业务并行的情况，使用原CF/并发执行的用户，一般是4－6.
例子：
EXECUTE SYS.DBMS_STATS.set_index_stats (OWNNAME=>user, INDNAME=>'BOWIE_ID_I', clstfct => 352954/4);

SQL> EXECUTE SYS.DBMS_STATS.set_index_stats (OWNNAME=>user, INDNAME=>'BOWIE_ID_I', clstfct => 352954/4);

PL/SQL procedure successfully completed.

SQL> alter system flush shared_pool;

System altered.

SQL> select * from bowie where id between 42 and 2025;

1984 rows selected.

Execution Plan

----------------------------------------------------------

Plan hash value: 1405654398

--------------------------------------------------------------------------------------------------

| Id  | Operation                           | Name       | Rows  | Bytes | Cost (%CPU)| Time     |

--------------------------------------------------------------------------------------------------

|   0 | SELECT STATEMENT                    |            |  1985 | 39700 |   445   (0)| 00:00:01 |

|   1 |  TABLE ACCESS BY INDEX ROWID BATCHED| BOWIE      |  1985 | 39700 |   445   (0)| 00:00:01 |

|*  2 |   INDEX RANGE SCAN                  | BOWIE_ID_I |  1985 |       |     7   (0)| 00:00:01 |

--------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):

---------------------------------------------------

   2 - access("ID">=42 AND "ID"

Statistics

----------------------------------------------------------

          0  recursive calls

          0  db block gets

       1574  consistent gets

          0  physical reads

          0  redo size

      49932  bytes sent via SQL*Net to client

        642  bytes received via SQL*Net from client

         11  SQL*Net roundtrips to/from client

          0  sorts (memory)

          0  sorts (disk)

       1984  rows processed

--感觉这样合理一些。

时间： 2024-09-29 15:26:36

[20130809]12c Clustering Factor.txt

[20130809]12c Clustering Factor.txt的相关文章

[20130809]12c Clustering Factor(2).txt

[20170625]12c Extended statistics.txt

[20160910]12c sqlldr express.txt

[2016026]12c lateral语法.txt

[20140813关于Hakan factor.txt

[20130721]ORACLE 12C Invisible Columns.txt

[20131221]12c 优化 bug.txt

Oracle聚簇因子(Clustering factor，CF)

一个执行计划异常变更的案例 - 外传之聚簇因子(Clustering Factor)