[20130809]12c Clustering Factor.txt

[20130809]12c Clustering Factor.txt

以前在11G以前,如果使用assm,表的CF经常会很大,即使你插入的像顺序号这样的字段,由于多个会话同时操作,
插入的数据分布的不同的块中,以顺序号为索引的CF也会变得很大,甚至接近记录的数量。这个在《基于成本的优化》里面也有介绍。

但是在12g可以设置一个参数改善这种情况,做一些测试看看。 参考了Richard Foote大师的blog:
http://richardfoote.wordpress.com/2013/05/08/important-clustering-factor-calculation-improvement-fix-you/

1.测试环境:

SQL> @ver
BANNER
--------------------------------------------------------------------------------
Oracle Database 12c Enterprise Edition Release 12.1.0.1.0 - 64bit Production
SQL> create table bowie (id number, name varchar2(30));
Table created.
SQL> create sequence bowie_seq order;
Sequence created.
CREATE OR REPLACE PROCEDURE bowie_proc AS
BEGIN
   FOR i IN 1..100000 LOOP
       INSERT INTO bowie VALUES (bowie_seq.NEXTVAL, 'ZIGGY STARDUST');
       COMMIT;
   END LOOP;
END;
/

select table_name, i.tablespace_name, segment_space_managementement
from dba_tables i, dba_tablespaces t   where i.tablespace_name = t.tablespace_name and table_name='BOWIE';
TABLE_NAME TABLESPACE_NAME                SEGMEN
---------- ------------------------------ ------
BOWIE      USERS                          AUTO

2.建立过程,调用job,注意bowie_proc;后面有分号。

CREATE OR REPLACE PROCEDURE do_bowie_proc
AS
   v_jobno   NUMBER := 0;
BEGIN
   FOR i IN 1 .. 4
   LOOP
      DBMS_JOB.submit (v_jobno, 'bowie_proc;', SYSDATE);
   END LOOP;
   COMMIT;
END;
/
SQL> exec do_bowie_proc
PL/SQL procedure successfully completed.
SQL> SELECT COUNT (*)  FROM dba_jobs_running WHERE ROWNUM 
  COUNT(*)
----------
         0

--等待job结束。

SQL> select count(*) from bowie ;
  COUNT(*)
----------
    400000

3.建立索引:
SQL> create index bowie_id_i on bowie(id);
Index created.

SQL> EXEC dbms_stats.gather_table_stats(ownname=>user, tabname=>'BOWIE', estimate_percent=> null, cascade=> true, method_opt=>'FOR ALL COLUMNS SIZE 1');
PL/SQL procedure successfully completed.
SELECT t.table_name, i.index_name, t.blocks, t.num_rows, i.clustering_factor FROM user_tables t, user_indexes i
WHERE t.table_name = i.table_name AND i.index_name='BOWIE_ID_I';
TABLE_NAME INDEX_NAME                         BLOCKS   NUM_ROWS CLUSTERING_FACTOR
---------- ------------------------------ ---------- ---------- -----------------
BOWIE      BOWIE_ID_I                           1630     400000            352954

--可以发现CLUSTERING_FACTOR=352954,非常接近NUM_ROWS=40000.按照这样讲数据非常离散。

SQL> set autot traceonly
SQL> select * from bowie where id between 42 and 540;
499 rows selected.
Execution Plan
----------------------------------------------------------
Plan hash value: 1845943507
---------------------------------------------------------------------------
| Id  | Operation         | Name  | Rows  | Bytes | Cost (%CPU)| Time     |
---------------------------------------------------------------------------
|   0 | SELECT STATEMENT  |       |   500 | 10000 |   445   (1)| 00:00:01 |
|*  1 |  TABLE ACCESS FULL| BOWIE |   500 | 10000 |   445   (1)| 00:00:01 |
---------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
   1 - filter("ID"=42)
Statistics
----------------------------------------------------------
          0  recursive calls
          0  db block gets
       1425  consistent gets
          0  physical reads
          0  redo size
       5480  bytes sent via SQL*Net to client
        566  bytes received via SQL*Net from client
          4  SQL*Net roundtrips to/from client
          0  sorts (memory)
          0  sorts (disk)
        499  rows processed

-- 我们查询返回499条,占500/400000=0.00125,很少的一部分oracle选择的是全表扫描而非使用索引。
-- 注:我测试执行计划改变的边界select * from bowie where id between 42 and 539;使用索引。
-- 也许你可能情况有点不同,但是误差不会太大。

SQL> select * from bowie where id between 42 and 539;
498 rows selected.
Execution Plan
----------------------------------------------------------
Plan hash value: 1405654398
--------------------------------------------------------------------------------------------------
| Id  | Operation                           | Name       | Rows  | Bytes | Cost (%CPU)| Time     |
--------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT                    |            |   499 |  9980 |   445   (0)| 00:00:01 |
|   1 |  TABLE ACCESS BY INDEX ROWID BATCHED| BOWIE      |   499 |  9980 |   445   (0)| 00:00:01 |
|*  2 |   INDEX RANGE SCAN                  | BOWIE_ID_I |   499 |       |     4   (0)| 00:00:01 |
--------------------------------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
   2 - access("ID">=42 AND "ID"
Statistics
----------------------------------------------------------
          0  recursive calls
          0  db block gets
        435  consistent gets
          0  physical reads
          0  redo size
      12925  bytes sent via SQL*Net to client
        566  bytes received via SQL*Net from client
          4  SQL*Net roundtrips to/from client
          0  sorts (memory)
          0  sorts (disk)
        498  rows processed

--逻辑读435,明显比全表扫描少,全表扫描逻辑读1425.另外注意执行计划12c叫 TABLE ACCESS BY INDEX ROWID BATCHED,
--这里的BATCHED表示什么? 自己不是很清楚。

http://richardfoote.wordpress.com/2013/05/08/important-clustering-factor-calculation-improvement-fix-you/
    Once applied (the following demo is on a patched 11.2.0.3 database), there is a new statistics collection preference
that can be defined, called TABLE_CACHED_BLOCKS. This basically sets the number of table blocks we can assume would
already be cached when performing an index scan and can be ignored when incrementing the CF during statistics gathering.
The default is 1 (i.e. as performed presently) but can be set up to be a value between 1 and 255, meaning during the
collection of index statistics, it will not increment the CF if the table block being referenced by the current index
entry has already been referenced by any of the prior 255 index entries (if set to 255). It basically sets the
appropriate parameter in the sys_op_countchg function used to calculate the CF value during statistic gathering to not
increment the CF if the current table block has already been accessed "x" index entries previously.

--我不翻译,避免错误!大概意思是可以定义一个参数TABLE_CACHED_BLOCKS,值是cache的数量,缺省是1.范围是1-255.
--文章的作者执行的过程是3次。而我是4次。
--我的感觉oracle应该采用一个百分比更合适一些,或者2者都支持,
--也许理解有误差,英文实在太差!!

column cascade format a30
column degree format a10
column estimate_percent format a30
column method_opt format a30
column no_invalidate format a30
column granularity format a10
column publish format a10
column INCREMENTAL format a10
column STALE_PERCENT format a10
column AUTOSTATS_TARGET format a10
column TABLE_CACHED_BLOCKS format a10

SELECT DBMS_STATS.get_param ('CASCADE') CASCADE,
       DBMS_STATS.get_param ('DEGREE') DEGREE,
       DBMS_STATS.get_param ('ESTIMATE_PERCENT') estimate_percent,
       DBMS_STATS.get_param ('METHOD_OPT') method_opt,
       DBMS_STATS.get_param ('NO_INVALIDATE') no_invalidate,
       DBMS_STATS.get_param ('GRANULARITY') granularity,
       DBMS_STATS.get_param ('PUBLISH') publish,
       DBMS_STATS.get_param ('INCREMENTAL') incremental,
       DBMS_STATS.get_param ('STALE_PERCENT') stale_percent,
       DBMS_STATS.get_param ('AUTOSTATS_TARGET') autostats_target,
       DBMS_STATS.get_param ('TABLE_CACHED_BLOCKS') TABLE_CACHED_BLOCKS
  FROM DUAL;

CASCADE                  DEGREE  ESTIMATE_PERCENT             METHOD_OPT                 NO_INVALIDATE               GRANULARIT PUBLISH    INCREMENTA STALE_PERC AUTOSTATS_ TABLE_CACH
------------------------ ------- ---------------------------- -------------------------- --------------------------- ---------- ---------- ---------- ---------- ---------- ----------
DBMS_STATS.AUTO_CASCADE  NULL    DBMS_STATS.AUTO_SAMPLE_SIZE  FOR ALL COLUMNS SIZE AUTO  DBMS_STATS.AUTO_INVALIDATE  AUTO       TRUE       FALSE      10         AUTO       1

--TABLE_CACHED_BLOCKS缺省=1.我的测试表bowie占用的块1630,估计要缓存255块才行,而不是作者的42.
--exec dbms_stats.set_table_prefs(ownname=>user, tabname=>'BOWIE', pname=>'TABLE_CACHED_BLOCKS', pvalue=>42);

SQL> exec dbms_stats.set_table_prefs(ownname=>user, tabname=>'BOWIE', pname=>'TABLE_CACHED_BLOCKS', pvalue=>255);
PL/SQL procedure successfully completed.
--EXEC dbms_stats.gather_index_stats(ownname=>user, indname=>'BOWIE_ID_I', estimate_percent=> null);
SQL> EXEC dbms_stats.gather_table_stats(ownname=>user, tabname=>'BOWIE', estimate_percent=> null, cascade=> true, method_opt=>'FOR ALL COLUMNS SIZE 1');
PL/SQL procedure successfully completed.
SELECT t.table_name,
       i.index_name,
       t.blocks,
       t.num_rows,
       i.clustering_factor
  FROM user_tables t, user_indexes i
 WHERE t.table_name = i.table_name AND i.index_name = 'BOWIE_ID_I';
TABLE_NAME INDEX_NAME                         BLOCKS   NUM_ROWS CLUSTERING_FACTOR
---------- ------------------------------ ---------- ---------- -----------------
BOWIE      BOWIE_ID_I                           1630     400000              1383

--CLUSTERING_FACTOR=1383,结果比blocks还要小。说明数据很集中。
--btw,如果设置TABLE_CACHED_BLOCKS=42,结果也一样,不知道为什么。

--清除以前的执行计划。

SQL> alter system flush shared_pool;
System altered.
SQL> set autot traceonly;
SQL> select * from bowie where id between 42 and 540;
499 rows selected.
Execution Plan
----------------------------------------------------------
Plan hash value: 1405654398
--------------------------------------------------------------------------------------------------
| Id  | Operation                           | Name       | Rows  | Bytes | Cost (%CPU)| Time     |
--------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT                    |            |   500 | 10000 |     6   (0)| 00:00:01 |
|   1 |  TABLE ACCESS BY INDEX ROWID BATCHED| BOWIE      |   500 | 10000 |     6   (0)| 00:00:01 |
|*  2 |   INDEX RANGE SCAN                  | BOWIE_ID_I |   500 |       |     4   (0)| 00:00:01 |
--------------------------------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
   2 - access("ID">=42 AND "ID"
Statistics
----------------------------------------------------------
          0  recursive calls
          0  db block gets
        436  consistent gets
          0  physical reads
          0  redo size
      12949  bytes sent via SQL*Net to client
        566  bytes received via SQL*Net from client
          4  SQL*Net roundtrips to/from client
          0  sorts (memory)
          0  sorts (disk)
        499  rows processed
--取1%的记录看看。
SQL> select * from bowie where id between 42 and 40041;
40000 rows selected.
Execution Plan
----------------------------------------------------------
Plan hash value: 1405654398
--------------------------------------------------------------------------------------------------
| Id  | Operation                           | Name       | Rows  | Bytes | Cost (%CPU)| Time     |
--------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT                    |            | 40001 |   781K|   231   (0)| 00:00:01 |
|   1 |  TABLE ACCESS BY INDEX ROWID BATCHED| BOWIE      | 40001 |   781K|   231   (0)| 00:00:01 |
|*  2 |   INDEX RANGE SCAN                  | BOWIE_ID_I | 40001 |       |    92   (0)| 00:00:01 |
--------------------------------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
   2 - access("ID">=42 AND "ID"
Statistics
----------------------------------------------------------
          1  recursive calls
          0  db block gets
      35438  consistent gets
          0  physical reads
          0  redo size
    1028834  bytes sent via SQL*Net to client
       2732  bytes received via SQL*Net from client
        201  SQL*Net roundtrips to/from client
          0  sorts (memory)
          0  sorts (disk)
      40000  rows processed

--很明显这样CF设置的太小了。

4.处于好奇,测试不同的TABLE_CACHED_BLOCKS的情况:
我发现

TABLE_CACHED_BLOCKS=2,CF=275559
TABLE_CACHED_BLOCKS=3,CF=1561

--TABLE_CACHED_BLOCKS=3后下降也太快一点,不知道是否存在什么bug。

我以前一般是使用设置参数命令,根据业务并行的情况,使用原CF/并发执行的用户,一般是4-6.
例子:
EXECUTE SYS.DBMS_STATS.set_index_stats (OWNNAME=>user, INDNAME=>'BOWIE_ID_I', clstfct => 352954/4);

SQL> EXECUTE SYS.DBMS_STATS.set_index_stats (OWNNAME=>user, INDNAME=>'BOWIE_ID_I', clstfct => 352954/4);
PL/SQL procedure successfully completed.
SQL> alter system flush shared_pool;
System altered.
SQL> select * from bowie where id between 42 and 2025;
1984 rows selected.
Execution Plan
----------------------------------------------------------
Plan hash value: 1405654398
--------------------------------------------------------------------------------------------------
| Id  | Operation                           | Name       | Rows  | Bytes | Cost (%CPU)| Time     |
--------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT                    |            |  1985 | 39700 |   445   (0)| 00:00:01 |
|   1 |  TABLE ACCESS BY INDEX ROWID BATCHED| BOWIE      |  1985 | 39700 |   445   (0)| 00:00:01 |
|*  2 |   INDEX RANGE SCAN                  | BOWIE_ID_I |  1985 |       |     7   (0)| 00:00:01 |
--------------------------------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
   2 - access("ID">=42 AND "ID"
Statistics
----------------------------------------------------------
          0  recursive calls
          0  db block gets
       1574  consistent gets
          0  physical reads
          0  redo size
      49932  bytes sent via SQL*Net to client
        642  bytes received via SQL*Net from client
         11  SQL*Net roundtrips to/from client
          0  sorts (memory)
          0  sorts (disk)
       1984  rows processed

--感觉这样合理一些。

时间: 2024-07-29 21:39:59

[20130809]12c Clustering Factor.txt的相关文章

[20130809]12c Clustering Factor(2).txt

[20130809]12c Clustering Factor(2).txt 参考以下链接,自己重复测试: http://richardfoote.wordpress.com/2013/05/14/clustering-factor-calculation-improvement-part-ii-blocks-on-blocks/ 1.测试环境: SQL> @ver BANNER                                                           

[20170625]12c Extended statistics.txt

[20170625]12c Extended statistics.txt --//别人的系统12c,awr报表出现大量调用执行如下sql语句. select default$ from col$ where rowid=:1; --//google看了一下,问题出在Extended statistics的问题,12c 会自动收集扩展统计信息.找到如下链接: --//https://blog.dbi-services.com/sql-plan-directives-strike-again/ 1

[20160910]12c sqlldr express.txt

[20160910]12c sqlldr express.txt --Oracle 12c introduces Sql*Loader Express features, which allow users to run sqlldr with minimum configuration.   --通过例子来说明: SCOTT@test01p> @ ver1 PORT_STRING                    VERSION        BANNER                 

[2016026]12c lateral语法.txt

[2016026]12c lateral语法.txt 12c 支持一种LATERAL的写法,例子: SELECT e1.*, e3.avg_sal   FROM scott.emp e1       ,LATERAL (SELECT AVG (e2.sal) avg_sal                   FROM scott.emp e2                  WHERE e1.deptno != e2.deptno) e3; -- 显示emp每行,同时显示其它不是本部门的平均

[20140813关于Hakan factor.txt

[20140813关于Hakan factor.txt --曾经写过一篇通过执行ALTER TABLE MINIMIZE RECORDS_PER_BLOCK避免行迁移的blog. http://blog.itpub.net/267265/viewspace-763315/ --里面的Hakan factor主要用来唯一映射表上行在位图索引.可以表示表的单个数据块的最大行数. --记录在sys.tab$的spare1里面. --昨天我查看随机安装的example例子发现一个奇怪的情况.自己也做一些

[20130721]ORACLE 12C Invisible Columns.txt

[20130721]ORACLE 12C Invisible Columns.txt 在ORACLE 12C中有了Invisible Columns的概念,就是在表中真实的存在该列,但是通过设置Invisible可以实现不显示该列.自己测试看看. SQL> select banner from v$version ; BANNER -------------------------------------------------------------------------------- Or

[20131221]12c 优化 bug.txt

[20131221]12c 优化 bug.txt http://connormcdonald.wordpress.com/2013/12/20/the-challenge-of-optimization/ --仅仅自己重复测试看看! @ver BANNER                                                                               CON_ID ------------------------------------

Oracle聚簇因子(Clustering factor,CF)

Oracle 聚簇因子(Clustering factor)   聚簇因子是 Oracle 统计信息中在CBO优化器模式下用于计算cost的参数之一,决定了当前的SQL语句是否走索引,还是全表扫描以及是否作为嵌套连接外部表等.如此这般,那到底什么是聚簇因子,那些情况下会影响到聚簇因子,以及如何提高聚簇因子?本文将对此展开描述.   1.堆表的存储方式    Oralce 数据库系统中最普通,最为常用的即为堆表.     堆表的数据存储方式为无序存储,也就是任意的DML操作都可能使得当前数据块存在

一个执行计划异常变更的案例 - 外传之聚簇因子(Clustering Factor)

之前的几篇文章: <一个执行计划异常变更的案例 - 前传> <一个执行计划异常变更的案例 - 外传之绑定变量窥探> <一个执行计划异常变更的案例 - 外传之查看绑定变量值的几种方法> <一个执行计划异常变更的案例 - 外传之rolling invalidation> 这个案例中涉及到了聚簇因子,所以本篇文章是这个系列的又一篇外传,写过上面几篇后,感觉现在就像打怪,见着真正的大BOSS之前,要经历各种小怪的骚扰,之所以写着几篇文章,真是因为这个案例涉及了很多知