直方图 density(密度)解释

引用源:http://blog.csdn.net/cymm_liu/article/details/8846141

Matt -- Thanks for the question regarding "density", version 8.1.6

Submitted on 6-Mar-2002 11:50 UTC

Tom's latest followup | Bookmark | Bottom

Last updated 28-Nov-2005 7:40

You Asked

What does the density column in dba_tab_columns mean? 

and we said...

Here is a support note on the topic:

Article-ID:         <Note:43041.1>
Circulation:        PUBLISHED (LIMITED)
Folder:             server.Performance.SqlTuning
Topic:              Optimizer Terms
Title:              QTUNE: What is Density?
Document-Type:      BULLETIN
Impact:             MEDIUM
Skill-Level:        NOVICE
Updated-Date:       05-APR-2000 21:20:35
References:         <NOTE:50750.1> Shared-Refs:        

Content-Type:       TEXT/PLAIN
Keywords:           CBO; FEATURE;
Products:           5/RDBMS (V7.X to V8.X);
Platforms:          GENERIC;  

PURPOSE

To give an explanation of 'Density' as used by the CBO.

SCOPE & APPLICATION

For Oracle Support Analysts trying to understand how density is used by the
CBO when determining the best access path.

Density is a statistic used by the Cost Based Optimizer to give selectivity
estimates for columns where better information is unavailable (i.e. from
histograms etc.). 

You must gather statistics on your table to get density statistics. 

You can see density statistics by looking at the following views:

USER_TAB_COLUMNS
ALL_TAB_COLUMNS
DBA_TAB_COLUMNS

Density is a column statistic and provides selectivity estimates for
equi-join predicates (e.g. and A.COL1 = B.COL1) and equality predicates
(e.g. COL1 = 5).

The density is expressed as a decimal number between 0 and 1.
Values close to 1 indicate that this column is unselective
Values close to 0 indicate that this column is highly selective

The more selective a column, the less rows are likely to be returned by a
query referencing this column in its predicate list. 

The column selectivity is part of the equation used to decide on the best
path for a query to take to retrieve the data required in the most effective
manner and hence impacts the final cost value for the query.

Density is calculated as follows:

Pre 7.3
~~~~~~~

    Density = 1 / Number of distinct NON null values

The number of distinct NON-null values for a column (COL1) on table TABLE1
can be obtained as follows:

select distinct count(COL1)
from   TABLE1
where  COL1 is not null;

7.3+
~~~~

The Density calculation has been refined by the use of histograms. If
you have created histograms on your columns we can now use the histogram
information to give more accurate information. Otherwise the Density is
calculated as before. With histograms we can use information on
popular and non-popular values to determine the selectivity. 

A non-popular value is one that does not span multiple bucket end points.
A popular value is one that spans multiple end points.

(Refer to <Note:50750.1> for details on histograms)

For non-popular values the density is calculated as the number of non-popular
values divided by the total number of values. Formula:

    Density =  Number of non-popular values
               ----------------------------
                  total number of values

We only use the density statistic for non-popular values. 

Popular values calculate the selectivity of a particular column values by
using histograms as follows:

The Selectivity for popular values is calculated as the number of end points
spanned by that value divided by the total number of end points. Formula:

    Selectivity = Number of end points spanned by this value
                  ------------------------------------------
                         total number of end points

           [Top of Page] 

Reviews

   November 21, 2002 - 8am UTC

Bookmark | Bottom | Top

Reviewer: Stephan Bressler from germany

Hi,

what's the formula
Density =  Number of non-popular values
               ----------------------------
                  total number of values

In case there are no popular values, is the density = 1?

I thought there would be a density per bucket? Wouldn't that make sense?    

 Selectivity with bind variables (Oracle 9.2.0.5)   November 26, 2005 - 1am
UTC

Bookmark | Bottom | Top

Reviewer: Vladimir Sadilovskiy from MA, US

Tom,

Could you look into this test case:

SQL> create table t1 as select 1 f from all_objects;

Table created.

SQL> create index i_t1 on t1(f);

Index created.

SQL> exec dbms_stats.gather_table_stats(null,'T1',method_opt=>'for all columns size
254',estimate_percent=>100, cascade=>true);

PL/SQL procedure successfully completed.

SQL> variable f number;
SQL> exec :f := 1;

PL/SQL procedure successfully completed.

SQL> set autotrace on
<b> with constants the plan is nice</b>
SQL> select count(*) from t1 where f=2;

  COUNT(*)
----------
         0

Execution Plan
----------------------------------------------------------
   0      SELECT STATEMENT Optimizer=CHOOSE (Cost=2 Card=1 Bytes=3)
   1    0   SORT (AGGREGATE)
   2    1     INDEX (RANGE SCAN) OF 'I_T1' (NON-UNIQUE) (Cost=1 Card=1
           Bytes=3)

Statistics
----------------------------------------------------------
          0  recursive calls
          0  db block gets
          2  consistent gets
          0  physical reads
          0  redo size
        490  bytes sent via SQL*Net to client
        655  bytes received via SQL*Net from client
          2  SQL*Net roundtrips to/from client
          0  sorts (memory)
          0  sorts (disk)
          1  rows processed

SQL> select count(*) from t1 where f=1;

  COUNT(*)
----------
     45765

Execution Plan
----------------------------------------------------------
   0      SELECT STATEMENT Optimizer=CHOOSE (Cost=8 Card=1 Bytes=3)
   1    0   SORT (AGGREGATE)
   2    1     TABLE ACCESS (FULL) OF 'T1' (Cost=8 Card=45765 Bytes=137
          295)

Statistics
----------------------------------------------------------
          0  recursive calls
          0  db block gets
         73  consistent gets
          0  physical reads
          0  redo size
        493  bytes sent via SQL*Net to client
        655  bytes received via SQL*Net from client
          2  SQL*Net roundtrips to/from client
          0  sorts (memory)
          0  sorts (disk)
          1  rows processed

<b>but with bind variable the plan is different</b>
SQL> select count(*) from t1 where f=:f;

  COUNT(*)
----------
     45765

Execution Plan
----------------------------------------------------------
   0      SELECT STATEMENT Optimizer=CHOOSE (Cost=2 Card=1 Bytes=3)
   1    0   SORT (AGGREGATE)
   2    1     INDEX (RANGE SCAN) OF 'I_T1' (NON-UNIQUE) (Cost=1 Card=4
          58 Bytes=1374)

Statistics
----------------------------------------------------------
          0  recursive calls
          0  db block gets
         73  consistent gets
          0  physical reads
          0  redo size
        493  bytes sent via SQL*Net to client
        655  bytes received via SQL*Net from client
          2  SQL*Net roundtrips to/from client
          0  sorts (memory)
          0  sorts (disk)
          1  rows processed

<b>1/NDV doesn't seem to match calculated density </b>
SQL> select object_id from dba_objects where object_name='T1';

 OBJECT_ID
----------
     56142

SQL> set pages 999
SQL> select * from hist_head$ where obj#=56142;

      OBJ#       COL# BUCKET_CNT    ROW_CNT  CACHE_CNT   NULL_CNT TIMESTAMP
---------- ---------- ---------- ---------- ---------- ---------- ---------
SAMPLE_SIZE    MINIMUM    MAXIMUM    DISTCNT
----------- ---------- ---------- ----------
LOWVAL
----------------------------------------------------------------
HIVAL                                                               DENSITY
---------------------------------------------------------------- ----------
   INTCOL#     SPARE1     SPARE2     AVGCLN     SPARE3     SPARE4
---------- ---------- ---------- ---------- ---------- ----------
     56142          1      45765          1          1          0 25-NOV-05
      45765          1          1          1
C102
C102                                                             .000010925
         1          1          2          3

Thank you.

- Vladimir 

Followup   November 26, 2005 - 12pm UTC:

autotrace (explain plan) do not BIND PEEK - so they can "lie"

To demnonstrate this and the effect of bind variable peeking, I'll run this script:

create table t
as
select 99 id, a.* from all_objects a;

update t set id = 1 where rownum = 1;

create index t_idx on t(id);
begin
   dbms_stats.gather_table_stats
   ( user, 'T',
     method_opt=>'for all indexed columns size 254',
     cascade=>TRUE );
end;
/
variable n number
alter session set sql_trace=true;
exec :n := 1
select count(object_type) from t n_is_1_first where id = :n;
exec :n := 99
select count(object_type) from t n_is_1_first where id = :n;
exec :n := 99
select count(object_type) from t n_is_99_first where id = :n;
exec :n := 1
select count(object_type) from t n_is_99_first where id = :n;

The tkprof was generated with:

tkprof tracefile outputfile aggregate=no explain=u/p

to get both an EXPLAIN PLAN as well as getting all four statements printed individually:

we start with :N of one, the optimizer will actually optimize this statement as if the literal
number one was in the query itself:

BEGIN :n := 1; END;
********************************************************************************
select count(object_type) from t n_is_1_first where id = :n

Misses in library cache during parse: 1
Optimizer goal: CHOOSE
Parsing user id: 70  (OPS$TKYTE)

we did a hard parse (misses = 1), the number one was put in place of :n and the query optimizer
said "use index! it'll only get one row!".  In the following the row source operation plan is the
ACTUAL PLAN used, the EXECUTION PLAN is what explain plan - which DOES NOT BIND PEEK - would give
us.  NOTE THEY ARE DIFFERENT.  Explain plan doesn't have access to the binds:

Rows     Row Source Operation
-------  ---------------------------------------------------
      1  SORT AGGREGATE
      1   TABLE ACCESS BY INDEX ROWID T
      1    INDEX RANGE SCAN T_IDX (object id 36010)

Rows     Execution Plan
-------  ---------------------------------------------------
      0  SELECT STATEMENT   GOAL: CHOOSE
      1   SORT (AGGREGATE)
      1    TABLE ACCESS   GOAL: ANALYZED (FULL) OF 'T'
********************************************************************************
BEGIN :n := 99; END;
********************************************************************************
select count(object_type) from t n_is_1_first where id = :n

Misses in library cache during parse: 0
Optimizer goal: CHOOSE
Parsing user id: 70  (OPS$TKYTE)

This query was soft parsed, the plan already existed.  This too will therefore use an index range
scan - even though in this case, a full scan would have been "better".  Again, explain plan "lies"
to us here

Rows     Row Source Operation
-------  ---------------------------------------------------
      1  SORT AGGREGATE
  30689   TABLE ACCESS BY INDEX ROWID T
  30689    INDEX RANGE SCAN T_IDX (object id 36010)

Rows     Execution Plan
-------  ---------------------------------------------------
      0  SELECT STATEMENT   GOAL: CHOOSE
      1   SORT (AGGREGATE)
  30689    TABLE ACCESS   GOAL: ANALYZED (FULL) OF 'T'
********************************************************************************
BEGIN :n := 99; END;
********************************************************************************
select count(object_type) from t n_is_99_first where id = :n

Note that this is a hard parse (changed the correlation name in the query, new query).  This time
- the optimizer plugs 99 into the query and optimizes.  We do full scan this time:

Misses in library cache during parse: 1
Optimizer goal: CHOOSE
Parsing user id: 70  (OPS$TKYTE)

Rows     Row Source Operation
-------  ---------------------------------------------------
      1  SORT AGGREGATE
  30689   TABLE ACCESS FULL T

Rows     Execution Plan
-------  ---------------------------------------------------
      0  SELECT STATEMENT   GOAL: CHOOSE
      1   SORT (AGGREGATE)
  30689    TABLE ACCESS   GOAL: ANALYZED (FULL) OF 'T'
********************************************************************************
BEGIN :n := 1; END;
********************************************************************************
select count(object_type) from t n_is_99_first where id = :n

but note that this is a soft parse, the full scan plan will be used - first one to run the query
with bind variable inputs (first hard parse of it) gets to to "pick the plan".... So, basically the
same query - just changed the correlation name and we have different plans:

Misses in library cache during parse: 0
Optimizer goal: CHOOSE
Parsing user id: 70  (OPS$TKYTE)

Rows     Row Source Operation
-------  ---------------------------------------------------
      1  SORT AGGREGATE
      1   TABLE ACCESS FULL OBJ#(36009)

Rows     Execution Plan
-------  ---------------------------------------------------
      0  SELECT STATEMENT   GOAL: CHOOSE
      1   SORT (AGGREGATE)
      1    TABLE ACCESS   GOAL: ANALYZED (FULL) OF 'T'

You would need to use a 10053 trace to see the "true" cardinality in this case
 

 Unexpected cardinality   November 26, 2005 - 1pm UTC

Bookmark | Bottom | Top

Reviewer: Jonathan Lewis from UK

Test case does not reproduce on my 9.2.0.6

Note that what you have looks like the standard 1% estimate for function(col) = constant.  As if
Oracle is treating this like a coercion problem.

select count(*) from t1 where f=:f;

  COUNT(*)
----------
     45765

Execution Plan
----------------------------------------------------------
0   SELECT STATEMENT Optimizer=CHOOSE (Cost=2 Card=1 Bytes=3)
1 0   SORT (AGGREGATE)
2 1     INDEX (RANGE SCAN) OF 'I_T1' (NON-UNIQUE) (Cost=1 Card=458 Bytes=1374)

    card = 458 = ceil(45765/100).

My plan gives card={actual number of rows in table).

What do you see as the filter_predicates if you run the bind variable version (with a to_number()
on the bind) through dbms_xplan ?

Followup   November 26, 2005 - 4pm UTC:

I didn't even look at the numbers to tell the truth, I saw "bind variable vs literal".....

but yes, I don't see it in 9206 - but in 9iR1:

ops$tkyte@ORA9IR1> create table t1 as select 1 f from all_objects;

Table created.

ops$tkyte@ORA9IR1>
ops$tkyte@ORA9IR1> create index i_t1 on t1(f);

Index created.

ops$tkyte@ORA9IR1>
ops$tkyte@ORA9IR1> exec dbms_stats.gather_table_stats(null,'T1',method_opt=>'for all columns size
254',estimate_percent=>99.99, cascade=>true);

PL/SQL procedure successfully completed.

ops$tkyte@ORA9IR1>
ops$tkyte@ORA9IR1> variable f number
ops$tkyte@ORA9IR1> set autotrace traceonly explain
ops$tkyte@ORA9IR1> select * from t1 where f=1;

Execution Plan
----------------------------------------------------------
   0      SELECT STATEMENT Optimizer=CHOOSE (Cost=8 Card=31020 Bytes=93060)
   1    0   TABLE ACCESS (FULL) OF 'T1' (Cost=8 Card=31020 Bytes=93060)

ops$tkyte@ORA9IR1> select * from t1 where f=2;

Execution Plan
----------------------------------------------------------
   0      SELECT STATEMENT Optimizer=CHOOSE (Cost=1 Card=1 Bytes=3)
   1    0   INDEX (RANGE SCAN) OF 'I_T1' (NON-UNIQUE) (Cost=1 Card=1 Bytes=3)

ops$tkyte@ORA9IR1> select * from t1 where f=:f;

Execution Plan
----------------------------------------------------------
   0      SELECT STATEMENT Optimizer=CHOOSE (Cost=1 Card=310 Bytes=930)
   1    0   INDEX (RANGE SCAN) OF 'I_T1' (NON-UNIQUE) (Cost=1 Card=310 Bytes=930)

ops$tkyte@ORA9IR1> set autotrace off

it does the 1%, in 9iR2 - 9206 - it did this:

ops$tkyte@ORA9IR2> select * from t1 where f=:f;

Execution Plan
----------------------------------------------------------
   0      SELECT STATEMENT Optimizer=CHOOSE (Cost=7 Card=30693 Bytes=92079)
   1    0   INDEX (FAST FULL SCAN) OF 'I_T1' (NON-UNIQUE) (Cost=7 Card=30693 Bytes=92079)

 Odd cardinality   November 26, 2005 - 2pm UTC

Bookmark | Bottom | Top

Reviewer: Jonathan Lewis from UK

I forgot to mention - Wolfgang Breitling has mentioned to me an oddity that appears when you create
a histogram on a column with just one value. I think there's an article on his site
www.centrexcc.com which mentions it.
 

 question about the density calculation   November 28, 2005 - 4am UTC

Bookmark | Bottom | Top

Reviewer: William Moffatt from London, England

Hi Tom,

I'm just curious about something in the support note: it says about the density calculation :

"The number of distinct NON-null values for a column (COL1) on table TABLE1
can be obtained as follows:

select distinct count(COL1)
from   TABLE1
where  COL1 is not null;
"

Firstly, isn't the "not null" implied by the count?

Secondly, shouldn't that be "count(distinct COL1)", rather than "distinct count(COL1)"?

This is just idle curiosity, since the intent is obvious, and I presume the calculation works,
however it's done :)

(I did test it out, on 8.1.7.4, and it does appear that the query as written is incorrect, however
the note does say pre 7.3 - but then again, the "post 7.3" section refers you back to this query
too). I've seen you raise documentation bugs before, and similarly if I've made a mistake
somewhere, I'd rather know what and why :)

BTW, many thanks for the marvelous resource that this site is.

regards,

William
 

Followup   November 28, 2005 - 7am UTC:

it should just be 

select count(distinct col1) from table1;

the query presented in the note would not make sense.  

select distinct count(col1) from table1 where col1 is not null;

is the same as

select count(col1) from table1;

which would count non-null col1 values - they meant count(distinct col1) 

 Calculation of density   July 30, 2008 - 6am UTC

Bookmark | Bottom | Top

Reviewer: Mette from DK

Thanks to this entry I now understand how we get the estimated number of rows when using histograms
- thanks.

But I dont understand how Oracle calculates the density (used for the non popular values).

I'm on 10.2.0.3 on AIX

This is the situation:

250 buckets
23644 distinct values
34 popular (spanning more than one bucket)
1.799.977 rows
stats run with 100%

The density is 0.002267921.... (taken from TAB_COLUMNS)

How does he get that number?

I tried with the formula from above - and got nothing like this number.

215 / 23644 = 0.009093216

What have I missed here?

Best regards

时间: 2024-08-28 18:22:11

直方图 density(密度)解释的相关文章

ZBrush功能之Density按钮介绍

  Density 按钮组用于在UV展开后,设置某一部分在UV框中所占的大小比例.在一般情况下,对于头部而言,当然希望眼睛.鼻子和嘴巴等部分所占三维UV方格尽量大些,因为所占的空间越大,绘制贴图的时候越方便.越精确.而ZBrush是自动分配所占大小比重的,此时要通过调整密度值改变它们所占空间的大小. 要使用其中的按钮,要先激活"Enable Control Painting [控制绘画]"按钮,再激活"Density [密度]"按钮,这样其下面的数值和滑块调节按钮才

android计算pad或手机的分辨率/像素/密度/屏幕尺寸/DPI值的方法_Android

手机分辨率基础知识(DPI,DIP计算) 1.术语和概念 术语 说明 备注 Screen size(屏幕尺寸) 指的是手机实际的物理尺寸,比如常用的2.8英寸,3.2英寸,3.5英寸,3.7英寸 摩托罗拉milestone手机是3.7英寸 Aspect Ratio(宽高比率) 指的是实际的物理尺寸宽高比率,分为long和nolong Milestone是16:9,属于long Resolution(分辨率) 和电脑的分辨率概念一样,指手机屏幕纵.横方向像素个数 Milestone是854*480

跟我一起数据挖掘(9)——R语言

R是用于统计分析.绘图的语言和操作环境.R是属于GNU系统的一个自由.免费.源代码开放的软件,它是一个用于统计计算和统计制图的优秀工具. 一.基本介绍: 1. 警告:在输入命令前请切换到英文模式.否则你的一大段代码可能因为一个中文状态的括号而报错,R语言的报错并不智能无法指出错误的具体位置.最可怕的是不报错但就是无法输出正确结果. 2. 警告:R语言区别大小写,所以对象A与对象a并不一样,在调用函数.R包时也要注意大小写. 3. R软件由R包组成,每个包里有各种函数.命令"library()&q

Microsoft SQL Server 2000 中查询优化器使用的统计

Microsoft SQL Server 2000 中查询优化器使用的统计 Microsoft? SQL Server 2000 收集关于存储在数据库中的索引和列数据的统计信息.SQL Server 查询优化器使用这些统计信息来选择用于执行 INSERT.SELECT.DELETE 或 UPDATE 查询的最有效方案.本文说明了收集的数据类型.数据的存储位置以及用于创建.更新与删除统计的命令.默认情况下,SQL Server 2000 会自动创建和更新统计(当此类操作有用时).本文也概括介绍了如

《R语言游戏数据分析与挖掘》一3.3 高级绘图函数

3.3 高级绘图函数 大部分高级绘图函数均有add参数(plot函数没有).如果add=FALSE(默认),则在新窗口中创建一个图形:如果add=TRUE,则在当前活动窗口中的原有图形之上叠加图形.在基础包中,R提供了绘制常见图形的功能,包括散点图.气泡图.柱状图.饼图.线图,也提供了一些专业的统计图形,如茎叶图.Q-Q图等.表3-9列出了基础包中可以绘制的一些图形. 3.3.1 散点图 1.散点图 R中创建散点图的基础函数是plot(x,y),其中x和y是数值型向量,代表图形中的(x,y)点.

R语言数据挖掘相关包总结-转帖

与数据挖掘有关或者有帮助的R包和函数的集合. 1.聚类 常用的包: fpc,cluster,pvclust,mclust 基于划分的方法: kmeans, pam, pamk, clara 基于层次的方法: hclust, pvclust, agnes, diana 基于模型的方法: mclust 基于密度的方法: dbscan 基于画图的方法: plotcluster, plot.hclust 基于验证的方法: cluster.stats 2.分类 常用的包: rpart,party,rand

实例介绍Cocos2d-x物理引擎:HelloPhysicsWorld

我们通过一个实例介绍一下在Cocos2d-x 3.x中使用物理引擎的开发过程熟悉这些API的使用.这个实例的运行后的场景当场景启动后玩家可以触摸点击屏幕每次触摸时候就会在触摸点生成一个新的精灵精灵的运行是自由落体运动. HelloPhysicsWorld实例 使用物理引擎的一般步骤如下图所示. 这个过程与上帝创建世界的过程类似上帝首先创建了世界然后为世界指定了边界否则万物就会掉到世界之外的混沌里去了最后上帝创建了万物.当然这只是一个最基本的步骤有的时候还需要碰撞检测和使用关节等处理.下面我们就按

SQL Server 性能调优培训引言

原文:SQL Server 性能调优培训引言      大家好,这是我在博客园写的第一篇博文,之所以要开这个博客,是我对MS SQL技术学习的一个兴趣记录.       作为计算机专业毕业的人,自己对技术的掌握总是觉得很肤浅,博而不专,到现在我才发现自己的兴趣所在,于是我通过网络找了各种MS SQL技术的相关文档,总觉得讲得比较干涩,没有一个系统性,今年3月底我无意浏览到一个网站提供免费的性能调优的半年培训(http://www.sqlpassion.at/academy/performance

CBO中基数(cardinality)、可选择率(selectivity)的计算公式

CBO中基数(cardinality).可选择率(selectivity)的计算公式 Cardinality(基数)是Oracle预估的返回行数,即对目标SQL的某个具体执行步骤的执行结果所包含记录数的估算值.如果是针对整个目标SQL,那么此时的Cardinality就表示该SQL最终执行结果所包含记录数的估算值.例如,一张表T有1000行数据,列COL1上没有直方图,没有空值,并且不重复的值(Distinct Value)有500个.那么,在使用条件"WHERE COL1="去访问表