ORACLE动态采样分析

动态采样概念

 


态采样(Dynamic Sampling)是在ORACLE 9i Release
2中开始引入的一个技术,引入它的目的是为了应对数据库对象没有分析(统计信息缺失)的情况下,优化器生成更好的执行计划。简单的说,在数据库段(表、索
引、分区)对象没有分析的情况下,为了使CBO优化器得到足够多的信息以保证优化器做出正确执行计划而发明的一种技术。它会分析一定数量段对象上的数据块
获取CBO需要的统计信息。动态采样技术仅仅是统计信息的一种补充,它不能完全替代统计信息分析。

Dynamic sampling first became available in Oracle9i Database Release 2. It is the ability of the cost-based optimizer (CBO) to sample the tables a query references during a hard parse, to determine better default statistics for unanalyzed segments, and to verify its “guesses.” This sampling takes place only at hard parse time and is used to dynamically generate better statistics for the optimizer to use, hence the name dynamic sampling.
 
The purpose of dynamic sampling is to improve server performance by determining more accurate estimates for predicate selectivity and statistics for tables and indexes. The statistics for tables and indexes include table block counts, applicable index block counts, table cardinalities, and relevant join column statistics. These more accurate estimates allow the optimizer to produce better performing plans.
 

动态采样在Oracle 11g之前称为 Dynamic Sampling, ORACLE 12c之后改名为Dynamic Statistic.

 

动态采样介绍

如果要理解动态采样,最好从鲜活的例子开始,向来理论都是枯燥乏味的。创建一个test表,总共有50319行数据。如下所示

SQL> create table test
  2  as 
  3   select owner, object_type
  4    from dba_objects;
 
Table created.
 
SQL> select count(1) from test;
 
  COUNT(1)
----------
     50319

我们使用dynamic_sampling(test
0)提示(hints)来禁用动态采样(稍后动态采样级别中介绍),从下面的执行计划可以看出,在表对象没有做分析情况下,如果禁用了动态采样,CBO优
化器唯一可以使用的信息为该表存储在数据字典的一些信息,比如多少个extent,多少个block等,这些信息往往不够。此时优化器估计表test的行
数为11027(如下所示), 跟实际的表记录行数50319还是有蛮大的偏差。在复杂环境下,就很有可能导致CBO优化器做出错误的执行计划。

SQL> set autotrace traceonly explain;
SQL> select /*+ dynamic_sampling(test 0) */ * from test;
 
Execution Plan
----------------------------------------------------------
Plan hash value: 1357081020
 
--------------------------------------------------------------------------
| Id  | Operation         | Name | Rows  | Bytes | Cost (%CPU)| Time     |
--------------------------------------------------------------------------
|   0 | SELECT STATEMENT  |      | 11027 |   301K|    31   (0)| 00:00:01 |
|   1 |  TABLE ACCESS FULL| TEST | 11027 |   301K|    31   (0)| 00:00:01 |
--------------------------------------------------------------------------
SQL> set autotrace off;

如果启用动态采样(默认情况下,动态采样级别为2),优化器根据动态采
样得到一些数据信息猜测、估计表TEST的记录行数为48054,已经接近实际记录行数50319了。比不做动态采样分析要好很多了。当然你不能指望动态
采样获取完全准确的信息,因为它只是采样了一些数据块。

SQL> set autotrace traceonly explain;
SQL> select * from test;
 
Execution Plan
----------------------------------------------------------
Plan hash value: 1357081020
 
--------------------------------------------------------------------------
| Id  | Operation         | Name | Rows  | Bytes | Cost (%CPU)| Time     |
--------------------------------------------------------------------------
|   0 | SELECT STATEMENT  |      | 48054 |  1313K|    32   (4)| 00:00:01 |
|   1 |  TABLE ACCESS FULL| TEST | 48054 |  1313K|    32   (4)| 00:00:01 |
--------------------------------------------------------------------------
Note
-----
   - dynamic sampling used for this statement
 
SQL> set autotrace off;

如果我们将动态采样的级别提高为3,如下所示,发现优化器根据动态采样得到的信息比默认(默认情况下,动态采样级别为2)情况获得的信息更准确。优化器估计表TEST的行数为51463,比48054又接近实际情况一步了。

 
SQL> set autotrace traceonly explain;
SQL> select /*+ dynamic_sampling(test 3) */ * from test;
 
Execution Plan
----------------------------------------------------------
Plan hash value: 1357081020
 
--------------------------------------------------------------------------
| Id  | Operation         | Name | Rows  | Bytes | Cost (%CPU)| Time     |
--------------------------------------------------------------------------
|   0 | SELECT STATEMENT  |      | 51463 |   703K|    32   (4)| 00:00:01 |
|   1 |  TABLE ACCESS FULL| TEST | 51463 |   703K|    32   (4)| 00:00:01 |
--------------------------------------------------------------------------
SQL> set autotrace off;

在Tom的这篇文章中提到,在没有动态采样的情况下,如果删除了该表数
据,CBO优化器估算的结果集和没有删除之前是一样的。这是因为当一个表的数据被删除后,这个表所分配的extent和block是不会自动回收的(高水
位线不变),所以CBO如果没有采样数据块做分析,只是从数据字典中获取extend等信息,就会误认为任然还有那么多数据。

SQL> delete from test;
 
50319 rows deleted.
 
SQL> commit;
 
Commit complete.
 
SQL> set autotrace traceonly explain;
SQL> select /*+ dynamic_sampling(test 0) */ * from test;
 
Execution Plan
----------------------------------------------------------
Plan hash value: 1357081020
 
--------------------------------------------------------------------------
| Id  | Operation         | Name | Rows  | Bytes | Cost (%CPU)| Time     |
--------------------------------------------------------------------------
|   0 | SELECT STATEMENT  |      | 11027 |   301K|    31   (0)| 00:00:01 |
|   1 |  TABLE ACCESS FULL| TEST | 11027 |   301K|    31   (0)| 00:00:01 |
--------------------------------------------------------------------------
SQL> select * from test;
 
Execution Plan
----------------------------------------------------------
Plan hash value: 1357081020
 
--------------------------------------------------------------------------
| Id  | Operation         | Name | Rows  | Bytes | Cost (%CPU)| Time     |
--------------------------------------------------------------------------
|   0 | SELECT STATEMENT  |      |     1 |    28 |    31   (0)| 00:00:01 |
|   1 |  TABLE ACCESS FULL| TEST |     1 |    28 |    31   (0)| 00:00:01 |
--------------------------------------------------------------------------
Note
-----
   - dynamic sampling used for this statement
 
SQL> 

 

什么时候使用动态采样?

如下所示,我们使用包
dbms_stats.gather_table_stats收集表Test的统计信息过后,你会发现“dynamic sampling used
for this
statement”不见了,其实也就是说优化器发现有表TEST有分析过,它就不会使用动态采样技术。其实开篇的时候已经叙说过“应对数据库对象没有分
析(统计信息缺失)的情况下,才会用到动态采样技术“

SQL> set autotrace trace exp;
SQL> select * from test;
 
Execution Plan
----------------------------------------------------------
Plan hash value: 1357081020
 
--------------------------------------------------------------------------
| Id  | Operation         | Name | Rows  | Bytes | Cost (%CPU)| Time     |
--------------------------------------------------------------------------
|   0 | SELECT STATEMENT  |      |     1 |    28 |    31   (0)| 00:00:01 |
|   1 |  TABLE ACCESS FULL| TEST |     1 |    28 |    31   (0)| 00:00:01 |
--------------------------------------------------------------------------
Note
-----
   - dynamic sampling used for this statement
 
SQL> exec dbms_stats.gather_table_stats(user, 'test');
 
PL/SQL procedure successfully completed.
 
SQL> select * from test;
 
Execution Plan
----------------------------------------------------------
Plan hash value: 1357081020
 
--------------------------------------------------------------------------
| Id  | Operation         | Name | Rows  | Bytes | Cost (%CPU)| Time     |
--------------------------------------------------------------------------
|   0 | SELECT STATEMENT  |      |     1 |    28 |    31   (0)| 00:00:01 |
|   1 |  TABLE ACCESS FULL| TEST |     1 |    28 |    31   (0)| 00:00:01 |
--------------------------------------------------------------------------

第二种情况:当表TEST即使被分析过,如果查询脚本里面包含临时表,就会使用动态采样技术。因为临时表是不会被分析,它是没有统计信息的。如下所示

SQL> drop table test;
SQL> create table test
  2  as 
  3    select owner, object_type
  4   from dba_objects;
 
Table created.
 
SQL> exec  dbms_stats.gather_table_stats(user, 'test');
 
PL/SQL procedure successfully completed.
 
 
SQL> create global temporary table tmp
  2  (object_type varchar2(19));
 
Table created.
 
SQL> insert into tmp
  2  select distinct object_type from dba_objects;
 
41 rows created.
 
SQL> commit;
 
Commit complete.
 
SQL>  set autotrace traceonly explain;
 
SQL>  select t.owner, l.object_type
  2  from test t inner join tmp l on t.object_type =l.object_type;
 
Execution Plan
----------------------------------------------------------
Plan hash value: 19574435
 
---------------------------------------------------------------------------
| Id  | Operation          | Name | Rows  | Bytes | Cost (%CPU)| Time     |
---------------------------------------------------------------------------
|   0 | SELECT STATEMENT   |      |     1 |    25 |    35   (6)| 00:00:01 |
|*  1 |  HASH JOIN         |      |     1 |    25 |    35   (6)| 00:00:01 |
|   2 |   TABLE ACCESS FULL| TMP  |     1 |    11 |     2   (0)| 00:00:01 |
|   3 |   TABLE ACCESS FULL| TEST | 49422 |   675K|    32   (4)| 00:00:01 |
---------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
   1 - access("T"."OBJECT_TYPE"="L"."OBJECT_TYPE")
 
Note
-----
   - dynamic sampling used for this statement
 
SQL> 

动态采样还有一个独特能力,可以对不同列之间的相关性做统计。表统计信息都是相对独立的。当查询涉及列之间的相关性时,统计信息就显得有些不足了,请看Tom的例子

创建一个特殊的表t,然后对字段flag1、flag2创建索引t_idx,然后分析收集统计信息

SQL> create table t
  2     as select decode(mod(rownum,2),0,'N', 'Y') flag1,
  3               decode(mod(rownum,2),0,'Y', 'N') flag2, a.*
  4  from all_objects a;
 
Table created.
 
SQL> create index t_idx on t(flag1, flag2);
 
Index created.
 
SQL> begin
  2   dbms_stats.gather_table_stats(user, 'T',      
  3        method_opt =>'for all indexed columns size 254');
  4  end;
  5  /
 
PL/SQL procedure successfully completed.

关于表t的行数情况如下所示,大家先不要纠结为什么查询获取NUM_ROWS数据

SQL> select num_rows, num_rows/2, num_rows/2/2
  2  from user_tables
  3  where table_name='T';
 
  NUM_ROWS NUM_ROWS/2 NUM_ROWS/2/2
---------- ---------- ------------
     49875    24937.5     12468.75

首先看看对flag1过滤条件的SQL语句,CBO优化器猜测、估计的行数24757, 相当接近24937.5记录数了。

SQL> set autotrace traceonly explain;
SQL> select * from t where flag1='N';
 
Execution Plan
----------------------------------------------------------
Plan hash value: 1601196873
 
--------------------------------------------------------------------------
| Id  | Operation         | Name | Rows  | Bytes | Cost (%CPU)| Time     |
--------------------------------------------------------------------------
|   0 | SELECT STATEMENT  |      | 24757 |  2345K|   161   (2)| 00:00:02 |
|*  1 |  TABLE ACCESS FULL| T    | 24757 |  2345K|   161   (2)| 00:00:02 |
--------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
   1 - filter("FLAG1"='N')

首先看看对flag2过滤条件的SQL语句,CBO优化器猜测、估计的行数25118, 相当接近24937.5记录数了。

SQL> select * from t where flag2='N';
 
Execution Plan
----------------------------------------------------------
Plan hash value: 1601196873
 
--------------------------------------------------------------------------
| Id  | Operation         | Name | Rows  | Bytes | Cost (%CPU)| Time     |
--------------------------------------------------------------------------
|   0 | SELECT STATEMENT  |      | 25118 |  2379K|   161   (2)| 00:00:02 |
|*  1 |  TABLE ACCESS FULL| T    | 25118 |  2379K|   161   (2)| 00:00:02 |
--------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
   1 - filter("FLAG2"='N')

如果条件flag1 = 'N' and flag2 = 'N',我们根据逻辑推理判断这样的记录肯定是不存在的,这也是苦心构造这个特例的初衷。下面看看CBO优化器怎么探测、预测的

SQL> select * from t where flag1 = 'N' and flag2 = 'N';
 
Execution Plan
----------------------------------------------------------
Plan hash value: 1601196873
 
--------------------------------------------------------------------------
| Id  | Operation         | Name | Rows  | Bytes | Cost (%CPU)| Time     |
--------------------------------------------------------------------------
|   0 | SELECT STATEMENT  |      | 12468 |  1181K|   160   (2)| 00:00:02 |
|*  1 |  TABLE ACCESS FULL| T    | 12468 |  1181K|   160   (2)| 00:00:02 |
--------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
   1 - filter("FLAG1"='N' AND "FLAG2"='N')

CBO估计的记录数为12468,和实际情况相差非常远。其实是CBO优化器这样估算来的:

flag1=‘N' 的记录数占总数的1/2

flag2= 'N' 的记录数占总数的1/2

根据NUM_ROWS/2/2 =12468.这样显然是不合理的。下面我们通过提升动态采样级别,来看看动态采样是否能避免CBO的错误

SQL> select /*+ dynamic_sampling(t 3) */ * from t where flag1 = 'N' and flag2 = 'N';
 
Execution Plan
----------------------------------------------------------
Plan hash value: 470836197
 
-------------------------------------------------------------------------------------
| Id  | Operation                   | Name  | Rows  | Bytes | Cost (%CPU)| Time     |
-------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT            |       |     4 |   388 |     2   (0)| 00:00:01 |
|   1 |  TABLE ACCESS BY INDEX ROWID| T     |     4 |   388 |     2   (0)| 00:00:01 |
|*  2 |   INDEX RANGE SCAN          | T_IDX |     4 |       |     1   (0)| 00:00:01 |
-------------------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
   2 - access("FLAG1"='N' AND "FLAG2"='N')
 
Note
-----
   - dynamic sampling used for this statement
 
SQL> 

 

动态采样级别

ORACLE为动态采样划分了11个级别,详情请见ORACLE 11g官方文档http://download.oracle.com/docs/cd/E11882_01/server.112/e10821/stats.htm#PFGRF94760

Table 13-10 Dynamic Statistics Levels


Level


When the Optimizer Uses Dynamic Statistics


Sample Size (Blocks)


0


Do not use dynamic statistics


不做动态采样分析


n/a


1


Use dynamic statistics for all tables that do not have statistics, but only if the following criteria are met:

· There is at least 1 nonpartitioned table in the query that does not have statistics.

· This table has no indexes.

· This table has more blocks than the number of blocks that would be used for dynamic statistics of this table.


Oracle 对没有分析的表进行动态采样,但需要同时满足以下3个条件。

(1) SQL中至少有一个未分析的表(非分区表)

(2) 未分析的表没有索引

(3) 未分析的表占用的数据块要大于动态采样的数据块(32个)


32


2


Use dynamic statistics if at least one table in the statement has no statistics. This is the default setting.


对所有的未分析表做分析,动态采样的数据块是默认数据块数为64


64


3


Use dynamic statistics if any of the following conditions is true:

·

· The statement meets level 2 criteria.

·

· The statement has one or more expressions used in the WHERE clause predicates, for example, WHERE SUBSTR(cust_last_name,1,3).


采样的表包含满足Level 2定义的所有表,同时包括,那些谓词有可能潜在地需要动态采样的表,这些动态采样的数据块为默认数据块,对没有分析的表,动态采样的默认块为默认数据块数量。


64


4


Use dynamic statistics if any of the following conditions is true:

·

· The statement meets level 3 criteria.

·

· The statement uses complex predicates (an OR or AND operator between multiple predicates on the same table).


采样的表包含满足Level 3定义的表,同时还包括一些表,他们包含一个单表的谓词会引用另外的2个列或者更多的列;采样的块数是动态采样默认数据块数;对没有分析的表,动态采样的数据块为默认数据块的1倍。


64


5


Use dynamic statistics if the statement meets level 4 criteria.


采样的表包含满足Level 4定义的表,同时分别使用动态采样默认数据块的2倍的数量来做动态分析。


128


6


Use dynamic statistics if the statement meets level 4 criteria.


采样的表包含满足Level 4定义的表,同时分别使用动态采样默认数据块的4倍的数量来做动态分析。


256


7


Use dynamic statistics if the statement meets level 4 criteria.


采样的表包含满足Level 4定义的表,同时分别使用动态采样默认数据块的8倍的数量来做动态分析。


512


8


Use dynamic statistics if the statement meets level 4 criteria.


采样的表包含满足Level 4定义的表,同时分别使用动态采样默认数据块的32 倍的数量来做动态分析。


1024


9


Use dynamic statistics if the statement meets level 4 criteria.


采样的表包含满足Level 4定义的表,同时分别使用动态采样默认数据块的128倍的数量来做动态分析。


4086


10


Use dynamic statistics if the statement meets level 4 criteria.


采样的表包含满足Level 4定义的表,同时分别使用动态采样对所有数据块做动态分析。


All blocks


11


Use dynamic statistics automatically whenever the optimizer deems it necessary.


当优化器探测到需要的采样时,对段段对象自动采样


Automatically determined

采样级别越高,采样的数据块越多,得到的分析数据就越接近于真实,但同时伴随着
资源消耗的开销也增加了。这时一个需要权衡考虑的东西。ORACLE 10 g &
11g的默认采样级别都为2,如下所示,一般使用在会话中使用dynamic_sampling提示来修改动态采样级别。

SQL> show parameter optimizer_dynamic_sampling
 
NAME                               TYPE        VALUE
------------------------------ ----------- -----------
optimizer_dynamic_sampling         integer     2
SQL> 

另外一个方式就是通过提示hints里修改动态采样的级别。这个非常灵活、有用。

 

动态采样注意事项

凡事有利必有弊,动态采样也不是神器。它采样的数据块越
多,系统开销就越大,这样会增加SQL硬解析的时间,如果是数据库仓库(DW、OLAP)环境,SQL执行时间相当长,硬解析时间只占整个SQL执行时间
的一小部分,那么可以适当的提高动态采样级别,这样是有利于优化器获取更加正确的信息。一般设置为3或4比较合适。

但是在并发比较严重的OLTP系统中,每秒中有成千上万的SQL语句执行,它要求SQL语句短小、执行时间短,所以在OLTP系统中应该减低动态采样级别或不用动态采样。可以参考下面Tom的原文

 

When should I use
dynamic sampling?” is a tricky question. As with any other feature,
there are times to use it and times to avoid it. So far I’ve
concentrated on the “goodness” of dynamic sampling, and based on that,
it seems that you should set the level to 3 or 4 and just let the
optimizer always use dynamic sampling to validate its guesses.

That makes sense in an
environment in which you spend most of your time executing SQL and very
little of your overall time hard-parsing the SQL. That is, the SQL you
are executing runs for a long time and the parse time is a small portion
of the overall execution time, such as in a data warehousing
environment. There, dynamic sampling at levels above the default makes
complete sense. You are willing to give the optimizer a little more time
during a hard parse (when sampling takes place) to arrive at the
optimal plan for these complex queries.

That leaves the other classic
type of environment: the online transaction processing (OLTP) system.
Here, in general, you are executing queries thousands of times per
second and spend very little time executing a given query—the queries
are typically small and fast. Increasing the parse time in an OLTP
system might well cause you to spend more time parsing than executing
SQL. You do not want to increase the parse times here, so higher levels
of dynamic sampling would not be advisable

时间: 2024-10-27 15:29:58

ORACLE动态采样分析的相关文章

Oracle动态交叉表生成_oracle

正在看的ORACLE教程是:Oracle动态交叉表生成.Oracle是应用最广的大型数据库,而在范式下进行Oracle数据库设计则可以大大减少数据冗余,使数据库维护更方便,可惜范式下的数据表一般不能直接输出.今天我们就来探讨一下范式下的数据表的动态交叉表生成的方法. 范式下的Oracle数据库设计 数据关系的复杂性导致了表中数据冗余的存在,数据冗余增加了维护数据库的负担,也占用了大量的磁盘空间,直接造成性能下降.为了消除这些负面影响,就应该对数据库表格进行规范化,使其遵守一定的规则的,尤其是数据

通过Oracle动态性能视图采集查询调优数

对于现在的一些发行版本,DBA(Database Administrator,数据库管理员)和开发员可以访问的已经有三种动态性能视图了,分别为V$SQL.V$SQLAREA,还有V$SQLTEXT. 这些视图可以用来采集有关SQL命令执行的统计信息.在Oracle 10g, Release 2中,还增加了第四个动态性能视图,V$SQLSTATS,通过它能更方便地访问这类数据. 和静态数据字典视图(static dictionary view,也就是前缀为USER_.ALL_,或者DBA_的视图)

ORACLE 动态注册和静态注册的区别(转)

1, oracle 10g 用netca方式建立的都默认为动态注册方式2,如果想改为静态注册的方式则在listener.ora 中加入如下内容即可 SID_LIST_LISTENER = (SID_LIST = (SID_DESC = (SID_NAME = PLSExtProc) (ORACLE_HOME = /u01/app/oracle/product/10.2.0/db_1) (PROGRAM = extproc) ) (SID_DESC = (SID_NAME = JEFF ) (OR

由于存储过程及触发器引用ORACLE动态视图v$session编译不通过而执行时报错ORA-04098

    今天遇见一个很神奇的问题,ERP合同系统需要记录登录oracle数据库的用户名.IP地址.登录时间.登录客户端类型记录信息,需要创建个触发器, 在触发器中需要查询v$session视图,如下所示:  create or replace trigger EPMFRAMEWORK.PRF_OP_LOG    before delete or update on EPMFRAMEWORK.PRF_PROJECT_INFO_T    for each row      begin       i

oracle动态sql问题求解,希望大牛指点指点

问题描述 <iftest="@com.chinaGPS.gisap.util.Ognl@isNotBlank(task.vehicle_state)">andtv.flag=#{task.vehicle_state}</if>task.vehicle_state为1时,tv.flag为1,task.vehicle_state为0时,tv.flag为0或者为空,请问这样的动态sql怎么写? 解决方案 解决方案二:andnvl(tv.flag,0)=#{task.v

关于oracle动态视图v$datafile和v$datafile_header(转)

v$datafile是从oracle的控制文件中获得的数据文件的信息v$datafile_header是从数据文件的头部在正常运行下,两者的检查点SCN值是一致的,但当datafile出现损坏时可以用v$datafile_header确定应用重做的起始SCN,而v$datafile可确定应用重做的结束SCN值sql> select a.name,a.checkpoint_change# "start_SCN",    > b.checkpoint_change# "

oracle 动态AdvStringGrid完美示例 (AdvStringGrid使用技巧/Cells)_oracle

原理就是先声明常量,包括列数,行数,各列的属性,然后在程序的其它过程用这些常量来控制Cells.非常方便,便于修改和移植! 以下为窗体整体代码,中间有说明.此段代码不光有动态AdvStringGrid的完美示例,还有一般窗体的常用的过程,比较窗体初始化,刷新过程. 此窗体,只需要简单准备如下,即可运行: 1,添加一个TAdvStringGrid,并命名为strGrid1. 2,设置:TAdvStringGrid-->option-->goEditing=true TAdvStringGrid-

Oracle 动态查询EXECUTE IMMEDIATE select into使用方法

  create or replace procedure TEST_TABLE is l_table_name varchar2(50); query_stat varchar2(100); l_zyq t_djjcxx.zyq%type; l_jh varchar2(20); l_djjcxx t_djjcxx%rowtype; begin l_jh :='7874'; select name into l_table_name from TESTA; DBMS_OUTPUT.put_lin

Oracle动态执行SQL四种方式的例子

方式1 CREATE OR REPLACE PROCEDURE demo(salary IN NUMBER) AS     cursor_name INTEGER;     rows_processed INTEGER; BEGIN     cursor_name := dbms_sql.open_cursor;     dbms_sql.parse(cursor_name, 'DELETE FROM emp WHERE sal > :x',                    dbms_sq