[20161208]11g直方图与char数据类型.txt

[20161208]11g直方图与char数据类型.txt

--以前看tom大师的书提到过不要使用char数据类型,哪怕是char(1)也不要使用,最近看了几篇blob里面都提到了11g升级后会出现char数
--据类型直方图统计发生了变化,我重复别人的例子来说明.再次强调不要生产环境使用char类型.

--参考链接:http://blog.dbi-services.com/histograms-on-character-strings-between-11-2-0-3-and-11-2-0-4/

1.环境:
SCOTT@book> @ &r/ver1
PORT_STRING         VERSION    BANNER
------------------- ---------- --------------------------------------------------------------------------------
x86_64/Linux 2.4.xx 11.2.0.4.0 Oracle Database 11g Enterprise Edition Release 11.2.0.4.0 - 64bit Production

SCOTT@book> create table DEMO ( flag char);
Table created.

SCOTT@book> @ &r/desc demo
Name  Null?    Type
----- -------- --------
FLAG           CHAR(1)
--不指定长度,长度为1.

SCOTT@book> insert into DEMO select 'Y' from xmltable('1 to 100000');
100000 rows created.

SCOTT@book> insert into DEMO select 'N' from xmltable('1 to   1000');
1000 rows created.

SCOTT@book> commit ;
Commit complete.

SCOTT@book> select flag,to_char(ascii(flag),'XX'),count(*) from DEMO group by flag;
F TO_   COUNT(*)
- --- ----------
Y  59     100000
N  4E       1000

SCOTT@book> exec dbms_stats.gather_table_stats(user,'DEMO',no_invalidate=>false,Estimate_Percent => NULL,Method_Opt => 'FOR ALL COLUMNS SIZE 5');
PL/SQL procedure successfully completed.

select endpoint_number,endpoint_value
,to_char(endpoint_value,rpad('FM',65,'X'))
,utl_raw.cast_to_varchar2(hextoraw(to_char(endpoint_value,rpad('FM',65,'X')))) c20
from user_histograms h where table_name='DEMO';

ENDPOINT_NUMBER ENDPOINT_VALUE TO_CHAR(ENDPOINT_VALUE,RPAD('FM',65,'X'))                        C20
--------------- -------------- ---------------------------------------------------------------- --------------------
         101000     4.6211E+35 590000000000127D2B51B120000000                                   Y     }+Q?
           1000     4.0500E+35 4E0000000000181F436C7BBB200000                                   N     Cl{?

--你可以注意实际参加运算的Y/N后面的字符是0x00.}
--而如果你设置参数optimizer_features_enable.

SCOTT@book> alter session set optimizer_features_enable='11.2.0.3';
Session altered.

SCOTT@book> exec dbms_stats.gather_table_stats(user,'DEMO',no_invalidate=>false,Estimate_Percent => NULL,Method_Opt => 'FOR ALL COLUMNS SIZE 5');
PL/SQL procedure successfully completed.

select endpoint_number,endpoint_value
,to_char(endpoint_value,rpad('FM',65,'X'))
,utl_raw.cast_to_varchar2(hextoraw(to_char(endpoint_value,rpad('FM',65,'X')))) c20
  4  from user_histograms h where table_name='DEMO';
ENDPOINT_NUMBER ENDPOINT_VALUE TO_CHAR(ENDPOINT_VALUE,RPAD('FM',65,'X'))                        C20
--------------- -------------- ---------------------------------------------------------------- --------------------
           1000     4.0565E+35 4E20202020203A7BB119D5F6000000                                   N     :{闭?
         101000     4.6277E+35 59202020202034D998FF0B5AE00000                                   Y     4贅? Z?

--注意实际参加运算的Y/N后面的字符是0x20.} 也就是使用空格.

2.测试:
--这个时候退出,在登录执行如下:
SCOTT@book> alter session set statistics_level=all;
Session altered.

SCOTT@book> show parameter optimizer_features_enable
NAME                      TYPE   VALUE
------------------------- ------ ---------
optimizer_features_enable string 11.2.0.4

--恢复原来设置.

SCOTT@book> select count(*) from demo where flag='Y';
  COUNT(*)
----------
    100000

SCOTT@book> @ &r/dpc '' ''
PLAN_TABLE_OUTPUT
--------------------------------------
SQL_ID  8var563j3tg8g, child number 0
-------------------------------------
select count(*) from demo where flag='Y'
Plan hash value: 2180342005
---------------------------------------------------------------------------------------------------------------------
| Id  | Operation          | Name | Starts | E-Rows |E-Bytes| Cost (%CPU)| E-Time   | A-Rows |   A-Time   | Buffers |
---------------------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT   |      |      1 |        |       |    69 (100)|          |      1 |00:00:00.03 |     219 |
|   1 |  SORT AGGREGATE    |      |      1 |      1 |     2 |            |          |      1 |00:00:00.03 |     219 |
|*  2 |   TABLE ACCESS FULL| DEMO |      1 |    500 |  1000 |    69   (2)| 00:00:01 |    100K|00:00:00.02 |     219 |
---------------------------------------------------------------------------------------------------------------------
Query Block Name / Object Alias (identified by operation id):
-------------------------------------------------------------
   1 - SEL$1
   2 - SEL$1 / DEMO@SEL$1
Predicate Information (identified by operation id):
---------------------------------------------------
   2 - filter("FLAG"='Y')

--你可以发现这时E-Rows=500.

SCOTT@book> select * from DBA_TAB_HISTOGRAMS where owner=user and table_name='DEMO';
OWNER  TABLE_NAME COLUMN_NAME          ENDPOINT_NUMBER ENDPOINT_VALUE ENDPOINT_A
------ ---------- -------------------- --------------- -------------- ----------
SCOTT  DEMO       FLAG                            1000     4.0565E+35
SCOTT  DEMO       FLAG                          101000     4.6277E+35

select * from DBA_TAB_cols where owner=user and table_name='DEMO';

Record View
As of: 2016/12/8 9:24:56

OWNER:                 SCOTT
TABLE_NAME:            DEMO
COLUMN_NAME:           FLAG
DATA_TYPE:             CHAR
DATA_TYPE_MOD:        
DATA_TYPE_OWNER:      
DATA_LENGTH:           1
DATA_PRECISION:       
DATA_SCALE:           
NULLABLE:              Y
COLUMN_ID:             1
DEFAULT_LENGTH:       
DATA_DEFAULT:         
NUM_DISTINCT:          2
LOW_VALUE:             4E
HIGH_VALUE:            59
DENSITY:               0.00000495049504950495
NUM_NULLS:             0
NUM_BUCKETS:           2
LAST_ANALYZED:         2016/12/8 9:19:24
SAMPLE_SIZE:           101000
CHARACTER_SET_NAME:    CHAR_CS
CHAR_COL_DECL_LENGTH:  1
GLOBAL_STATS:          YES
USER_STATS:            NO
AVG_COL_LEN:           2
CHAR_LENGTH:           1
CHAR_USED:             B
V80_FMT_IMAGE:         NO
DATA_UPGRADED:         YES
HIDDEN_COLUMN:         NO
VIRTUAL_COLUMN:        NO
SEGMENT_COLUMN_ID:     1
INTERNAL_COLUMN_ID:    1
HISTOGRAM:             FREQUENCY
QUALIFIED_COL_NAME:    FLAG

-- 实际上0x5920 与 0x5900 存在很大差异,运算并没有按照直方图来计算,而使用DENSITY来计算.0.00000495049504950495 * 1000* 101000 =499.99999999999995.
-- 作者做10053测试
Column (#1):
    NewDensity:0.004950, OldDensity:0.000005 BktCnt:101000.000000, PopBktCnt:101000.000000, PopValCnt:2, NDV:2
--这个新旧Density,为什么相差1000,我不知道?

--如果你重新分析就ok了.

SCOTT@book> exec dbms_stats.gather_table_stats(user,'DEMO',no_invalidate=>false,Estimate_Percent => NULL,Method_Opt => 'FOR ALL COLUMNS SIZE 5');
PL/SQL procedure successfully completed.

SCOTT@book> select count(*) from demo where flag='Y';
  COUNT(*)
----------
    100000

SCOTT@book> @ &r/dpc '' ''
PLAN_TABLE_OUTPUT
-------------------------------------
SQL_ID  8var563j3tg8g, child number 0
-------------------------------------
select count(*) from demo where flag='Y'
Plan hash value: 2180342005
---------------------------------------------------------------------------------------------------------------------
| Id  | Operation          | Name | Starts | E-Rows |E-Bytes| Cost (%CPU)| E-Time   | A-Rows |   A-Time   | Buffers |
---------------------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT   |      |      1 |        |       |    69 (100)|          |      1 |00:00:00.03 |     219 |
|   1 |  SORT AGGREGATE    |      |      1 |      1 |     2 |            |          |      1 |00:00:00.03 |     219 |
|*  2 |   TABLE ACCESS FULL| DEMO |      1 |    100K|   195K|    69   (2)| 00:00:01 |    100K|00:00:00.02 |     219 |
---------------------------------------------------------------------------------------------------------------------
Query Block Name / Object Alias (identified by operation id):
-------------------------------------------------------------
   1 - SEL$1
   2 - SEL$1 / DEMO@SEL$1
Predicate Information (identified by operation id):
---------------------------------------------------
   2 - filter("FLAG"='Y')

--这样E-Rows正确了.所以作者建议升级后要重新分析.

3.另外我上面的测试使用数据类型char(1),改变长度呢?重复测试:
--drop table demo purge ;
create table DEMO ( flag char(20));
insert into DEMO select 'Y' from xmltable('1 to 100000');
insert into DEMO select 'N' from xmltable('1 to   1000');
commit ;

--实际上这个保存在数据库的信息是
SCOTT@book> select dump(flag,16) from demo where rownum=1;
DUMP(FLAG,16)
------------------------------------------------------------------------------
Typ=96 Len=20: 59,20,20,20,20,20,20,20,20,20,20,20,20,20,20,20,20,20,20,20
--有19个空格.

SCOTT@book> select flag,to_char(ascii(flag),'XX'),count(*) from DEMO group by flag;
F TO_   COUNT(*)
- --- ----------
Y  59     100000
N  4E       1000

SCOTT@book> exec dbms_stats.gather_table_stats(user,'DEMO',no_invalidate=>false,Estimate_Percent => NULL,Method_Opt => 'FOR ALL COLUMNS SIZE 5');
PL/SQL procedure successfully completed.

select endpoint_number,endpoint_value
,to_char(endpoint_value,rpad('FM',65,'X'))
,utl_raw.cast_to_varchar2(hextoraw(to_char(endpoint_value,rpad('FM',65,'X')))) c40
from user_histograms h where table_name='DEMO';

ENDPOINT_NUMBER ENDPOINT_VALUE TO_CHAR(ENDPOINT_VALUE,RPAD('FM',65,'X'))   C40
--------------- -------------- ------------------------------------------- ----------------
           1000     4.0500E+35 4E0000000000181F436C7BBB200000              N     Cl{?
         101000     4.6211E+35 590000000000127D2B51B120000000              Y     }+Q?

--你可以发现oracle 11.2.0.4 改变了算法,取消了后面的空格,而使用0x00 参加运算.
--从我个人的建议不要在生产系统使用任何char类型,哪怕是char(1),而统一使用varchar2类型,即使是varchar2(1).

--jonathanlewis大师也写了一篇blog,链接https://jonathanlewis.wordpress.com/2016/12/02/histogram-upgrade/

--另外在链接里面还有一个链接指向直方图如何运算,链接:https://jonathanlewis.wordpress.com/2010/10/05/frequency-histogram-4/
--我摘要主要部分:
In an earlier note on interpreting the content of frequency histograms I made a throwaway comment about the extra
complexity of interpreting frequency histograms on character-based columns. This note starts to examine some of the
complications.

The driving problem behind character columns is that they can get quite large – up to 4,000 bytes – so the content of
an "accurate histogram" could become quite large, and Oracle seems to have taken a strategic decision (at some point in
history) to minimise this storage. As a result we can see an algorithm that works roughly as follows:

    Take the first 15 bytes of the string (after padding the string with nulls (for varchar2) or spaces (for char))
    Treat the result as a hexadecimal number, and convert to decimal
    Round to 15 significant digits and store as the endpoint_value
    If duplicate rows appear, store the first 32 characters (increased to 64 for 12c) of each string as the endpoint_actual_value

Given this algorithm, we can do an approximate reversal (which will only be needed when the endpoint_actual_value is not
available) by formatting the endpoint_value into a hex string, extracting the first six pairs of digits, converting to
numeric and applying the chr() function to get a character value. (You'll have to fiddle with this bit of code to handle
multibyte character sets, of course).

With a nice friendly single-byte character code, the first 5 characters will be extracted correctly (assuming there are
at least 5 characters in the string), and the sixth will be pretty close to the original. Here's an example (which also
includes the logic to convert the endpoint_number into a frequency):

--哎,许多还是不懂,放弃..

时间: 2024-08-19 00:59:58

[20161208]11g直方图与char数据类型.txt的相关文章

[20120803]11G SPM的学习1.txt

[20120803]11G SPM的学习1.txt     开始学习SQL Plan Management(SPM) ,11G开始提供SPM,在10g下我经常使用sql profile看一些bad sql语句,sql profile我觉得已经做的很好,有时候能够提供很好的建议.我开始学习SPM的时候感觉不习惯(也许是因为在toad下使用sql profile很简单)为什么oracle还有搞出SPM来,慢慢看资料,才明白其中一些细节.     我看过别人在从8i升级到9i的时候,出现性能波动,里面

[20131220]频率直方图的简单探究.txt

[20131220]频率直方图的简单探究.txt http://allthingsoracle.com/histograms-part-1-why/http://www.itpub.net/thread-1816475-1-1.html 昨天本想看12c的混合直方图的相关信息,无意之中发现以上链接,Jonathan Lewis给出很好的例子,newkid的翻译写的很清晰,自己再按照上面的介绍写一些例子做一些测试. 我的测试环境11.2.0.3,建立测试环境: SYS@test> @verBANN

[20170620]11G 12c expand sql text.txt

[20170620]11G 12c expand sql text.txt --//原来写的脚本只能分别在11g,12c单独使用.上午花一点点时间.把两者整合起来. --//讨论链接:http://www.itpub.net/thread-2088981-1-1.html --//再次感谢solomon_007的指点: set long 20000 set serveroutput on declare     L_sqltext clob := null;     l_version varc

[20171120]11G关闭直接路径读.txt

[20171120]11G关闭直接路径读.txt --//今天做filesystemio_options参数测试时,遇到一个关于直接路径读的问题. --//如果看以前的博客介绍,设置"_serial_direct_read"=never或者events '10949 trace name context forever';就可以关闭直接路径读. --//我的测试在11GR2下set events '10949 trace name context forever';不行. --//通过

[20170518]11G审计日志清除3.txt

[20170518]11G审计日志清除3.txt --//以前写的链接:http://blog.itpub.net/267265/viewspace-2133145/ --//我当时写存在许多问题,仅仅清除sys.aud$内容,参数audit_file_dest目录的文件不清除,而这里参数大量的aud文件,重新测试看看. 1.环境: SYS@book> @ &r/ver1 PORT_STRING                    VERSION        BANNER -------

[20150504]11G口令不对的问题.txt

[20150504]11G口令不对的问题.txt --前一阵子,遇到用户口令不对,导致登录缓慢的问题,主要11G加入密码错误验证延迟导致的.(哎现在看文档才知道) http://blog.itpub.net/267265/viewspace-1479718/ Preventing passwords from being broken. If a user tries to log in to Oracle Database multiple times using an incorrect p

[20160516]11G HugePage的使用问题.txt

[20160516]11G HugePage的使用问题.txt --我们的dg内存不足,配置的/etc/sysctl.conf如下: vm.nr_hugepages = 3200 --没有启动数据库前如下: # grep -i huge /proc/meminfo AnonHugePages:     14336 kB HugePages_Total:    3200 HugePages_Free:     3200 HugePages_Rsvd:        0 HugePages_Surp

[20150706]11G谓词推入问题.txt

[20150706]11G谓词推入问题.txt --生产系统遇到一个sql语句的问题. --生产系统的sql语句比较复杂,做一个简化的例子来说明问题.来说明自己优化遇到的困惑. 1.建立测试环境: SCOTT@test> @ver1 PORT_STRING                    VERSION        BANNER ------------------------------ -------------- ----------------------------------

[20160816]11G dataguard坏块修复.txt

[20160816]11G dataguard坏块修复.txt --11GR2 不仅仅支持在备库在只读的情况下,日志应用(ACTIVE Data Guard),还提供主备库的坏块修复.自己以前也做过相关测试, --我记得上次测试的仅仅是主库数据块损坏,没有测试备库的数据块损坏.补充一些测试: 1.环境: SYS@test> @ ver1 PORT_STRING                    VERSION        BANNER ---------------------------