[20150204] About bitmap indexes 3.txt

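--Many people know that bitmap indexes are unsuitable for OLTP systems. An index entry has the following structure:
Field 0: key value
Field 1: start rowid
Field 2: end rowid
Field 3: bitmap; each bit refers to one row. A 1 bit means the row has this key value, a 0 bit means it does not.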

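--There is very little published material on the bitmap in field 3, however, so I read http://juliandyke.com/Presentations/BitmapIndexInternals.ppt and did some simple exploration of my own.
--Yesterday I covered Single-Byte Groups. To summarize: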

1.Byte represents the number of zero bits followed by a one bit
2.Maximum of 191 zero bits
3.Range of byte values is 0x00 to 0xBF
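
--A minimal Python sketch of the three rules above. The function name single_byte_group is my own, and the bit string is written lowest row first, which is an assumption taken from how the dumps later in this note line up.

```python
def single_byte_group(b):
    """Expand a single-byte group into a bit string, lowest row first.

    The byte value is the number of zero bits that precede a single one bit.
    """
    if not 0x00 <= b <= 0xBF:
        raise ValueError("0xC0 and above is a control byte, not a single-byte group")
    return "0" * b + "1"


print(single_byte_group(0x00))        # 1
print(single_byte_group(0x03))        # 0001
print(len(single_byte_group(0xBF)))   # 192 (191 zero bits plus the one bit)
```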

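--Looking closely, a Single-Byte Group can carry only a single 1 bit. If 8 bits of the bitmap contain two 1 bits, this form cannot represent them, and the run of leading zeros is limited to at most 191.
--This is where Multi-Byte Groups come in. Again from http://juliandyke.com/Presentations/BitmapIndexInternals.ppt: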

Multi-Byte Groups
. Multi-byte groups allow more than 192 bits to be skipped
. First byte is a control byte 11
. First two bits indicate this is a control byte (always 11)

. Next three bits indicate number of zero bytes to skip

. If all three bits are set then number overflows to second byte
. If top bit of second byte is set then number of zero bytes overflows to third byte
. Last three bits indicate number of bytes following control block (minimum 1, maximum 8)

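--The description above is not easy to follow; I stared at it for a long time without understanding it. Here is my own summary (only now do I wish I had paid more attention in language class; I hope I can describe it clearly):
--If the first byte is greater than or equal to 192 (0xC0), a Multi-Byte Group is in use. I will try to explain the details, though I may have some of them wrong. Corrections are welcome, thanks!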

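1. The first 2 bits are always 11; only then is the byte >= 192. These are the control bits.
2. The next 3 bits give the number of zero bytes to skip.
         zero bytes    zero bits
=================================
001         0               0
010         1               8
011         2              16
100         3              24
101         4              32
110         5              40
==================================

When the 3 bits are 111, the count extends into the next byte.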

                          zero bytes    zero bits
==================================================
111000 00000000                  6            48
111000 00000001                  7            56
111000 00000010                  8            64
....
111000 01111111                133      133*8=1064
111000 10000000 00000001       134      134*8=1072
111000 10000001 00000001       135      135*8=1080
==================================================

. Last three bits indicate number of bytes following control block (minimum 1, maximum 8)
--The last 3 bits give the length of the bitmap that follows. Note that 000 means 1 byte follows, ... and 111 means 8 bytes follow.
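
--A small Python sketch of how I read the control byte, covering the two tables above. parse_control is a hypothetical helper name. Treating the overflow bytes as 7-bit chunks, lowest chunk first, is inferred from the table rows (133, 134, 135); continuing past the third byte is an extrapolation, and the value 000 for the skip field is not covered by my notes, so the sketch rejects it.

```python
def parse_control(data, i=0):
    """Return (zero_bytes_to_skip, bitmap_length, index_of_first_bitmap_byte)."""
    ctl = data[i]
    if ctl < 0xC0:
        raise ValueError("not a control byte (top two bits are not 11)")
    field = (ctl >> 3) & 0b111       # middle 3 bits: skip count
    length = (ctl & 0b111) + 1       # last 3 bits: 000 -> 1 byte ... 111 -> 8 bytes
    i += 1
    if field == 0:
        raise ValueError("skip field 000 is not described in these notes")
    if field < 7:
        return field - 1, length, i  # 001 -> 0 ... 110 -> 5
    # 111: the count overflows into the following byte(s), starting from 6.
    skip, shift = 6, 0
    while True:
        b = data[i]
        i += 1
        skip += (b & 0x7F) << shift  # low 7 bits carry the value
        shift += 7
        if not b & 0x80:             # top bit clear: no further overflow byte
            break
    return skip, length, i


print(parse_control(bytes([0xCF])))              # (0, 8, 1)
print(parse_control(bytes([0xDA])))              # (2, 3, 1)
print(parse_control(bytes([0xF8, 0x7F])))        # (133, 1, 2)
print(parse_control(bytes([0xF8, 0x80, 0x01])))  # (134, 1, 3)
```

--0xF8 is 11 111 000, which is how I read the "111000" column in the overflow table.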

1. Set up the test environment:
SCOTT@test> @ver1
PORT_STRING VERSION BANNER
------------------------------ -------------- --------------------------------------------------------------------------------
x86_64/Linux 2.4.xx 11.2.0.3.0 Oracle Database 11g Enterprise Edition Release 11.2.0.3.0 - 64bit Production

--drop table t purge;
create table t(id number , name varchar2(10), status varchar2(1));
insert into t select rownum-1 id,dbms_random.string('X',10) c20,decode(mod((rownum-1),8),0,'Y','N') c1 from dual connect by level<=66;
commit ;
create bitmap index ib_t_status on t(status);

SCOTT@test> select owner,segment_name,header_file,header_block from dba_segments where owner=user and segment_name='IB_T_STATUS';

OWNER SEGMENT_NAME HEADER_FILE HEADER_BLOCK
------ -------------------- ----------- ------------
SCOTT IB_T_STATUS 4 530

SCOTT@test> select rowid,t.* from t where rownum<=5;

ROWID                      ID NAME                 S
------------------ ---------- -------------------- -
AABI+0AAEAAAACnAAA          0 1DDKXFS06B           Y
AABI+0AAEAAAACnAAB          1 MD5WQEJ31X           N
AABI+0AAEAAAACnAAC          2 UEPVVKFJR0           N
AABI+0AAEAAAACnAAD          3 X5AYXHD6YH           N
AABI+0AAEAAAACnAAE          4 1UX5F7CB7O           N

SCOTT@test> @lookup_rowid AABI+0AAEAAAACnAAA
OBJECT FILE BLOCK ROW DBA TEXT
---------- ---------- ---------- ---------- -------------------- ----------------------------------------
298932 4 167 0 4,167 alter system dump datafile 4 block 167 ;

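--Note: each time I drop and recreate the objects, only data_object_id changes; everything else stays essentially the same.
--The dump below is of the index leaf block, block 531 (the block after segment header 530), not of table block 167.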

row#0[8003] flag: ------, lock: 0, len=29
col 0; len 1; (1): 4e
col 1; len 6; (6): 01 00 00 a7 00 00
col 2; len 6; (6): 01 00 00 a7 00 47
col 3; len 10; (10): cf fe fe fe fe fe fe fe fe 01
row#1[7974] flag: ------, lock: 0, len=29
col 0; len 1; (1): 59
col 1; len 6; (6): 01 00 00 a7 00 00
col 2; len 6; (6): 01 00 00 a7 00 47
col 3; len 10; (10): cf 01 01 01 01 01 01 01 01 00
----- end of leaf block dump -----
End dump data blocks tsn: 4 file#: 4 minblk 531 maxblk 531

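--0x47=71. I inserted only 66 rows (row numbers 0 to 65), so the end rowid has been padded up to a byte boundary: 9 bitmap bytes cover row numbers 0 to 71.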
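
--To read col 1 and col 2 without doing the hex arithmetic by hand, here is a small Python sketch. decode_rowid6 is a name I made up. It assumes the usual layout of these 6 bytes: a 4-byte data block address (top 10 bits relative file number, low 22 bits block number) followed by a 2-byte row number. The result agrees with the lookup_rowid output above (file 4, block 167).

```python
def decode_rowid6(hex_dump):
    """Split a 6-byte rowid from the dump into (file#, block#, row#)."""
    raw = bytes.fromhex(hex_dump)
    if len(raw) != 6:
        raise ValueError("expected 6 bytes")
    dba = int.from_bytes(raw[:4], "big")   # data block address
    row = int.from_bytes(raw[4:], "big")   # row number within the block
    return dba >> 22, dba & 0x3FFFFF, row


print(decode_rowid6("01 00 00 a7 00 00"))  # (4, 167, 0)
print(decode_rowid6("01 00 00 a7 00 47"))  # (4, 167, 71)
```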
--Now look at col 3 for key status='Y'. The other columns were covered earlier and are not explained again here.
cf 01 01 01 01 01 01 01 01 00

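cf in binary: 11 001 111
-- 11  : the byte is >= 192, so this is a Multi-Byte Group
-- 001 : skip 0 zero bytes
-- 111 : 8 bitmap bytes follow. In my example the 1 bits are at positions 0, 8, 16, ...
-- Note the trailing 00. This is a single-byte group as described earlier (0 zero bits followed by a 1 bit), i.e. 10000000 when written lowest row first, so the row at position 64 has status='Y'.
-- This also shows that single-byte groups and multi-byte groups can be mixed within one bitmap.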
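
--Putting the pieces together, a Python sketch of a decoder for col 3 under my reading of the format. decode_col3 is a hypothetical name. Two things are assumptions drawn from the dumps rather than from documentation: bits inside a bitmap byte are taken lowest bit first, and after a single-byte group the position moves on to the next byte boundary.

```python
def decode_col3(hex_dump):
    """Return the row positions whose bit is set in a col 3 hex dump."""
    data = bytes.fromhex(hex_dump)
    rows, pos, i = [], 0, 0              # pos is the current bit position
    while i < len(data):
        b = data[i]
        i += 1
        if b < 0xC0:                     # single-byte group: b zeros, then a 1
            rows.append(pos + b)
            pos += (b // 8 + 1) * 8      # assumed: resume at the next byte boundary
            continue
        field = (b >> 3) & 7
        nbytes = (b & 7) + 1
        if field == 0:
            raise ValueError("skip field 000 is not described in these notes")
        skip = field - 1
        if field == 7:                   # overflow: 7-bit chunks, lowest first
            shift = 0
            while True:
                v = data[i]
                i += 1
                skip += (v & 0x7F) << shift
                shift += 7
                if not v & 0x80:
                    break
        pos += skip * 8
        for byte in data[i:i + nbytes]:  # literal bitmap bytes
            rows.extend(pos + bit for bit in range(8) if (byte >> bit) & 1)
            pos += 8
        i += nbytes
    return rows


print(decode_col3("cf 01 01 01 01 01 01 01 01 00"))
# [0, 8, 16, 24, 32, 40, 48, 56, 64]
```

--The positions match the ids returned for status='Y' in the query that follows.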

SCOTT@test> select * from t where status='Y';
        ID NAME                 S
---------- -------------------- -
         0 1DDKXFS06B           Y
         8 9FKXWZ29S1           Y
        16 S3A7A96RJD           Y
        24 J30XPHSFIN           Y
        32 N7DBOGUDNZ           Y
        40 0QDGDKADFY           Y
        48 09VWEEWBJN           Y
        56 2HB0AV0TX2           Y
        64 NTO2X2PP12           Y
9 rows selected.

2. A modification test:
update t set status='N' where id=64;
commit ;
alter system checkpoint ;
alter system dump datafile 4 block 531 ;

row#0[7944] flag: ------, lock: 2, len=30
col 0; len 1; (1): 4e
col 1; len 6; (6): 01 00 00 a7 00 00
col 2; len 6; (6): 01 00 00 a7 00 47
col 3; len 11; (11): cf fe fe fe fe fe fe fe fe c8 03
row#1[7915] flag: -----R, lock: 2, len=28, rsl=1
col 0; len 1; (1): 59
col 1; len 6; (6): 01 00 00 a7 00 00
col 2; len 6; (6): 01 00 00 a7 00 3f
col 3; len 9; (9): cf 01 01 01 01 01 01 01 01
----- end of leaf block dump -----
End dump data blocks tsn: 4 file#: 4 minblk 531 maxblk 531

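-- For the status='Y' entry (row#1), the entry's offset in the block changed (7974 -> 7915), col 3 shrank by 1 byte (10 -> 9), and the end rowid changed as well (0x47 -> 0x3f).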
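
--The change in the end rowid is consistent with the bitmap simply ending at the last byte that holds a 1 bit. A tiny Python sketch of that arithmetic (end_row_number is my own name, and the rule itself is only inferred from these dumps):

```python
def end_row_number(highest_row):
    """Last row number covered by the bitmap byte that contains highest_row."""
    return highest_row // 8 * 8 + 7


print(hex(end_row_number(64)))  # 0x47, before the update (highest 'Y' row is 64)
print(hex(end_row_number(56)))  # 0x3f, after the update (highest 'Y' row is 56)
```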

3. Construct a col 3 value by hand and check whether my understanding is correct.

DA 03 04 05

--DA in binary:
--11 011 010
-- 11  : >= 192, so this is a Multi-Byte Group
-- 011 : skip 2 zero bytes
-- 010 : 3 bitmap bytes follow

--That is, 16 zero bits first, followed by 00000011 00000100 00000101
--drop table t purge;
create table t(id number , name varchar2(10), status varchar2(1));
insert into t select rownum-1 id,dbms_random.string('X',10) c20,'N' c1 from dual connect by level<=66;
commit ;

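--Note: if the bitmap began with 8 zero bits, the start rowid (its last 2 bytes) would not be 0000. So I set id=0 as well and start building the pattern from id=24 (8+2*8=24).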

update t set status='Y' where id=0;
update t set status='Y' where id=24;
update t set status='Y' where id=25;
update t set status='Y' where id=34;
update t set status='Y' where id=40;
update t set status='Y' where id=42;
commit ;

create bitmap index ib_t_status on t(status);

alter system checkpoint ;
alter system dump datafile 4 block 531 ;

row#0[8002] flag: ------, lock: 0, len=30
col 0; len 1; (1): 4e
col 1; len 6; (6): 01 00 00 a7 00 00
col 2; len 6; (6): 01 00 00 a7 00 47
col 3; len 11; (11): cf fe ff ff fc fb fa ff ff c8 03
row#1[7978] flag: ------, lock: 0, len=24
col 0; len 1; (1): 59
col 1; len 6; (6): 01 00 00 a7 00 00
col 2; len 6; (6): 01 00 00 a7 00 2f
col 3; len 5; (5): 00 da 03 04 05
----- end of leaf block dump -----
End dump data blocks tsn: 4 file#: 4 minblk 531 maxblk 531

--This confirms once more that my reading is correct.
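
--The same construction can be sketched in Python: go from the row numbers to the bytes and compare with the dump. multi_byte_group is a hypothetical helper that handles only the simple case used here (skip count 0 to 5, at most 8 bitmap bytes); it is not a general encoder, and I have no idea whether Oracle builds the value this way internally.

```python
def multi_byte_group(rows, pos_byte):
    """Encode rows as one multi-byte group that starts at byte offset pos_byte."""
    first, last = min(rows) // 8, max(rows) // 8
    skip = first - pos_byte              # zero bytes to skip
    nbytes = last - first + 1            # literal bitmap bytes
    if not (0 <= skip <= 5 and 1 <= nbytes <= 8):
        raise ValueError("outside the simple case this sketch handles")
    bitmap = bytearray(nbytes)
    for r in rows:
        bitmap[r // 8 - first] |= 1 << (r % 8)   # lowest bit = lowest row
    control = 0xC0 | ((skip + 1) << 3) | (nbytes - 1)
    return bytes([control]) + bytes(bitmap)


# id=0 is the single-byte group 00 and occupies byte 0; the rest starts at byte 1.
col3 = bytes([0x00]) + multi_byte_group([24, 25, 34, 40, 42], pos_byte=1)
print(col3.hex())  # 00da030405
```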
--Now expand the status='N' entry:

col 3; len 11; (11): cf fe ff ff fc fb fa ff ff c8 03

-- cf in binary:
-- 11 001 111
-- 11  : >= 192, so this is a Multi-Byte Group
-- 001 : skip 0 zero bytes
-- 111 : 8 bitmap bytes follow
   fe       ff       ff       fc        fb        fa        ff       ff
   11111110 11111111 11111111 11111100 11111011 11111010 11111111 11111111
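--To make it easier to line up with row numbers, reverse the bits within each byte:
   01111111 11111111 11111111 00111111 11011111 01011111 11111111 11111111
--Then join them together:
0111111111111111111111110011111111011111010111111111111111111111
--A quick check: in vim, move to the start of the line, type 24l, and see whether the cursor lands on a 0.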
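
--The reverse-and-join step, and the vim check, can also be done in a few lines of Python. This is only a convenience sketch; the variable names are mine.

```python
data = bytes.fromhex("fe ff ff fc fb fa ff ff")

# Reverse the bits of each byte so that the leftmost character is the lowest row.
bits = "".join(format(b, "08b")[::-1] for b in data)
print(bits)

# Positions of the 0 bits, i.e. the rows that are NOT status='N'.
zeros = [i for i, c in enumerate(bits) if c == "0"]
print(zeros)  # [0, 24, 25, 34, 40, 42]
```

--The zero positions are exactly the ids that were updated to 'Y'.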

--Not finished yet: c8 03 remains.
-- c8 in binary:
-- 11 001 000
-- 11  : >= 192, so this is a Multi-Byte Group
-- 001 : skip 0 zero bytes
-- 000 : 1 bitmap byte follows
00000011

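--Reading from right to left: the preceding bytes already account for 8*8=64 bits (ids 0 to 63), so the remaining 11 corresponds to the rows with id=64 and id=65.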

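--I have only covered the case where everything sits in a single data block. What happens when the start rowid and the end rowid are in different data blocks? That needs some more testing.
--See the next blog post. Or perhaps I won't write one ^_^.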

--In fact, I have had one question ever since I wrote part 1:
create table t(id number , name varchar2(10), status varchar2(1));
insert into t select rownum-1 id,dbms_random.string('X',10) c20,decode(mod((rownum-1),8),0,'Y','N') c1 from dual connect by level<=...;
commit ;
create bitmap index ib_t_status on t(status);

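--col 3 for status='Y' is:
col 3; len 9; (9): cf 01 01 01 01 01 01 01 01
--Why isn't this written as the single-byte groups 00 00 00 00 00 00 00 00 instead of a multi-byte group? The single-byte form would be one byte shorter (8 bytes versus 9). Is an all-zero encoding not allowed?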

--One more test:
--drop table t purge;
create table t(id number , name varchar2(10), status varchar2(1));
insert into t select rownum-1 id,dbms_random.string('X',10) c20,decode(mod((rownum-1),8),1,'Y','N') c1 from dual connect by level<=64;
commit ;
SCOTT@test> select * from t where status='Y';
        ID NAME                 S
---------- -------------------- -
         1 1H2YD3EUJG           Y
         9 LTHF8GKDUP           Y
        17 0XK5EGWLZF           Y
        25 PONO8HR4KX           Y
        33 OVEF0K35C8           Y
        41 L23DBSZ5IC           Y
        49 DK18XVX8J4           Y
        57 IB6XNM5Z8W           Y
8 rows selected.

create bitmap index ib_t_status on t(status);
alter system checkpoint ;
alter system dump datafile 4 block 531 ;

row#0[8004] flag: ------, lock: 0, len=28
col 0; len 1; (1): 4e
col 1; len 6; (6): 01 00 00 a7 00 00
col 2; len 6; (6): 01 00 00 a7 00 3f
col 3; len 9; (9): cf fd fd fd fd fd fd fd fd
row#1[7976] flag: ------, lock: 0, len=28
col 0; len 1; (1): 59
col 1; len 6; (6): 01 00 00 a7 00 00
col 2; len 6; (6): 01 00 00 a7 00 3f
col 3; len 9; (9): cf 02 02 02 02 02 02 02 02
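
--A quick Python cross-check of the two entries in this dump. expand is a hypothetical helper that handles only the shape seen here (control byte cf followed by 8 bitmap bytes, lowest bit first); it is not a general decoder.

```python
def expand(hex_dump):
    """Return the set of row positions for a 'cf + 8 bitmap bytes' col 3 value."""
    data = bytes.fromhex(hex_dump)
    if data[0] != 0xCF or len(data) != 9:
        raise ValueError("this sketch only handles cf followed by 8 bytes")
    return {8 * i + bit
            for i, byte in enumerate(data[1:])
            for bit in range(8)
            if (byte >> bit) & 1}


y = expand("cf 02 02 02 02 02 02 02 02")
n = expand("cf fd fd fd fd fd fd fd fd")
print(sorted(y))                      # [1, 9, 17, 25, 33, 41, 49, 57]
print((y | n) == set(range(64)))      # True: together they cover rows 0..63
print((y & n) == set())               # True: no row is in both
```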

-- I don't understand how Oracle handles this internally, or why it chooses this encoding.....

Time: 2024-10-23 10:28:30
