[20111223]索引键值在B tree索引块中的顺序.txt
参考链接:http://www.adellera.it/blog/2009/05/24/order-keys-inside-index-blocks/
自己为了加强理解重复一下对方的测试!
1.建立测试表以及索引
SQL> select * from v$version; BANNER -------------------------------------------------------------------------------- Oracle Database 11g Enterprise Edition Release 11.2.0.1.0 - 64bit Production PL/SQL Release 11.2.0.1.0 - Production CORE 11.2.0.1.0 Production TNS for Linux: Version 11.2.0.1.0 - Production NLSRTL Version 11.2.0.1.0 - Production SQL> create table t (x varchar2(10)); 随机插入如下记录: insert into t values('000000'); |
2.转储对应的索引块:
SQL> select header_file,header_block from dba_segments where wner='SCOTT' and segment_name='I_T_X';
HEADER_FILE HEADER_BLOCK SQL> alter system dump datafile 4 block 11299; |
3.查看:
Block header dump: 0x01002c23 Object id on Block? Y seg/obj: 0x12afc csc: 0x00.3d4076 itc: 2 flg: E typ: 2 - INDEX brn: 0 bdba: 0x1002c20 ver: 0x01 opc: 0 inc: 0 exflg: 0 Itl Xid Uba Flag Lck Scn/Fsc |
注意看row#0 括号的数字,表示该键值的索引块的位置,col0是索引键值,col1是rowid。
插入顺序: '000000' '777777' '111111' '666666' '222222' '555555' '333333' '444444' 在块中位置: 8016 8000 7984 7968 7952 7936 7920 7904 可以看出索引在块中插入并不是安装键值排序的,而是从尾部按照插入顺序插入的。 |
4.在"row directory"中记录键值的具体位置:
Start dump data blocks tsn: 4 file#:4 minblk 11299 maxblk 11299 Block dump from cache: Dump of buffer cache at level 4 for tsn=4, rdba=16788515 BH (0xa8ff5fd8) file#: 4 rdba: 0x01002c23 (4/11299) class: 1 ba: 0xa8f2a000 set: 6 pool 3 bsz: 8192 bsi: 0 sflg: 2 pwc: 186,19 dbwrid: 0 obj: 76540 objn: 76540 tsn: 4 afn: 4 hint: f hash: [0xbe252cb0,0xbe252cb0] lru: [0xa8ff61f0,0xa8ff5f90] ckptq: [NULL] fileq: [NULL] objq: [0xa8ff6218,0xba80aa40] st: XCURRENT md: NULL tch: 2 flags: block_written_once redo_since_read LRBA: [0x0.0.0] LSCN: [0x0.0] HSCN: [0xffff.ffffffff] HSUB: [1] cr pin refcnt: 0 sh pin refcnt: 0 Block dump from disk: buffer tsn: 4 rdba: 0x01002c23 (4/11299) scn: 0x0000.003d4081 seq: 0x01 flg: 0x06 tail: 0x40810601 frmt: 0x02 chkval: 0xbf9f type: 0x06=trans data Hex dump of block: st=0, typ_found=1 Dump of memory from 0x00002AEB5C920A00 to 0x00002AEB5C922A00 2AEB5C920A00 0000A206 01002C23 003D4081 06010000 [....#,...@=.....] 2AEB5C920A10 0000BF9F 00000002 00012AFC 003D4076 [.........*..v@=.] 2AEB5C920A20 00000000 00320002 01002C20 00000000 [......2. ,......] 2AEB5C920A30 00000000 00000000 00000000 00000000 [................] 2AEB5C920A40 00000000 00010035 00000007 00C00AE0 [....5...........] 2AEB5C920A50 001A000E 00002008 003D4081 00000000 [..... ...@=.....] 2AEB5C920A60 00000000 02800000 00000000 00340008 [..............4.] 2AEB5C920A70 1EAC1EE0 00000000 00000000 00000000 [................] ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ 2AEB5C920A80 00000000 00001F60 1F301F50 1EF01F10 [....`...P.0.....] 2AEB5C920A90 1F001EE0 1F401F20 00000000 00000000 [.... .@.........] ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ 2AEB5C920AA0 00000000 00000000 00000000 00000000 [................] Repeat 489 times 2AEB5C922940 00000000 34060200 34343434 00010634 [.......444444...] 2AEB5C922950 07000E05 33060200 33333333 00010633 [.......333333...] 2AEB5C922960 06000E05 35060200 35353535 00010635 [.......555555...] 2AEB5C922970 05000E05 32060200 32323232 00010632 [.......222222...] 2AEB5C922980 04000E05 36060200 36363636 00010636 [.......666666...] 2AEB5C922990 03000E05 31060200 31313131 00010631 [.......111111...] 2AEB5C9229A0 02000E05 37060200 37373737 00010637 [.......777777...] 2AEB5C9229B0 01000E05 30060200 30303030 00010630 [.......000000...] 2AEB5C9229C0 00000E05 00000000 00000000 00000000 [................] 2AEB5C9229D0 00000000 00000000 00000000 00000000 [................] Repeat 1 times 2AEB5C9229F0 00000000 00000000 00000000 40810601 [...............@] kdxlebksz 8032,从row directory中确定如下: 插入顺序: '000000' '777777' '111111' '666666' '222222' '555555' '333333' '444444' 1F60=8032 end of "bottom heap" (kdxlebksz), also 8016 + 16 1F30=7984 '111111' |
-- 摘抄链接,不自己写了,以免出现不必要的错误:http://www.adellera.it/blog/2009/05/24/order-keys-inside-index-blocks/
that is, the addresses in the row directory are ordered by the (binary) value of the key they point to - remember that you must swap the high and low 2-byte quantities in each 32-bit word (the usual little-endian ordering; I haven't checked whether the results would be different on a machine that is not little-endian, as my one - an x86 - is).
如果交换字节顺序,可以确定按照以上地址顺序排列的。说明在索引插入时,键值从底部插入,而排序仅仅需要修改这个row directory就ok了。这样插入效率比较高。
The reason for this ordering is to improve the efficiency of typical visits to the index block. Consider a range scan, when the kernel has to find all the index keys contained in the search interval [min_extreme, max_extreme] (possibly min_extreme=max_extreme for equality predicates). As it is well known, the kernel first walks the branch blocks to identify the leaf block that contains min_extreme (or max_extreme, but that would be the same for our discussion); then, it has to identify all the keys that are >= min_extreme and = min_extreme, and then walk the next entries in the row directory until it gets to the last one that is
The advantage is huge, since visiting all the keys is a O( N ) operation, while a binary search has only O ( log(2,N) ) complexity.
To fully appreciate this on a concrete example, consider that each key of our example needs 16+2 bytes of storage, hence a perfectly packed 8K block might contain approximately 8000 / 18 = 444 keys. Thanks to the ordering, for the very frequent equality search (classic for Primary Key searches for example), the processing consists essentially of the binary search only - hence the number of keys to be considered are down to about ceil( log(2, 444) ) = 9, thus consuming only 9/444 = 2%
(!!) of the resources.
It is also worth remembering that this way, only a portion of the block is accessed, thus needing less accesses to central memory (it is unlikely in fact that the whole 8K can be found in the processor's caches), thus reducing the elapsed time considerably since stalling for central memory is a very common bottleneck in database systems - and thus improving scalability for the whole system thanks to the reduction of the traffic on the memory bus.
Of course there's a price to be paid for modifications, since each modification has to keep the row directory ordered, thus shifting the slots of the row directory, which is a relatively expensive operation. Oracle pays an higher bill at modification time to save (hugely) at read time.
--modify 代价很大!
As a final observation: note that the branch block visit can be considered a binary search as well, "pre-calculated" by the B+tree structure - the ordering inside a block is just the same divide-and-conquer algorithm applied in a different way.