Index Mechanism of Greenplum Column-Oriented Compressed Tables

Column-oriented compressed tables are called AOCS tables for short.

Data generation

create table testao(date text, time text, open float, high float, low float, volume int) with(APPENDONLY=true,ORIENTATION=column);

create index testao_idx on testao using btree (volume);

insert into testao select t, t, t, t, t, t from generate_series(1, 1000000) as t;

Observation

The execution plan is as follows:

postgres=> explain select * from testao where volume = 100 limit 1;
                                                 QUERY PLAN
------------------------------------------------------------------------------------------------------------
 Limit  (cost=100.95..200.98 rows=1 width=40)
   ->  Gather Motion 4:1  (slice1; segments: 4)  (cost=100.95..200.98 rows=1 width=40)
         ->  Limit  (cost=100.95..200.96 rows=1 width=40)
               ->  Bitmap Append-Only Column-Oriented Scan on testao  (cost=100.95..200.96 rows=1 width=40)
                     Recheck Cond: volume = 100
                     ->  Bitmap Index Scan on testao_idx  (cost=0.00..100.95 rows=1 width=0)
                           Index Cond: volume = 100
 Settings:  effective_cache_size=8GB; gp_statistics_use_fkeys=on
 Optimizer status: legacy query optimizer
(9 rows)

The plan uses a Bitmap Index Scan on testao_idx.

How the index locates the data

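Index pages store the tid of each record, and the tid encodes a segfileno and a rownum. The segfileno identifies the segment file, and the rownum identifies the block and the specific value within it.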
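The idea that one tid carries both a segfileno and a rownum can be sketched as simple bit packing. The layout here (7 bits of segfileno above 40 bits of rownum) and the helper names `pack_tid` and `unpack_tid` are assumptions made for illustration; the real Greenplum encoding may differ.

```python
from typing import Tuple

# Hypothetical layout, for illustration only: segfileno in the high 7 bits,
# rownum in the low 40 bits. This is not claimed to be the on-disk format.
SEGFILENO_BITS = 7
ROWNUM_BITS = 40
ROWNUM_MASK = (1 << ROWNUM_BITS) - 1


def pack_tid(segfileno: int, rownum: int) -> int:
    """Combine a segment file number and a row number into one integer tid."""
    if not 0 <= segfileno < (1 << SEGFILENO_BITS):
        raise ValueError("segfileno out of range")
    if not 0 <= rownum <= ROWNUM_MASK:
        raise ValueError("rownum out of range")
    return (segfileno << ROWNUM_BITS) | rownum


def unpack_tid(tid: int) -> Tuple[int, int]:
    """Split a tid back into (segfileno, rownum)."""
    return tid >> ROWNUM_BITS, tid & ROWNUM_MASK
```

Because segfileno occupies the high bits in this sketch, sorting packed tids orders them by segment file first and by rownum within each file.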

How a rownum is mapped to a block quickly

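When an index is created on the table, GP creates an auxiliary table, pg_aoblkdir_<oid> (the block directory). For each block it records the block's offset in the file (fileOffset), the segfileno, and the firstRowNum, and it has an index on the firstRowNum column. Given a rownum, a lookup through that index on the auxiliary table quickly yields the block's fileOffset, and the data is then read from that block.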
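The lookup described above amounts to finding the entry with the largest firstRowNum that does not exceed the requested rownum. Here is a minimal sketch of that search for a single segment file, using an in-memory sorted list in place of the indexed auxiliary table. The `BlockEntry` type, the `find_block` helper, and the sample offsets are hypothetical.

```python
import bisect
from typing import List, NamedTuple


class BlockEntry(NamedTuple):
    first_row_num: int  # row number of the first row stored in the block
    file_offset: int    # byte offset of the block within the segment file


def find_block(directory: List[BlockEntry], rownum: int) -> BlockEntry:
    """Return the entry with the largest first_row_num <= rownum.

    `directory` must be sorted by first_row_num, which stands in for the
    index on firstRowNum. The sketch does not track row counts, so it cannot
    detect a rownum beyond the end of the last block.
    """
    keys = [entry.first_row_num for entry in directory]
    pos = bisect.bisect_right(keys, rownum) - 1
    if pos < 0:
        raise KeyError(f"rownum {rownum} precedes the first block")
    return directory[pos]


# Made-up sample directory: three blocks of 1000 rows each.
directory = [BlockEntry(1, 0), BlockEntry(1001, 8192), BlockEntry(2001, 16384)]
block = find_block(directory, 1500)
```

With this sample data, rownum 1500 resolves to the block that starts at row 1001, so the reader can seek directly to that block's offset instead of scanning the file from the beginning.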

Choice of scan method

Why does an AOCS table use Bitmap Index Scan rather than the more familiar Index Scan?

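An AO table can only be scanned front to back, never back to front, whereas a heap table supports both directions. The records found through an index are not necessarily laid out in the AO file in front-to-back order.

Take the example from the figure: the condition is id<=7, and the index returns the records in the order 1, 3, 5, 7. With an Index Scan, the scan starts at fileOffset and reads forward to the third position to find value=1, then continues to the fourth position for value=3. It then has to start again from fileOffset to read the first position for value=5, and continues to the second position for value=7. An Index Scan may therefore turn back and restart the scan many times, which increases IO.

To avoid this, only Bitmap Index Scan is used. It first collects all index entries that satisfy the condition, sorts them by tid, and then reads them in ascending rownum order, so a single front-to-back pass retrieves every value the index points to.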
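The cost difference in this example can be reproduced with a small simulation of a forward-only reader. The cost model (one unit per row position read, and a restart from the beginning whenever the target lies behind the current position) and the name `forward_only_cost` are simplifying assumptions for illustration, not a description of Greenplum's actual executor.

```python
from typing import Iterable, Tuple


def forward_only_cost(positions: Iterable[int]) -> Tuple[int, int]:
    """Return (restarts, rows_read) for a reader that can only move forward.

    The reader starts before position 1. A target behind the current
    position forces a restart from the beginning of the file.
    """
    restarts = 0
    rows_read = 0
    current = 0  # last position read; 0 means "before the first row"
    for target in positions:
        if target < current:
            restarts += 1
            current = 0
        rows_read += target - current
        current = target
    return restarts, rows_read


# Physical position of each value in the file, as in the example above:
# value 5 is first, 7 is second, 1 is third, 3 is fourth.
position_of = {5: 1, 7: 2, 1: 3, 3: 4}
index_order = [position_of[value] for value in (1, 3, 5, 7)]

index_scan = forward_only_cost(index_order)           # fetch in index key order
bitmap_scan = forward_only_cost(sorted(index_order))  # fetch in position order
```

Under this model, fetching in index order costs one restart and six row reads, while fetching in sorted position order needs no restart and four row reads. That gap is the saving the bitmap approach is after.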

Time: 2024-09-23 08:49:37
