PostgreSQL Daily Maintenance - reindex

Routine reindex maintenance

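As DML runs, indexes become fragmented and keep bloating, which reduces efficiency. PostgreSQL has VACUUM, but an index does not behave like a heap table. In a heap, the space held by dead tuples can be reused right after VACUUM. In a b-tree index, space freed inside a page can only take new entries whose keys fall in that page's key range, and the page as a whole can be recycled only after every item on it has become dead. As a result, indexes are far more likely to bloat than tables.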
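
One way to put a number on this kind of bloat is the pgstatindex function from the pgstattuple contrib extension. The following is a minimal sketch, assuming the extension can be installed on your server and that a b-tree index named idx_tbl_1 exists, as in the examples in this article. A low avg_leaf_density suggests that the leaf pages hold a lot of unused space. Note that pgstatindex reads the whole index, so it is not free on large indexes.

```sql
-- Assumption: the pgstattuple contrib module is available and you have
-- the privileges to install and call it (superuser by default).
CREATE EXTENSION IF NOT EXISTS pgstattuple;

-- Inspect a b-tree index. A freshly built index has avg_leaf_density
-- close to its fillfactor (90 by default for b-tree leaf pages).
SELECT index_size,
       leaf_pages,
       empty_pages,
       deleted_pages,
       avg_leaf_density,
       leaf_fragmentation
FROM pgstatindex('idx_tbl_1');
```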

Example 1:

digoal=# truncate tbl;
TRUNCATE TABLE
Time: 1070.523 ms
digoal=# insert into tbl select generate_series(1,1000000),'test';
INSERT 0 1000000
Time: 4534.320 ms
digoal=# analyze tbl;
ANALYZE
Time: 58.228 ms
digoal=# select pg_relation_size('tbl');
 pg_relation_size
------------------
         44285952
(1 row)
Time: 0.371 ms
digoal=# select pg_relation_size('idx_tbl_1');
 pg_relation_size
------------------
         22487040
(1 row)
Time: 0.304 ms
digoal=# update tbl set id=id+1000000;
UPDATE 1000000
Time: 5786.052 ms
digoal=# select pg_relation_size('tbl');
 pg_relation_size
------------------
         88563712
(1 row)
Time: 0.281 ms
digoal=# select pg_relation_size('idx_tbl_1');
 pg_relation_size
------------------
         44941312
(1 row)
Time: 0.308 ms

After the first full-table update, both the table and the index have doubled in size.

digoal=# vacuum tbl;
VACUUM
Time: 17.052 ms

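After VACUUM, the dead space in the table becomes reusable, but the space in the index is not reclaimed in the same way.

So the second full-table update does not grow the table any further, while the index keeps growing.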

digoal=# update tbl set id=id-1000000;
UPDATE 1000000
Time: 6063.763 ms
digoal=# select pg_relation_size('tbl');
 pg_relation_size
------------------
         88563712
(1 row)
Time: 0.814 ms
digoal=# select pg_relation_size('idx_tbl_1');
 pg_relation_size
------------------
         65585152
(1 row)
Time: 0.353 ms

The third round is similar to the second. Vacuum first:

digoal=# vacuum tbl;
VACUUM
Time: 2111.606 ms

After another full-table update, the table is still the same size, but the index has grown again.

digoal=# update tbl set id=id+1000000;
UPDATE 1000000
Time: 7829.064 ms
digoal=# select pg_relation_size('tbl');
 pg_relation_size
------------------
         88563712
(1 row)
Time: 0.310 ms
digoal=# select pg_relation_size('idx_tbl_1');
 pg_relation_size
------------------
         88907776
(1 row)
Time: 0.397 ms

Example 2:

Create the table and its indexes, and insert 5 million rows of test data:

digoal=# create table tbl(id int primary key, info int);
CREATE TABLE
digoal=# insert into tbl select generate_series(1,5000000),1;
INSERT 0 5000000
digoal=# create index idx_tbl_1 on tbl(info);
CREATE INDEX
digoal=# vacuum analyze tbl;
VACUUM

Record the current sizes of the table and its indexes:

digoal=# select pg_relation_size('tbl');
 pg_relation_size
------------------
        181239808
(1 row)
digoal=# select pg_relation_size('tbl_pkey');
 pg_relation_size
------------------
        112328704
(1 row)
digoal=# select pg_relation_size('idx_tbl_1');
 pg_relation_size
------------------
        112336896
(1 row)

Use pgbench to run updates against this table:

pg93@db-172-16-3-33-> vi update.sql
\setrandom id 1 5000000
update tbl set info=trunc(5000000*random()) where id=:id;

pg93@db-172-16-3-33-> pgbench -M prepared -r -n -c 8 -j 2 -f ./update.sql -T 60
transaction type: Custom query
scaling factor: 1
query mode: prepared
number of clients: 8
number of threads: 2
duration: 60 s
number of transactions actually processed: 361126
tps = 5994.525604 (including connections establishing)
tps = 5995.828525 (excluding connections establishing)
statement latencies in milliseconds:
        0.001356        \setrandom id 1 5000000
        1.331439        update tbl set info=trunc(5000000*random()) where id=:id;

After the first batch of updates, vacuum the table to reclaim the space held by dead tuples:

digoal=# vacuum verbose analyze tbl;
INFO:  vacuuming "public.tbl"
INFO:  scanned index "tbl_pkey" to remove 361126 row versions
DETAIL:  CPU 0.08s/1.54u sec elapsed 1.74 sec.
INFO:  scanned index "idx_tbl_1" to remove 361126 row versions
DETAIL:  CPU 0.20s/1.88u sec elapsed 4.00 sec.
INFO:  "tbl": removed 361126 row versions in 23350 pages
DETAIL:  CPU 0.14s/0.34u sec elapsed 1.13 sec.
INFO:  index "tbl_pkey" now contains 5000000 row versions in 13761 pages
DETAIL:  361082 index row versions were removed.
0 index pages have been deleted, 0 are currently reusable.
CPU 0.00s/0.00u sec elapsed 0.00 sec.
INFO:  index "idx_tbl_1" now contains 5000000 row versions in 14972 pages
DETAIL:  361126 index row versions were removed.
0 index pages have been deleted, 0 are currently reusable.
CPU 0.00s/0.00u sec elapsed 0.00 sec.
INFO:  "tbl": found 300838 removable, 5000000 nonremovable row versions in 23459 out of 23459 pages
DETAIL:  0 dead row versions cannot be removed yet.
There were 0 unused item pointers.
0 pages are entirely empty.
CPU 0.62s/4.34u sec elapsed 7.77 sec.
INFO:  analyzing "public.tbl"
INFO:  "tbl": scanned 23459 of 23459 pages, containing 5000000 live rows and 0 dead rows; 30000 rows in sample, 5000000 estimated total rows
VACUUM

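Check the current sizes of the table and the indexes: they are somewhat larger than right after creation. tbl_pkey has barely grown. This is not the effect of HOT, since HOT applies only when no indexed column changes and this workload updates the indexed column info; the VACUUM output above shows 361082 entries removed from tbl_pkey. More likely, the new index entry for an updated row carries the same id as the old one, so it goes to the same leaf page and usually fits into that page's free space.

For a detailed description of HOT, see src/backend/access/heap/README.HOT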

digoal=# select pg_relation_size('tbl');
 pg_relation_size
------------------
        192176128
(1 row)
digoal=# select pg_relation_size('tbl_pkey');
 pg_relation_size
------------------
        112730112
(1 row)
digoal=# select pg_relation_size('idx_tbl_1');
 pg_relation_size
------------------
        122650624
(1 row)

Run the pgbench update workload against the test table again:

pg93@db-172-16-3-33-> pgbench -M prepared -r -n -c 8 -j 2 -f ./update.sql -T 60
transaction type: Custom query
scaling factor: 1
query mode: prepared
number of clients: 8
number of threads: 2
duration: 60 s
number of transactions actually processed: 417661
tps = 6960.793225 (including connections establishing)
tps = 6962.199893 (excluding connections establishing)
statement latencies in milliseconds:
        0.001263        \setrandom id 1 5000000
        1.146337        update tbl set info=trunc(5000000*random()) where id=:id;

Run VACUUM again:

digoal=# vacuum verbose analyze tbl;
INFO:  vacuuming "public.tbl"
INFO:  scanned index "tbl_pkey" to remove 417660 row versions
DETAIL:  CPU 0.09s/1.64u sec elapsed 1.88 sec.
INFO:  scanned index "idx_tbl_1" to remove 417660 row versions
DETAIL:  CPU 0.24s/1.96u sec elapsed 3.89 sec.
INFO:  "tbl": removed 417660 row versions in 23635 pages
DETAIL:  CPU 0.41s/1.28u sec elapsed 5.38 sec.
INFO:  index "tbl_pkey" now contains 5000000 row versions in 14191 pages
DETAIL:  417335 index row versions were removed.
0 index pages have been deleted, 0 are currently reusable.
CPU 0.00s/0.00u sec elapsed 0.00 sec.
INFO:  index "idx_tbl_1" now contains 5000000 row versions in 16433 pages
DETAIL:  417660 index row versions were removed.
0 index pages have been deleted, 0 are currently reusable.
CPU 0.00s/0.00u sec elapsed 0.00 sec.
INFO:  "tbl": found 361383 removable, 5000000 nonremovable row versions in 23727 out of 23727 pages
DETAIL:  0 dead row versions cannot be removed yet.
There were 23313 unused item pointers.
0 pages are entirely empty.
CPU 0.93s/5.76u sec elapsed 12.46 sec.
INFO:  analyzing "public.tbl"
INFO:  "tbl": scanned 23727 of 23727 pages, containing 5000000 live rows and 0 dead rows; 30000 rows in sample, 5000000 estimated total rows
VACUUM

The table has almost stopped growing, because the first VACUUM made the space held by dead tuples reusable.

digoal=# select pg_relation_size('tbl');
 pg_relation_size
------------------
        194371584
(1 row)

digoal=# select pg_relation_size('tbl_pkey');
 pg_relation_size
------------------
        116252672
(1 row)

The index on the updated column keeps growing, because an index page can be recycled only after every item pointer on that page is dead.

digoal=# select pg_relation_size('idx_tbl_1');
 pg_relation_size
------------------
        134619136
(1 row)
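
The separate size checks above can be folded into one query. This is a sketch that reads the system catalog pg_index for the table tbl used in this example; the column aliases are illustrative.

```sql
-- Size of every index on tbl, largest first, next to the table size.
SELECT i.indexrelid::regclass                                    AS index_name,
       pg_relation_size(i.indexrelid::regclass)                  AS index_bytes,
       pg_size_pretty(pg_relation_size(i.indexrelid::regclass))  AS index_size,
       pg_size_pretty(pg_relation_size(i.indrelid::regclass))    AS table_size
FROM pg_index AS i
WHERE i.indrelid = 'tbl'::regclass
ORDER BY index_bytes DESC;
```

Running it before and after each pgbench round makes it easy to see which index is growing.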

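So indexes need to be slimmed down regularly, without getting in the way of DML on the database.

The following method can be used to rebuild the index from Example 1: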

digoal=# \d tbl
      Table "public.tbl"
 Column |  Type   | Modifiers
--------+---------+-----------
 id     | integer |
 info   | text    |
Indexes:
    "idx_tbl_1" btree (id)

Create a new index without blocking DML on the table.

digoal=# create index concurrently  idx_tbl_2 on tbl(id);
CREATE INDEX
Time: 2599.077 ms

Once it has been built, drop the old index, idx_tbl_1.

digoal=# drop index idx_tbl_1;
DROP INDEX
Time: 21.757 ms

The new index is slim again.

digoal=# select pg_relation_size('idx_tbl_2');
 pg_relation_size
------------------
         22487040
(1 row)
Time: 0.450 ms
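
Two refinements to the steps above may be worth considering; treat the following as a sketch rather than a tested procedure for your system. A plain DROP INDEX takes an exclusive lock on the table for a moment, while DROP INDEX CONCURRENTLY (available since PostgreSQL 9.2) waits for conflicting transactions instead of blocking other sessions. On PostgreSQL 12 and later, which is newer than the 9.3 instance used in this article, REINDEX ... CONCURRENTLY performs the rebuild and swap in one statement. The concurrent statements cannot run inside a transaction block. The sketch starts again from the state where only idx_tbl_1 exists.

```sql
-- Variant of the steps above that keeps the original index name.
CREATE INDEX CONCURRENTLY idx_tbl_2 ON tbl (id);

-- A failed or cancelled concurrent build leaves an INVALID index behind,
-- so confirm the new index is valid before dropping the old one.
SELECT indexrelid::regclass AS index_name, indisvalid
FROM pg_index
WHERE indexrelid = 'idx_tbl_2'::regclass;

DROP INDEX CONCURRENTLY idx_tbl_1;
ALTER INDEX idx_tbl_2 RENAME TO idx_tbl_1;

-- PostgreSQL 12 and later only: rebuild in place without blocking writes.
REINDEX INDEX CONCURRENTLY idx_tbl_1;
```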

The same method also works for a primary key or a unique key.

digoal=# create unique index concurrently user_info_username_key_1 on user_info(username);
CREATE INDEX
digoal=# begin;
BEGIN
digoal=# alter table user_info drop constraint user_info_username_key;
ALTER TABLE
digoal=# alter table user_info add constraint user_info_username_key unique using index user_info_username_key_1;
NOTICE:  ALTER TABLE / ADD CONSTRAINT USING INDEX will rename index "user_info_username_key_1" to "user_info_username_key"
ALTER TABLE
digoal=# end;
COMMIT
digoal=# create unique index concurrently user_info_pkey_1 on user_info(id);
CREATE INDEX
digoal=# begin;
BEGIN
digoal=# alter table user_info drop constraint user_info_pkey;
ALTER TABLE
digoal=# alter table user_info add constraint user_info_pkey primary key using index user_info_pkey_1;
NOTICE:  ALTER TABLE / ADD CONSTRAINT USING INDEX will rename index "user_info_pkey_1" to "user_info_pkey"
ALTER TABLE
digoal=# end;
COMMIT
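
Dropping a unique or primary key constraint fails if foreign keys in other tables depend on it, so it can help to look for dependents before starting the swap. The following is a sketch against the system catalog pg_constraint, using the user_info table from the example above; the column aliases are illustrative.

```sql
-- Foreign keys that reference user_info. Each one would have to be
-- dropped and re-created around the constraint swap.
SELECT conname            AS fk_name,
       conrelid::regclass AS referencing_table
FROM pg_constraint
WHERE contype = 'f'
  AND confrelid = 'user_info'::regclass;
```

If this returns no rows, no foreign key stands in the way of the DROP CONSTRAINT step. Both ALTER TABLE statements take an ACCESS EXCLUSIVE lock on user_info, so the transaction should be kept short.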

Date: 2024-12-25 11:32:22
