PostgreSQL 9.6 vacuum freeze大幅性能提升 代码浅析

PostgreSQL 9.6 vacuum freeze大幅性能提升 代码浅析

作者

digoal

日期

2016-10-02

标签

PostgreSQL , 9.6 , vacuum freeze , visibility map , skip frozen page


背景

PostgreSQL的tuple(即记录)头信息中有两个字段分别为XMIN,XMAX用于标记行产生与变更的事务号,以标示记录的版本号,事务的可见性等。

这个事务号是32BIT的长度,因此PG设计了一个事务存活的最长时间是约20亿,如果超过20亿必须将这个事务置为frozen。

被置为frozen状态的记录,对所有的事务可见,从而解决了32BIT的XID可以用于实现MVCC的目的。

因此PostgreSQL 需要周期性的对表进行扫描,检查是否需要将记录置为frozen。

PostgreSQL 9.4以前的版本,FROZEN是通过一个等于2的XID来表示的。

从9.4开始改成了通过tuple 头部的t_infomask中的两个互斥的比特位来表示HEAP_XMIN_COMMITTED|HEAP_XMIN_INVALID。

参见
https://git.postgresql.org/gitweb/?p=postgresql.git;a=commit;h=37484ad2aacef5ec794f4dd3d5cf814475180a78

有些插入操作,也可以直接将记录置为freeze,例如大批量的COPY数据。 insert into等。

src/include/access/heapam.h

/* "options" flag bits for heap_insert */
#define HEAP_INSERT_SKIP_WAL    0x0001
#define HEAP_INSERT_SKIP_FSM    0x0002
#define HEAP_INSERT_FROZEN              0x0004
#define HEAP_INSERT_SPECULATIVE 0x0008

表的全局年龄则记录在pg_class中,即使表没有任何变化,在年龄到达一定的值(参数配置)后,也需要发起frozen的动作,对表的记录进行扫描。

9.6对这块做了改进,当数据页中的所有记录已经是FROZEN状态时,在发起vacuum freeze时会跳过这个页的扫描,从而大幅提升静态数据的freeze操作,减少IO扫描。

目前这个page frozen标记放在表对应的VM文件中。

vacuum freeze改进代码分析

1. 如何将记录标记为freeze状态

https://git.postgresql.org/gitweb/?p=postgresql.git;a=commit;h=37484ad2aacef5ec794f4dd3d5cf814475180a78

Change the way we mark tuples as frozen.

Instead of changing the tuple xmin to FrozenTransactionId, the combination
of HEAP_XMIN_COMMITTED and HEAP_XMIN_INVALID, which were previously never
set together, is now defined as HEAP_XMIN_FROZEN.  A variety of previous
proposals to freeze tuples opportunistically before vacuum_freeze_min_age
is reached have foundered on the objection that replacing xmin by
FrozenTransactionId might hinder debugging efforts when things in this
area go awry; this patch is intended to solve that problem by keeping
the XID around (but largely ignoring the value to which it is set).

Third-party code that checks for HEAP_XMIN_INVALID on tuples where
HEAP_XMIN_COMMITTED might be set will be broken by this change.  To fix,
use the new accessor macros in htup_details.h rather than consulting the
bits directly.  HeapTupleHeaderGetXmin has been modified to return
FrozenTransactionId when the infomask bits indicate that the tuple is
frozen; use HeapTupleHeaderGetRawXmin when you already know that the
tuple isn't marked commited or frozen, or want the raw value anyway.
We currently do this in routines that display the xmin for user consumption,
in tqual.c where it's known to be safe and important for the avoidance of
extra cycles, and in the function-caching code for various procedural
languages, which shouldn't invalidate the cache just because the tuple
gets frozen.

Robert Haas and Andres Freund

src/include/access/htup_details.h

/*
 * information stored in t_infomask:
 */
#define HEAP_XMIN_FROZEN                (HEAP_XMIN_COMMITTED|HEAP_XMIN_INVALID)

2. 9.6在VM文件中,新增了对FROZEN页的标记,当整页数据都是静态的时,并且都已经是freeze状态,则该页标记为FROZEN,在执行vacuum freeze或自动触发freeze时,就会跳过这个页的扫描。

src/backend/access/heap/visibilitymap.c

 * The visibility map is a bitmap with two bits (all-visible and all-frozen)
 * per heap page. A set all-visible bit means that all tuples on the page are
 * known visible to all transactions, and therefore the page doesn't need to
 * be vacuumed. A set all-frozen bit means that all tuples on the page are
 * completely frozen, and therefore the page doesn't need to be vacuumed even
 * if whole table scanning vacuum is required (e.g. anti-wraparound vacuum).
 * The all-frozen bit must be set only when the page is already all-visible.

src/include/access/visibilitymap.h

/* Number of bits for one heap page */
#define BITS_PER_HEAPBLOCK 2

/* Flags for bit map */
#define VISIBILITYMAP_ALL_VISIBLE       0x01
#define VISIBILITYMAP_ALL_FROZEN        0x02
#define VISIBILITYMAP_VALID_BITS        0x03            /* OR of all valid
                                                                                                 * visiblitymap flags bits */

/* Macros for visibilitymap test */
#define VM_ALL_VISIBLE(r, b, v) \
        ((visibilitymap_get_status((r), (b), (v)) & VISIBILITYMAP_ALL_VISIBLE) != 0)
#define VM_ALL_FROZEN(r, b, v) \
        ((visibilitymap_get_status((r), (b), (v)) & VISIBILITYMAP_ALL_FROZEN) != 0)

3. 如果vm页损坏了,我们可以通过vacuum DISABLE_PAGE_SKIPPING强制扫描所有的页。
https://git.postgresql.org/gitweb/?p=postgresql.git;a=commit;h=ede62e56fbe809baa1a7bc3873d82f12ffe7540b

If you really want to vacuum every single page in the relation,
regardless of apparent visibility status or anything else, you can use
this option.  In previous releases, this behavior could be achieved
using VACUUM (FREEZE), but because we can now recognize all-frozen
pages as not needing to be frozen again, that no longer works.  There
should be no need for routine use of this option, but maybe bugs or
disaster recovery will necessitate its use.

postgres=# \h vacuum
Command:     VACUUM
Description: garbage-collect and optionally analyze a database
Syntax:
VACUUM [ ( { FULL | FREEZE | VERBOSE | ANALYZE | DISABLE_PAGE_SKIPPING } [, ...] ) ] [ table_name [ (column_name [, ...] ) ] ]

9.6 vs 9.5 FREEZE操作

测试用例

postgres=# create table test(id int, info text);
CREATE TABLE

插入6400万记录

$ vi test.sql
insert into test values (1,'test');

$ pgbench -M prepared -n -r -f ./test.sql -c 64 -j 64 -t 1000000

transaction type: ./test.sql
scaling factor: 1
query mode: prepared
number of clients: 64
number of threads: 64
number of transactions per client: 1000000
number of transactions actually processed: 64000000/64000000
latency average = 0.164 ms
tps = 389383.803477 (including connections establishing)
tps = 389393.063237 (excluding connections establishing)
script statistics:
 - statement latencies in milliseconds:
         0.162  insert into test values (1,'test');

1. 9.6测试结果
生成vm文件

postgres=# vacuum analyze test;
VACUUM
Time: 8629.180 ms

确认VM文件已生成

postgres=# select * from pg_stat_file(pg_relation_filepath('test')||'_vm');
 size  |         access         |      modification      |         change         | creation | isdir
-------+------------------------+------------------------+------------------------+----------+-------
 90112 | 2016-10-02 13:34:24+08 | 2016-10-02 13:34:32+08 | 2016-10-02 13:34:32+08 |          | f
(1 row)

记录当前XLOG位置

postgres=# select pg_current_xlog_location();
 pg_current_xlog_location
--------------------------
 3/87E3C58
(1 row)

执行第一次vacuum freeze

postgres=# vacuum freeze test;
VACUUM
Time: 3487.945 ms

记录XLOG位置

postgres=# select pg_current_xlog_location();
 pg_current_xlog_location
--------------------------
 3/2E3C3F30
(1 row)

计算第一次VACUUM FREEZE产生了多少XLOG

postgres=# select pg_size_pretty(pg_xlog_location_diff('3/2E3C3F30','3/87E3C58'));
 pg_size_pretty
----------------
 604 MB
(1 row)

不产生数据变更,开始接下来的vacuum freeze。
9.6接下来的VACUUM FREEZE非常快,已经自动跳过了frozen page,并且不会产生XLOG。

postgres=# vacuum freeze test;
VACUUM
Time: 16.581 ms
postgres=# select pg_current_xlog_location();
 pg_current_xlog_location
--------------------------
 3/2E3C3F30
(1 row)
Time: 0.079 ms

postgres=# vacuum freeze test;
VACUUM
Time: 16.555 ms
postgres=# select pg_current_xlog_location();
 pg_current_xlog_location
--------------------------
 3/2E3C3F30
(1 row)
Time: 0.115 ms

postgres=# vacuum freeze verbose test;
INFO:  vacuuming "public.test"
INFO:  "test": found 0 removable, 0 nonremovable row versions in 0 out of 345957 pages
DETAIL:  0 dead row versions cannot be removed yet.
There were 0 unused item pointers.
Skipped 0 pages due to buffer pins.
0 pages are entirely empty.
CPU 0.00s/0.00u sec elapsed 0.00 sec.
INFO:  vacuuming "pg_toast.pg_toast_16415"
INFO:  index "pg_toast_16415_index" now contains 0 row versions in 1 pages
DETAIL:  0 index row versions were removed.
0 index pages have been deleted, 0 are currently reusable.
CPU 0.00s/0.00u sec elapsed 0.00 sec.
INFO:  "pg_toast_16415": found 0 removable, 0 nonremovable row versions in 0 out of 0 pages
DETAIL:  0 dead row versions cannot be removed yet.
There were 0 unused item pointers.
Skipped 0 pages due to buffer pins.
0 pages are entirely empty.
CPU 0.00s/0.00u sec elapsed 0.00 sec.
VACUUM
Time: 16.566 ms

消耗几个事务后,重新vacuum freeze依旧很快,产生非常少量的xlog(这些xlog实际上是变更test表的pg_class.relfrozenxid字段产生的)。

postgres=# select txid_current();
 txid_current
--------------
     64001925
(1 row)

postgres=# select txid_current();
 txid_current
--------------
     64001926
(1 row)

postgres=# select txid_current();
 txid_current
--------------
     64001927
(1 row)

postgres=# vacuum freeze verbose test;
INFO:  vacuuming "public.test"
INFO:  "test": found 0 removable, 0 nonremovable row versions in 0 out of 345957 pages
DETAIL:  0 dead row versions cannot be removed yet.
There were 0 unused item pointers.
Skipped 0 pages due to buffer pins.
0 pages are entirely empty.
CPU 0.00s/0.00u sec elapsed 0.00 sec.
INFO:  vacuuming "pg_toast.pg_toast_16415"
INFO:  index "pg_toast_16415_index" now contains 0 row versions in 1 pages
DETAIL:  0 index row versions were removed.
0 index pages have been deleted, 0 are currently reusable.
CPU 0.00s/0.00u sec elapsed 0.00 sec.
INFO:  "pg_toast_16415": found 0 removable, 0 nonremovable row versions in 0 out of 0 pages
DETAIL:  0 dead row versions cannot be removed yet.
There were 0 unused item pointers.
Skipped 0 pages due to buffer pins.
0 pages are entirely empty.
CPU 0.00s/0.00u sec elapsed 0.00 sec.
VACUUM
Time: 18.020 ms
postgres=# select pg_current_xlog_location();
 pg_current_xlog_location
--------------------------
 3/2E3C4130
(1 row)

postgres=# select pg_size_pretty(pg_xlog_location_diff('3/2E3C4130', '3/2E3C3F30'));
 pg_size_pretty
----------------
 512 bytes
(1 row)

使用强制vacuum freeze,不跳过froze pages.
当VM文件损坏时,可以这样使用。

postgres=# vacuum (freeze, verbose, DISABLE_PAGE_SKIPPING) test;
INFO:  vacuuming "public.test"
INFO:  "test": found 0 removable, 64000000 nonremovable row versions in 345957 out of 345957 pages
DETAIL:  0 dead row versions cannot be removed yet.
There were 0 unused item pointers.
Skipped 0 pages due to buffer pins.
2 pages are entirely empty.
CPU 0.68s/3.70u sec elapsed 4.38 sec.
INFO:  vacuuming "pg_toast.pg_toast_16415"
INFO:  index "pg_toast_16415_index" now contains 0 row versions in 1 pages
DETAIL:  0 index row versions were removed.
0 index pages have been deleted, 0 are currently reusable.
CPU 0.00s/0.00u sec elapsed 0.00 sec.
INFO:  "pg_toast_16415": found 0 removable, 0 nonremovable row versions in 0 out of 0 pages
DETAIL:  0 dead row versions cannot be removed yet.
There were 0 unused item pointers.
Skipped 0 pages due to buffer pins.
0 pages are entirely empty.
CPU 0.00s/0.00u sec elapsed 0.00 sec.
VACUUM
Time: 4397.821 ms

postgres=# select pg_current_xlog_location();
 pg_current_xlog_location
--------------------------
 3/2E3C42F0
(1 row)

postgres=# select pg_size_pretty(pg_xlog_location_diff('3/2E3C42F0','3/2E3C4130'));
 pg_size_pretty
----------------
 448 bytes
(1 row)

通过测试,我们可以看到PostgreSQL 9.6在vacuum freeze这块改进非常实用,特别是当数据库很庞大,并且存储了大量的历史静态数据时,在发生vacuum freeze时,IO不会像以前那样飙升了。

2. 9.5测试结果
生成vm文件

postgres=# vacuum analyze test;
VACUUM
Time: 17441.652 ms

确认VM文件已生成

postgres=# select * from pg_stat_file(pg_relation_filepath('test')||'_vm');
 size  |         access         |      modification      |         change         | creation | isdir
-------+------------------------+------------------------+------------------------+----------+-------
 49152 | 2016-10-02 14:11:17+08 | 2016-10-02 14:11:34+08 | 2016-10-02 14:11:34+08 |          | f
(1 row)

记录当前XLOG位置

postgres=# select pg_current_xlog_location();
 pg_current_xlog_location
--------------------------
 5/A7FD5AC8
(1 row)

执行第一次vacuum freeze

postgres=# vacuum freeze test;
VACUUM
Time: 9889.702 ms

记录XLOG位置

postgres=# select pg_current_xlog_location();
 pg_current_xlog_location
--------------------------
 5/CCB2DAB8
(1 row)

计算第一次VACUUM FREEZE产生了多少XLOG

postgres=# select pg_size_pretty(pg_xlog_location_diff('5/CCB2DAB8', '5/A7FD5AC8'));
 pg_size_pretty
----------------
 587 MB
(1 row)

不产生数据变更,开始接下来的vacuum freeze。
9.5的版本,接下来的VACUUM FREEZE同样需要扫描全表,因为VM中未记录frozen pages。

postgres=# vacuum freeze test;
VACUUM
Time: 7191.695 ms

postgres=# select pg_current_xlog_location();
 pg_current_xlog_location
--------------------------
 5/CCB2DAB8
(1 row)

postgres=# vacuum freeze test;
VACUUM
Time: 7159.769 ms

postgres=# select pg_current_xlog_location();
 pg_current_xlog_location
--------------------------
 5/CCB2DAB8
(1 row)

postgres=# vacuum freeze verbose test;
INFO:  vacuuming "public.test"
INFO:  "test": found 0 removable, 64000000 nonremovable row versions in 346248 out of 346248 pages
DETAIL:  0 dead row versions cannot be removed yet.
There were 0 unused item pointers.
Skipped 0 pages due to buffer pins.
0 pages are entirely empty.
CPU 0.00s/7.18u sec elapsed 7.19 sec.
INFO:  vacuuming "pg_toast.pg_toast_16682"
INFO:  index "pg_toast_16682_index" now contains 0 row versions in 1 pages
DETAIL:  0 index row versions were removed.
0 index pages have been deleted, 0 are currently reusable.
CPU 0.00s/0.00u sec elapsed 0.00 sec.
INFO:  "pg_toast_16682": found 0 removable, 0 nonremovable row versions in 0 out of 0 pages
DETAIL:  0 dead row versions cannot be removed yet.
There were 0 unused item pointers.
Skipped 0 pages due to buffer pins.
0 pages are entirely empty.
CPU 0.00s/0.00u sec elapsed 0.00 sec.
VACUUM
Time: 7206.649 ms

消耗几个事务后,重新vacuum freeze。

postgres=# select txid_current();
 txid_current
--------------
    128001924
(1 row)

postgres=# select txid_current();
 txid_current
--------------
    128001925
(1 row)

postgres=# select txid_current();
 txid_current
--------------
    128001926
(1 row)

postgres=# select txid_current();
 txid_current
--------------
    128001927
(1 row)

postgres=# vacuum freeze verbose test;
INFO:  vacuuming "public.test"
INFO:  "test": found 0 removable, 64000000 nonremovable row versions in 346248 out of 346248 pages
DETAIL:  0 dead row versions cannot be removed yet.
There were 0 unused item pointers.
Skipped 0 pages due to buffer pins.
0 pages are entirely empty.
CPU 0.00s/7.18u sec elapsed 7.19 sec.
INFO:  vacuuming "pg_toast.pg_toast_16682"
INFO:  index "pg_toast_16682_index" now contains 0 row versions in 1 pages
DETAIL:  0 index row versions were removed.
0 index pages have been deleted, 0 are currently reusable.
CPU 0.00s/0.00u sec elapsed 0.00 sec.
INFO:  "pg_toast_16682": found 0 removable, 0 nonremovable row versions in 0 out of 0 pages
DETAIL:  0 dead row versions cannot be removed yet.
There were 0 unused item pointers.
Skipped 0 pages due to buffer pins.
0 pages are entirely empty.
CPU 0.00s/0.00u sec elapsed 0.00 sec.
VACUUM
Time: 7208.527 ms

postgres=# select pg_current_xlog_location();
 pg_current_xlog_location
--------------------------
 5/CCB2DEA8
(1 row)

postgres=# select pg_size_pretty(pg_xlog_location_diff('5/CCB2DEA8', '5/CCB2DAB8'));
 pg_size_pretty
----------------
 1008 bytes
(1 row)

小结

PostgreSQL 9.6对VM文件的功能进行了扩展,实用2个比特位标记该页的记录数是否是clean的,对所有事务可见;以及标记该页的所有记录数是否都是FREEZE的。

在进行freeze操作时,或者触发autovacuum freeze prevent wrapped object时,会自动跳过标记为frozen的page,从而大幅减少vacuum freeze的IO。

对于数据库中有大量静态数据,并且又有高并发的写事务并行存在时,特别实用。

同时PostgreSQL 9.6还提供了强制vacuum freeze的接口,不跳过任何页。 保留这样的接口,目的是在vm文件损坏时可用,VM文件也可以通过这种方法自动修复。

Count

时间: 2024-11-05 12:22:43

PostgreSQL 9.6 vacuum freeze大幅性能提升 代码浅析的相关文章

PostgreSQL 利用编译器extension 支持int128,提升聚合性能

标签 PostgreSQL , int128 , clang , gcc , icc 背景 PostgreSQL 9.4以及以前的版本,在INT,INT2,INT8的聚合计算中,为了保证数据不会溢出,中间结果使用numeric来存储. numeric是PostgreSQL自己实现的一种数值类型,可以存储非常大的数值(估计是做科学计算的需求),但是牺牲了一定的性能. 为了提高聚合,特别是大数据量的聚合时的性能,社区借用了编译器支持的int128类型,作为数据库int, int2, int8的中间计

ThinkJS 2.1:支持 TypeScript,性能提升 90%

Thinkjs 2.0 是第一个全面支持使用 ES6/7 特性开发的 Node.js 框架,于 2015.10.30 正式发布.使用 ES6 里的*/yield 或者 ES7 里的 async/await 可以很好的解决异步嵌套的问题,借助 Babel 编译,可以稳定运行在 Node.js 各个主流版本中.同时 ThinkJS 提供了自动编译和自动更新的机制,免去了文件修改后重启 Node.js 服务的麻烦. 经过一个多月的紧张开发后,ThinkJS 迎来了 2.1 版本.2.1 版本不管在功能

PostgreSQL 9.6 内核优化 - sort性能增强(batch化quicksort代替replacement selection when work_mem small)

PostgreSQL 9.6 内核优化 - sort性能增强(batch化quicksort代替replacement selection when work_mem small) 作者 digoal 日期 2016-10-08 标签 PostgreSQL , 9.6 , 内核优化 , sort , replacement selection , quciksort 背景 排序是比较常见的业务需求,为了降低排序的CPU开销,通常会使用索引来满足排序的需求. 但是并不是所有的QUERY都能使用索引

修改一行SQL代码 性能提升了100倍

在PostgreSQL中修改了一行不明显的代码,把(ANY(ARRAY[...]) 改成 ANY(VALUES(...))),结果查询时间从20s变为0.2s.最初我们学习使用 EXPLAN ANALYZE来优化代码,到后来,http://www.aliyun.com/zixun/aggregation/14171.html">Postgres社区也成为我们学习提升的一个好帮手,付出总会有回报,我们的性能也因此得到了极大的提升. 事出有因 Datadog是专门为IT.开发团队等提供监控服务

PostgreSQL 大表自动 freeze 优化思路

有没有被突发的IO惊到过,有没有见到过大量的autovacuum for prevent wrap.本文依依解开这些头痛的问题. PostgreSQL 的版本冻结是一个比较蛋疼的事情,为什么要做版本冻结呢?因为PG的版本号是uint32的,是重复使用的,所以每隔大约20亿个事务后,必须要冻结,否则记录会变成未来的,对当前事务"不可见".冻结的事务号是2 src/include/access/transam.h #define InvalidTransactionId ((Transac

阿里云SSD云盘第二轮公测 性能提升20倍

本文讲的是阿里云SSD云盘第二轮公测 性能提升20倍6月9日,阿里云开启了"大杀器"SSD云盘的第二轮公测,其IOPS提升到了20000,是当前云盘性能的20倍.同时,盘内数据全部实时落盘,可靠性9个9.尤其适合中大型关系数据库.核心业务系统以及中大型开发测试环境使用.SSD云盘已在杭州地域公测,公测期至7月15日免费使用. IOPS(Input/Output Per Second)即每秒的输入输出量(或读写次数),是衡量磁盘性能的主要指标之一,一个普通的7200转的家用磁盘的IOPS

英特尔不再注重新一代芯片的性能提升?

你还在热情关注 PC 领域吗?如果是,你希望下一代 PC 处理器提供什么样的性能或能效?说实话,对于那些还在关电脑领域的爱好者,在更强的性能.更低的功耗或更长的使用续航时间这些话题之间,常常进行各种有趣的讨论乃至争论,而且从最初英特尔与 AMD 那个相互竞争的年代就已经开始了. 今天,AMD 显然已经不再节奏上了,所以爱好者讨论最多的是英特尔处理器的发展,并认为代表了未来桌面 PC 芯片的风向标.不过,没有外部竞争的压力,英特尔似乎很难逼自己更快的进步,最明显的标志就在于,其 14 纳米的芯片竟

Intel:处理器性能提升不给力是因为更重效能

虽然Tick-Tock升级周期变长了,但Intel大体上还是保持了每年推出新一代处理器的节奏,但在频繁更换LGA接口及架构之外,大家对Intel处理器每次乏善可陈的性能提升都没什么兴奋了,哪怕是四五年前的SNB处理器现在都可以再战三年.为什么出现这个问题?之前大家调侃说这是因为AMD不给力,Intel没动力升级,但Intel心里其实也有苦说不出,这几年处理器发展明显是重效能超过了重性能,背后则是摩尔定律逐渐失效,不太可能每次都大幅提升性能了.Intel公司是摩尔定律的提出者,也是摩尔定律50年来

阿里云缓存服务降价50% 性能提升三倍

17日消息,刚刚宣布入门级产品半年免费的阿里云,又将其云缓存服务OCS价格全面下调50%,同时将性能提升三倍.这也是继6月6日数据库产品RDS降价88%后,阿里云基础服务又一次大幅降价. 阿里云方面介绍,此次云缓存OCS同时打出了"降价+升级配置+免费"三张牌,进一步降低云计算服务的使用门槛. 降价:此次阿里云对OCS现有各档产品价格全面下调50%. 升配:此次阿里云将各档OCS的峰值吞吐量和峰值QPS在现有基础上提高3倍.举个例子,以往开通1GB缓存容量的用户,其 OCS的峰值吞吐量