PostgreSQL 10.0 preview performance enhancement - CLOG group commit

Tags

PostgreSQL , 10.0 , CLOG , group commit


Background

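clog is PostgreSQL's transaction commit status log. Each transaction occupies 2 bits in it. When transactions finish at a high rate (many small transactions), contention on CLogControlLock can become a problem.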
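To make the 2-bits-per-transaction layout concrete, here is a minimal sketch of how a transaction ID maps to a position in the clog. The constant names follow the macros in PostgreSQL's clog.c as I understand them, the 8 kB block size is the default and is an assumption here, and `clog_location` is a hypothetical helper rather than a PostgreSQL function.

```python
# Sketch only: mirrors the arithmetic of the clog.c macros, assuming the
# default 8 kB block size. clog_location() is a hypothetical helper.
BLCKSZ = 8192                                        # bytes per SLRU page (assumed default)
CLOG_BITS_PER_XACT = 2                               # 2 status bits per transaction
CLOG_XACTS_PER_BYTE = 8 // CLOG_BITS_PER_XACT        # 4 transactions per byte
CLOG_XACTS_PER_PAGE = BLCKSZ * CLOG_XACTS_PER_BYTE   # 32768 transactions per page


def clog_location(xid):
    """Return (page number, byte offset in page, bit shift in byte) for xid."""
    page = xid // CLOG_XACTS_PER_PAGE
    byte_in_page = (xid % CLOG_XACTS_PER_PAGE) // CLOG_XACTS_PER_BYTE
    bit_shift = (xid % CLOG_XACTS_PER_BYTE) * CLOG_BITS_PER_XACT
    return page, byte_in_page, bit_shift


if __name__ == "__main__":
    for xid in (0, 5, 32767, 32768):
        print(xid, clog_location(xid))
```

Because one page covers tens of thousands of consecutive transaction IDs, concurrently committing transactions almost always update the same page, which is why they all queue on the same lock.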

Increasing the number of clog buffers mitigates the problem, but PostgreSQL 10.0 goes further and adopts CLOG group commit to reduce this contention.

This improves TPS under high concurrency. The patch author's post to the mailing list is quoted below.

I think the main focus for testing in this area should be at higher client
counts.  At what scale factors have you taken the data, and what other
non-default settings have you used?  By the way, have you tried dropping
and recreating the database and restarting the server after each run?
Can you share the exact steps you used to perform the tests?  I am not
sure why it is not showing the benefit in your testing; maybe the benefit
appears only on a somewhat higher-end machine, or some of the settings used
for the test are not the same as mine, or the way the read-write pgbench
workload is tested is different.

In any case, I went ahead and tried further reducing the CLogControlLock
contention by grouping the transaction status updates.  The basic idea
is the same as the one used to reduce ProcArrayLock contention [1]: allow
one of the procs to become the leader and update the transaction status for
the other active transactions in the system.  This has helped to reduce the
contention around CLogControlLock.  The attached patch
group_update_clog_v1.patch implements this idea.
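The leader/follower idea described above can be sketched as a toy model. This is not the patch's code: `GroupStatusUpdater`, `set_status`, and `run_demo` are hypothetical names, an ordinary mutex stands in for CLogControlLock, and a small lock-protected list stands in for the atomic list of waiting procs used in the real implementation.

```python
import threading


class GroupStatusUpdater:
    """Toy model of grouped status updates (illustrative assumption only)."""

    def __init__(self):
        self.control_lock = threading.Lock()  # stands in for CLogControlLock
        self.queue_lock = threading.Lock()    # stands in for the atomic list head
        self.pending = []                     # queued (xid, state, done_event)
        self.status = {}                      # xid -> state, the "clog"
        self.lock_acquisitions = 0            # how often the control lock was taken

    def set_status(self, xid, state):
        done = threading.Event()
        with self.queue_lock:
            self.pending.append((xid, state, done))
            # Whoever finds the queue empty becomes the group leader.
            is_leader = len(self.pending) == 1
        if not is_leader:
            done.wait()  # follower: the leader applies our update for us
            return
        with self.control_lock:
            self.lock_acquisitions += 1
            # Detach the whole group only after the lock is held, so that
            # procs arriving while the leader waited are included.
            with self.queue_lock:
                batch, self.pending = self.pending, []
            for member_xid, member_state, _ in batch:
                self.status[member_xid] = member_state
        for _, _, member_done in batch:
            member_done.set()  # wake the followers (harmless for the leader)


def run_demo(n_threads=8, per_thread=100):
    """Run n_threads workers that each record per_thread commits."""
    updater = GroupStatusUpdater()

    def worker(base):
        for i in range(per_thread):
            updater.set_status(base + i, "committed")

    threads = [threading.Thread(target=worker, args=(t * per_thread,))
               for t in range(n_threads)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return updater


if __name__ == "__main__":
    u = run_demo()
    print(len(u.status), "updates applied in", u.lock_acquisitions, "lock acquisitions")
```

Whenever several workers queue up while a leader is waiting for the lock, one acquisition covers the whole group, so the number of exclusive acquisitions can be lower than the number of updates. How much lower depends on timing and is not deterministic in this model.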

I have taken performance data with this patch to see the impact at
various scale-factors.  All the data is for cases when data fits in shared
buffers and is taken against commit - 5c90a2ff on server with below
configuration and non-default postgresql.conf settings.  

Performance Data
-----------------------------
RAM - 500GB
8 sockets, 64 cores (Hyperthreaded, 128 threads total)

Non-default parameters
------------------------------------
max_connections = 300
shared_buffers = 8GB
min_wal_size = 10GB
max_wal_size = 15GB
checkpoint_timeout = 35min
maintenance_work_mem = 1GB
checkpoint_completion_target = 0.9
wal_buffers = 256MB

Refer to the attached files for performance data.

sc_300_perf.png - This data indicates that at scale_factor 300, there is a
gain of ~15% at higher client counts, without degradation at lower client
counts.

different_sc_perf.png - At various scale factors, there is a gain of
~15% to 41% at higher client counts, and in some cases we see a gain
of ~5% at a somewhat moderate client count (64) as well.

perf_write_clogcontrollock_data_v1.ods - Detailed performance data at
various client counts and scale factors.

Feel free to ask for more details if the data in attached files is not
clear.  

Below is the LWLock_Stats information with and without the patch:

Stats Data
---------
A. scale_factor = 300; shared_buffers=32GB; client_connections - 128  

HEAD - 5c90a2ff
----------------
CLogControlLock Data
------------------------
PID 94100 lwlock main 11: shacq 678672 exacq 326477 blk 204427 spindelay 8532 dequeue self 93192
PID 94129 lwlock main 11: shacq 757047 exacq 363176 blk 207840 spindelay 8866 dequeue self 96601
PID 94115 lwlock main 11: shacq 721632 exacq 345967 blk 207665 spindelay 8595 dequeue self 96185
PID 94011 lwlock main 11: shacq 501900 exacq 241346 blk 173295 spindelay 7882 dequeue self 78134
PID 94087 lwlock main 11: shacq 653701 exacq 314311 blk 201733 spindelay 8419 dequeue self 92190

After Patch group_update_clog_v1
----------------
CLogControlLock Data
------------------------
PID 100205 lwlock main 11: shacq 836897 exacq 176007 blk 116328 spindelay 1206 dequeue self 54485
PID 100034 lwlock main 11: shacq 437610 exacq 91419 blk 77523 spindelay 994 dequeue self 35419
PID 100175 lwlock main 11: shacq 748948 exacq 158970 blk 114027 spindelay 1277 dequeue self 53486
PID 100162 lwlock main 11: shacq 717262 exacq 152807 blk 115268 spindelay 1227 dequeue self 51643
PID 100214 lwlock main 11: shacq 856044 exacq 180422 blk 113695 spindelay 1202 dequeue self 54435

The above data indicates that contention due to CLogControlLock is
reduced by around 50% with this patch.  
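The "around 50%" figure can be checked against the ten sampled backends listed above. The sketch below copies the exacq, blk, and spindelay values from the two tables and compares their means. `reduction_pct` is a hypothetical helper, and five PIDs per run is a small sample, so the result is only indicative.

```python
from statistics import mean

# Counter values copied from the per-PID LWLock stats above (HEAD vs. patched).
head = {
    "exacq": [326477, 363176, 345967, 241346, 314311],
    "blk": [204427, 207840, 207665, 173295, 201733],
    "spindelay": [8532, 8866, 8595, 7882, 8419],
}
patched = {
    "exacq": [176007, 91419, 158970, 152807, 180422],
    "blk": [116328, 77523, 114027, 115268, 113695],
    "spindelay": [1206, 994, 1277, 1227, 1202],
}


def reduction_pct(before, after):
    """Percentage drop of the mean of `after` relative to the mean of `before`."""
    return 100.0 * (1.0 - mean(after) / mean(before))


if __name__ == "__main__":
    for counter in ("exacq", "blk", "spindelay"):
        pct = reduction_pct(head[counter], patched[counter])
        print(f"{counter}: {pct:.1f}% lower on average")
```

On these samples the mean number of blocked acquisitions (blk) drops by roughly 46% and exclusive acquisitions (exacq) by roughly 52%, which is consistent with the author's estimate. The spindelay counter falls much further, by roughly 86%.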

The reasons for remaining contention could be:  

1. Readers of clog data (checking transaction status data) can take
CLogControlLock in exclusive mode when reading a page from disk.  This can
contend with other readers (shared lockers of CLogControlLock) and with
the exclusive locker which updates transaction status.  One way to
mitigate this contention is to increase the number of CLOG buffers, for
which a patch has already been posted on this thread.

2. Readers of clog data (checking transaction status data) take
CLogControlLock in shared mode, which can contend with the exclusive locker
(the group leader) which updates transaction status.  I have tried to reduce
the amount of work done by the group leader by allowing it to read the CLOG
page just once for all the transactions in the group which updated the same
CLOG page (an idea similar to what we currently use for updating the status
of transactions having a sub-transaction tree), but that hasn't given any
further performance boost, so I left it.

I think we can use some other ways as well to reduce the contention around
CLogControlLock by doing somewhat major surgery around SLRU, like using
buffer pools similar to shared buffers, but this idea gives us a moderate
improvement without much impact on the existing mechanism.

Thoughts?  

[1] -
http://www.postgresql.org/message-id/CAA4eK1JbX4FzPHigNt0JSaz30a85BPJV+ewhk+wg_o-T6xufEA@mail.gmail.com  

With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com

For the full discussion of this patch, see the mailing list thread linked at the end of this article.

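The PostgreSQL community works very rigorously. A patch may be discussed on the mailing list for months or even years and revised repeatedly based on reviewers' feedback, so by the time it is merged into master it is already mature. This is one reason PostgreSQL is well known for its stability.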

References

https://www.postgresql.org/message-id/flat/CAA4eK1+8=X9mSNeVeHg_NqMsOR-XKsjuqrYzQf=iCsdh3U4EOA@mail.gmail.com#CAA4eK1+8=X9mSNeVeHg_NqMsOR-XKsjuqrYzQf=iCsdh3U4EOA@mail.gmail.com

https://commitfest.postgresql.org/13/358/

Time: 2024-11-02 00:13:42
