A Hands-on Test of PostgreSQL Asynchronous I/O

Tags

PostgreSQL, effective_io_concurrency, asynchronous I/O


Background

The purpose of asynchronous I/O is to exploit the full throughput of the block device by keeping it busier (fetching more consecutive blocks per request), rather than waiting to issue reads until the moment a user process needs the data.

If the database has enough concurrent connections (or active sessions) to keep the block device busy on its own, there is little point in enabling asynchronous I/O: it gains nothing, because the device is already saturated.

Currently, PostgreSQL's bitmap heap scan supports asynchronous I/O. Because a bitmap heap scan reads the heap's data blocks in physical order, prefetching can pay off in full on mechanical disks. (A full table scan would in principle also be a good candidate for asynchronous I/O.)

So how should effective_io_concurrency, the asynchronous I/O parameter, be set?

For a disk array, set it according to the block device backing the tablespace: for RAID 0 or RAID 10, use the number of disks; for RAID 5 or other parity RAID levels, use the number of data disks (e.g., for a 10-disk RAID 5, set it to 9).

Setting it to 0 disables asynchronous I/O.

Asynchronous I/O can be used only when the operating system provides a working posix_fadvise function.
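
A quick way to check availability is simply to try setting the parameter in a session: on a platform that lacks posix_fadvise, any non-zero value is rejected with an invalid_parameter_value error (exactly the error that the DO block in the regression test below traps).

postgres=# set effective_io_concurrency = 50;  -- fails with invalid_parameter_value if posix_fadvise is unavailable

The manual describes the parameter as follows: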

effective_io_concurrency (integer)  

Sets the number of concurrent disk I/O operations that PostgreSQL expects can be executed simultaneously.
Raising this value will increase the number of I/O operations that any individual PostgreSQL session attempts to initiate in parallel.
The allowed range is 1 to 1000, or zero to disable issuance of asynchronous I/O requests. Currently, this setting only affects bitmap heap scans.  

For magnetic drives, a good starting point for this setting is the number of separate drives comprising a RAID 0 stripe or RAID 1 mirror being used for the database.
(For RAID 5 the parity drive should not be counted.)
However, if the database is often busy with multiple queries issued in concurrent sessions, lower values may be sufficient to keep the disk array busy.
A value higher than needed to keep the disks busy will only result in extra CPU overhead.
SSDs and other memory-based storage can often process many concurrent requests, so the best value might be in the hundreds.  

Asynchronous I/O depends on an effective posix_fadvise function, which some operating systems lack.
If the function is not present then setting this parameter to anything but zero will result in an error.
On some operating systems (e.g., Solaris), the function is present but does not actually do anything.  

The default is 1 on supported systems, otherwise 0.
This value can be overridden for tables in a particular tablespace by setting the tablespace parameter of the same name (see ALTER TABLESPACE).
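
For example, to override the setting for everything stored in a particular tablespace (the tablespace name here is hypothetical):

ALTER TABLESPACE fast_ssd SET (effective_io_concurrency = 100);
-- and to drop the per-tablespace override again:
ALTER TABLESPACE fast_ssd RESET (effective_io_concurrency);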

So how should it be set in production?

Let's measure it. On SSD, judging from the results below, the recommendation is to disable asynchronous I/O. (The manual says memory-based storage and SSDs may warrant values in the hundreds, but the measured effect here was unconvincing.)
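
To apply that recommendation instance-wide, a minimal sketch (effective_io_concurrency is a user-settable parameter, so a configuration reload is enough to pick up the change):

alter system set effective_io_concurrency = 0;
select pg_reload_conf();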

Test case

From src/test/regress/expected/select_parallel.out in the PostgreSQL source tree:

-- test parallel bitmap heap scan.
set enable_seqscan to off;
set enable_indexscan to off;
set enable_hashjoin to off;
set enable_mergejoin to off;
set enable_material to off;
-- test prefetching, if the platform allows it
DO $$
BEGIN
 SET effective_io_concurrency = 50;
EXCEPTION WHEN invalid_parameter_value THEN
END $$;
set work_mem='64kB';  --set small work mem to force lossy pages
explain (costs off)
        select count(*) from tenk1, tenk2 where tenk1.hundred > 1 and tenk2.thousand=0;
                         QUERY PLAN
------------------------------------------------------------
 Aggregate
   ->  Nested Loop
         ->  Seq Scan on tenk2
               Filter: (thousand = 0)
         ->  Gather
               Workers Planned: 4
               ->  Parallel Bitmap Heap Scan on tenk1
                     Recheck Cond: (hundred > 1)
                     ->  Bitmap Index Scan on tenk1_hundred
                           Index Cond: (hundred > 1)
(10 rows)  

select count(*) from tenk1, tenk2 where tenk1.hundred > 1 and tenk2.thousand=0;
 count
-------
 98000
(1 row)  

create table bmscantest (a int, t text);
insert into bmscantest select r, 'fooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooo' FROM generate_series(1,100000) r;
create index i_bmtest ON bmscantest(a);
select count(*) from bmscantest where a>1;
 count
-------
 99999
(1 row)  

reset enable_seqscan;
reset enable_indexscan;
reset enable_hashjoin;
reset enable_mergejoin;
reset enable_material;
reset effective_io_concurrency;
reset work_mem;
drop table bmscantest;

Test
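
The table ptest queried below is not created anywhere in this article. A plausible setup consistent with the plans that follow (roughly 10 million rows and a btree index on id; the exact column definitions are an assumption) would be:

create table ptest (id int, info text);
insert into ptest select id, md5(random()::text) from generate_series(1,10000000) t(id);
create index idx_ptest1 on ptest(id);
vacuum analyze ptest;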

postgres=# select pg_backend_pid();
 pg_backend_pid
----------------
          42328
(1 row)  

postgres=# set enable_seqscan=off;
SET
postgres=# set enable_indexscan=off;
SET
postgres=# set work_mem ='64kB';
SET
postgres=# set max_parallel_workers_per_gather =0;
SET  

postgres=# explain (analyze,verbose,timing,costs,buffers) select count(*) from ptest where id>1000 ;
                                                                  QUERY PLAN
-----------------------------------------------------------------------------------------------------------------------------------------------
 Aggregate  (cost=338227.11..338227.12 rows=1 width=8) (actual time=2634.805..2634.806 rows=1 loops=1)
   Output: count(*)
   Buffers: shared hit=110649
   ->  Bitmap Heap Scan on public.ptest  (cost=104910.47..313230.01 rows=9998843 width=0) (actual time=418.257..1735.802 rows=9999000 loops=1)
         Output: id, info
         Recheck Cond: (ptest.id > 1000)
         Rows Removed by Index Recheck: 40
         Heap Blocks: exact=408 lossy=82918
         Buffers: shared hit=110649
         ->  Bitmap Index Scan on idx_ptest1  (cost=0.00..102410.76 rows=9998843 width=0) (actual time=418.158..418.158 rows=9999000 loops=1)
               Index Cond: (ptest.id > 1000)
               Buffers: shared hit=27323
 Planning time: 0.123 ms
 Execution time: 2634.858 ms
(14 rows)  

postgres=# set effective_io_concurrency =0;
SET
postgres=# explain (analyze,verbose,timing,costs,buffers) select count(*) from ptest where id>1000 ;
                                                                  QUERY PLAN
-----------------------------------------------------------------------------------------------------------------------------------------------
 Aggregate  (cost=338227.11..338227.12 rows=1 width=8) (actual time=2619.868..2619.869 rows=1 loops=1)
   Output: count(*)
   Buffers: shared hit=110649
   ->  Bitmap Heap Scan on public.ptest  (cost=104910.47..313230.01 rows=9998843 width=0) (actual time=418.974..1721.578 rows=9999000 loops=1)
         Output: id, info
         Recheck Cond: (ptest.id > 1000)
         Rows Removed by Index Recheck: 40
         Heap Blocks: exact=408 lossy=82918
         Buffers: shared hit=110649
         ->  Bitmap Index Scan on idx_ptest1  (cost=0.00..102410.76 rows=9998843 width=0) (actual time=418.873..418.873 rows=9999000 loops=1)
               Index Cond: (ptest.id > 1000)
               Buffers: shared hit=27323
 Planning time: 0.155 ms
 Execution time: 2619.930 ms
(14 rows)  

postgres=# set effective_io_concurrency =1000;
SET
postgres=# explain (analyze,verbose,timing,costs,buffers) select count(*) from ptest where id>1000 ;
                                                                  QUERY PLAN
-----------------------------------------------------------------------------------------------------------------------------------------------
 Aggregate  (cost=338227.11..338227.12 rows=1 width=8) (actual time=2661.584..2661.585 rows=1 loops=1)
   Output: count(*)
   Buffers: shared hit=110649
   ->  Bitmap Heap Scan on public.ptest  (cost=104910.47..313230.01 rows=9998843 width=0) (actual time=417.591..1761.522 rows=9999000 loops=1)
         Output: id, info
         Recheck Cond: (ptest.id > 1000)
         Rows Removed by Index Recheck: 40
         Heap Blocks: exact=408 lossy=82918
         Buffers: shared hit=110649
         ->  Bitmap Index Scan on idx_ptest1  (cost=0.00..102410.76 rows=9998843 width=0) (actual time=417.473..417.473 rows=9999000 loops=1)
               Index Cond: (ptest.id > 1000)
               Buffers: shared hit=27323
 Planning time: 0.113 ms
 Execution time: 2661.632 ms
(14 rows)  

postgres=# set max_parallel_workers_per_gather =4;
SET
postgres=# set effective_io_concurrency =1000;
SET
postgres=# explain (analyze,verbose,timing,costs,buffers) select count(*) from ptest where id>1000 ;
                                                                            QUERY PLAN
-------------------------------------------------------------------------------------------------------------------------------------------------------------------
 Finalize Aggregate  (cost=226740.55..226740.56 rows=1 width=8) (actual time=1017.452..1017.452 rows=1 loops=1)
   Output: count(*)
   Buffers: shared hit=111309
   ->  Gather  (cost=226740.13..226740.54 rows=4 width=8) (actual time=1017.350..1017.447 rows=5 loops=1)
         Output: (PARTIAL count(*))
         Workers Planned: 4
         Workers Launched: 4
         Buffers: shared hit=111309
         ->  Partial Aggregate  (cost=225740.13..225740.14 rows=1 width=8) (actual time=1014.227..1014.227 rows=1 loops=5)
               Output: PARTIAL count(*)
               Buffers: shared hit=110649
               Worker 0: actual time=1013.492..1013.492 rows=1 loops=1
                 Buffers: shared hit=16355
               Worker 1: actual time=1013.497..1013.497 rows=1 loops=1
                 Buffers: shared hit=16269
               Worker 2: actual time=1013.510..1013.510 rows=1 loops=1
                 Buffers: shared hit=16333
               Worker 3: actual time=1013.501..1013.501 rows=1 loops=1
                 Buffers: shared hit=16268
               ->  Parallel Bitmap Heap Scan on public.ptest  (cost=104910.47..219490.85 rows=2499711 width=0) (actual time=418.784..840.764 rows=1999800 loops=5)
                     Recheck Cond: (ptest.id > 1000)
                     Rows Removed by Index Recheck: 8
                     Heap Blocks: exact=92 lossy=18009
                     Buffers: shared hit=110649
                     Worker 0: actual time=417.217..844.848 rows=1962600 loops=1
                       Buffers: shared hit=16355
                     Worker 1: actual time=422.223..844.926 rows=1952280 loops=1
                       Buffers: shared hit=16269
                     Worker 2: actual time=417.216..844.028 rows=1959960 loops=1
                       Buffers: shared hit=16333
                     Worker 3: actual time=417.216..840.404 rows=1952080 loops=1
                       Buffers: shared hit=16268
                     ->  Bitmap Index Scan on idx_ptest1  (cost=0.00..102410.76 rows=9998843 width=0) (actual time=419.867..419.867 rows=9999000 loops=1)
                           Index Cond: (ptest.id > 1000)
                           Buffers: shared hit=27323
 Planning time: 0.121 ms
 Execution time: 1027.083 ms
(37 rows)

postgres=# set effective_io_concurrency =1;
SET
postgres=# explain (analyze,verbose,timing,costs,buffers) select count(*) from ptest where id>1000 ;
                                                                            QUERY PLAN
-------------------------------------------------------------------------------------------------------------------------------------------------------------------
 Finalize Aggregate  (cost=226740.55..226740.56 rows=1 width=8) (actual time=1006.182..1006.182 rows=1 loops=1)
   Output: count(*)
   Buffers: shared hit=111309
   ->  Gather  (cost=226740.13..226740.54 rows=4 width=8) (actual time=1006.073..1006.177 rows=5 loops=1)
         Output: (PARTIAL count(*))
         Workers Planned: 4
         Workers Launched: 4
         Buffers: shared hit=111309
         ->  Partial Aggregate  (cost=225740.13..225740.14 rows=1 width=8) (actual time=1003.124..1003.124 rows=1 loops=5)
               Output: PARTIAL count(*)
               Buffers: shared hit=110649
               Worker 0: actual time=1002.447..1002.447 rows=1 loops=1
                 Buffers: shared hit=16408
               Worker 1: actual time=1002.441..1002.441 rows=1 loops=1
                 Buffers: shared hit=16312
               Worker 2: actual time=1002.446..1002.446 rows=1 loops=1
                 Buffers: shared hit=16093
               Worker 3: actual time=1002.436..1002.436 rows=1 loops=1
                 Buffers: shared hit=16287
               ->  Parallel Bitmap Heap Scan on public.ptest  (cost=104910.47..219490.85 rows=2499711 width=0) (actual time=413.650..828.524 rows=1999800 loops=5)
                     Recheck Cond: (ptest.id > 1000)
                     Rows Removed by Index Recheck: 8
                     Heap Blocks: exact=85 lossy=18141
                     Buffers: shared hit=110649
                     Worker 0: actual time=412.974..833.142 rows=1968960 loops=1
                       Buffers: shared hit=16408
                     Worker 1: actual time=412.994..831.530 rows=1957440 loops=1
                       Buffers: shared hit=16312
                     Worker 2: actual time=412.980..831.353 rows=1931160 loops=1
                       Buffers: shared hit=16093
                     Worker 3: actual time=412.977..828.009 rows=1954440 loops=1
                       Buffers: shared hit=16287
                     ->  Bitmap Index Scan on idx_ptest1  (cost=0.00..102410.76 rows=9998843 width=0) (actual time=416.164..416.164 rows=9999000 loops=1)
                           Index Cond: (ptest.id > 1000)
                           Buffers: shared hit=27323
 Planning time: 0.124 ms
 Execution time: 1015.976 ms
(37 rows)  

postgres=# set effective_io_concurrency =0;
SET
postgres=# explain (analyze,verbose,timing,costs,buffers) select count(*) from ptest where id>1000 ;
                                                                            QUERY PLAN
-------------------------------------------------------------------------------------------------------------------------------------------------------------------
 Finalize Aggregate  (cost=226740.55..226740.56 rows=1 width=8) (actual time=943.288..943.288 rows=1 loops=1)
   Output: count(*)
   Buffers: shared hit=111309
   ->  Gather  (cost=226740.13..226740.54 rows=4 width=8) (actual time=943.170..943.282 rows=5 loops=1)
         Output: (PARTIAL count(*))
         Workers Planned: 4
         Workers Launched: 4
         Buffers: shared hit=111309
         ->  Partial Aggregate  (cost=225740.13..225740.14 rows=1 width=8) (actual time=940.060..940.060 rows=1 loops=5)
               Output: PARTIAL count(*)
               Buffers: shared hit=110649
               Worker 0: actual time=939.276..939.276 rows=1 loops=1
                 Buffers: shared hit=16286
               Worker 1: actual time=939.387..939.388 rows=1 loops=1
                 Buffers: shared hit=16303
               Worker 2: actual time=939.389..939.390 rows=1 loops=1
                 Buffers: shared hit=16291
               Worker 3: actual time=939.388..939.389 rows=1 loops=1
                 Buffers: shared hit=15943
               ->  Parallel Bitmap Heap Scan on public.ptest  (cost=104910.47..219490.85 rows=2499711 width=0) (actual time=419.160..766.586 rows=1999800 loops=5)
                     Recheck Cond: (ptest.id > 1000)
                     Rows Removed by Index Recheck: 8
                     Heap Blocks: exact=92 lossy=18411
                     Buffers: shared hit=110649
                     Worker 0: actual time=418.404..771.100 rows=1954320 loops=1
                       Buffers: shared hit=16286
                     Worker 1: actual time=418.487..770.127 rows=1956360 loops=1
                       Buffers: shared hit=16303
                     Worker 2: actual time=418.491..770.200 rows=1954920 loops=1
                       Buffers: shared hit=16291
                     Worker 3: actual time=418.513..768.309 rows=1913160 loops=1
                       Buffers: shared hit=15943
                     ->  Bitmap Index Scan on idx_ptest1  (cost=0.00..102410.76 rows=9998843 width=0) (actual time=421.750..421.750 rows=9999000 loops=1)
                           Index Cond: (ptest.id > 1000)
                           Buffers: shared hit=27323
 Planning time: 0.169 ms
 Execution time: 952.028 ms
(37 rows)

Each run was also profiled with perf record -ag -p $pid, but this build was compiled without the flags needed for usable stack traces (-g -ggdb -fno-omit-frame-pointer), so the profiles showed nothing conclusive; readers can rebuild with those flags and profile for themselves. Note, too, that every plan above reports only shared-buffer hits, i.e. the data set was fully cached: the timings (roughly 2.62-2.66 s serial and 0.95-1.03 s parallel, slightly fastest with effective_io_concurrency = 0 in both cases) therefore likely reflect CPU-side prefetch bookkeeping rather than real storage behavior.

References

《PostgreSQL on Linux 最佳部署手册》

《Linux 性能诊断 perf使用指南》

《PostgreSQL 源码性能诊断(perf profiling)指南》

https://www.postgresql.org/docs/devel/static/runtime-config-resource.html

src/backend/executor/nodeBitmapHeapscan.c
