PostgreSQL 10.0 preview performance enhancement - libpq pipeline/batch mode reduces network round trips and improves performance

Tags

PostgreSQL , 10.0 , libpq , pipeline , batch


Background

In PostgreSQL 10.0, libpq supports pipelining queries into batches: the client can queue multiple queries into a pipeline and submit them to the server side as a single batch, reducing the number of client/server network round trips.

In environments with poor network conditions, especially cloud environments, this can improve performance dramatically.

+   <application>libpq</application> supports queueing up multiple queries into
+   a pipeline to be executed as a batch on the server. Batching queries allows
+   applications to avoid a client/server round-trip after each query to get
+   the results before issuing the next query.

Details

Hi all  

Following on from the foreign table batch inserts thread[1], here's a patch
to add support for pipelining queries into asynchronous batches in libpq.  

Attached, and also available at
https://github.com/2ndQuadrant/postgres/tree/dev/libpq-async-batch (subject
to rebasing and force pushes).  

It's cleaned up over the draft I posted on that thread and has error
recovery implemented. I've written and included the SGML docs for it. The
test program is now pretty comprehensive, more so than for anything else in
libpq anyway. I'll submit it to the next CF as a 9.7/10.0 candidate.  

I'm measuring 300x (not %) performance improvements doing batches on
servers over the Internet, so this seems pretty worthwhile. It turned out
to be way less invasive than I expected too.  

(I intentionally didn't add any way for clients to annotate each work-item
in a batch with their own private data. I think that'd be really useful and
would make implementing clients easier, but should be a separate patch).  

This should be very useful for optimising FDWs, Postgres-XC, etc.  

[1]
http://www.postgresql.org/message-id/CAMsr+YFgDUiJ37DEfPRk8WDBuZ58psdAYJd8iNFSaGxtw=wU3g@mail.gmail.com  

--
 Craig Ringer                   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services

Examples

+  <para>
+   An example of batch use may be found in the source distribution in
+   <filename>src/test/examples/libpqbatch.c</filename>.
+  </para>
+
+  <sect2>
+   <title>When to use batching</title>
+
+   <para>
+    Much like asynchronous query mode, there is no performance disadvantage to
+    using batching and pipelining. It somewhat increases client application
+    complexity, and extra caution is required to prevent client/server network
+    deadlocks, but it can offer considerable performance improvements.
+   </para>
+
+   <para>
+    Batching is most useful when the server is distant, i.e. network latency
+    ("ping time") is high, and when many small operations are being performed in
+    rapid sequence. There is usually less benefit in using batches when each
+    query takes many multiples of the client/server round-trip time to execute.
+    A 100-statement operation run on a server 300ms round-trip-time away would take
+    30 seconds in network latency alone without batching; with batching it may spend
+    as little as 0.3s waiting for results from the server.
+   </para>
+
+   <para>
+    Use batches when your application does lots of small
+    <literal>INSERT</literal>, <literal>UPDATE</literal> and
+    <literal>DELETE</literal> operations that can't easily be transformed into
+    operations on sets or into a
+    <link linkend="libpq-copy"><literal>COPY</literal></link> operation.
+   </para>
+
+   <para>
+    Batching is less useful when information from one operation is required by the
+    client before it knows enough to send the next operation. The client must
+    introduce a synchronisation point and wait for a full client/server
+    round-trip to get the results it needs. However, it's often possible to
+    adjust the client design to exchange the required information server-side.
+    Read-modify-write cycles are especially good candidates; for example:
+    <programlisting>
+     BEGIN;
+     SELECT x FROM mytable WHERE id = 42 FOR UPDATE;
+     -- result: x=2
+     -- client adds 1 to x:
+     UPDATE mytable SET x = 3 WHERE id = 42;
+     COMMIT;
+    </programlisting>
+    could be much more efficiently done with:
+    <programlisting>
+     UPDATE mytable SET x = x + 1;
+    </programlisting>
+   </para>
+
+   <note>
+    <para>
+     The batch API was introduced in PostgreSQL 9.6, but clients using it can
+     use batches on server versions 8.4 and newer. Batching works on any server
+     that supports the v3 extended query protocol.
+    </para>
+   </note>
+
+  </sect2>
+
+  <sect2 id="libpq-batch-using">
+   <title>Using batch mode</title>
+
+   <para>
+    To issue batches the application must switch
+    <application>libpq</application> into batch mode. Enter batch mode with <link
+    linkend="libpq-pqbeginbatchmode"><function>PQbeginBatchMode(conn)</function></link> or test
+    whether batch mode is active with <link
+    linkend="libpq-pqisinbatchmode"><function>PQisInBatchMode(conn)</function></link>. In batch mode only <link
+    linkend="libpq-async">asynchronous operations</link> are permitted, and
+    <literal>COPY</literal> is not allowed. (The restriction on <literal>COPY</literal> is an implementation
+    limit; the PostgreSQL protocol and server can support batched <literal>COPY</literal>).
+   </para>
+
+   <para>
+    The client uses libpq's asynchronous query functions to dispatch work,
+    marking the end of each batch with <function>PQsendEndBatch</function>.
+    Concurrently, it uses <function>PQgetResult</function> and
+    <function>PQgetNextQuery</function> to get results. It may eventually exit
+    batch mode with <function>PQendBatchMode</function> once all results are
+    processed.
+   </para>
+
+   <note>
+    <para>
+     It is best to use batch mode with <application>libpq</application> in
+     <link linkend="libpq-pqsetnonblocking">non-blocking mode</link>. If used in
+     blocking mode it is possible for a client/server deadlock to occur. The
+     client will block trying to send queries to the server, but the server will
+     block trying to send results from queries it's already processed to the
+     client. This only occurs when the client sends enough queries to fill its
+     output buffer and the server's receive buffer before switching to
+     processing input from the server, but it's hard to predict exactly when
+     that'll happen so it's best to always use non-blocking mode.
+    </para>
+   </note>
+
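To make the workflow above concrete, here is a minimal C sketch of entering and leaving batch mode. It uses the function names from the patch quoted above (PQbeginBatchMode, PQendBatchMode) and assumes the usual libpq convention of returning nonzero on success; the quoted docs do not spell out the exact signatures, so treat this as an illustration rather than a definitive reference.

/* Sketch only: based on the batch API proposed in the patch above.
 * PQbeginBatchMode/PQendBatchMode are assumed to return nonzero on
 * success, in the usual libpq style. */
#include <stdio.h>
#include <libpq-fe.h>

int main(void)
{
    PGconn *conn = PQconnectdb("dbname=postgres");

    if (PQstatus(conn) != CONNECTION_OK)
    {
        fprintf(stderr, "connection failed: %s", PQerrorMessage(conn));
        PQfinish(conn);
        return 1;
    }

    /* The note above recommends non-blocking mode to avoid deadlocks */
    if (PQsetnonblocking(conn, 1) != 0)
    {
        fprintf(stderr, "could not set non-blocking mode: %s", PQerrorMessage(conn));
        PQfinish(conn);
        return 1;
    }

    /* Switch libpq into batch mode */
    if (!PQbeginBatchMode(conn))
    {
        fprintf(stderr, "could not enter batch mode: %s", PQerrorMessage(conn));
        PQfinish(conn);
        return 1;
    }

    /* ... dispatch queries and collect results, see the sketches below ... */

    /* Leave batch mode once every result has been consumed */
    if (!PQendBatchMode(conn))
        fprintf(stderr, "could not exit batch mode: %s", PQerrorMessage(conn));

    PQfinish(conn);
    return 0;
}
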
+   <sect3 id="libpq-batch-sending">
+    <title>Issuing queries</title>
+
+    <para>
+     After entering batch mode the application dispatches requests
+     using normal asynchronous <application>libpq</application> functions like
+     <function>PQsendQueryParams</function>, <function>PQsendPrepare</function>,
+     etc. The asynchronous requests are followed by a <link
+     linkend="libpq-pqsendendbatch"><function>PQsendEndBatch(conn)</function></link> call to mark
+     the end of the batch. The client <emphasis>does not</emphasis> need to call
+     <function>PQgetResult</function> immediately after dispatching each
+     operation. <link linkend="libpq-batch-results">Result processing</link>
+     is handled separately.
+    </para>
+
+    <para>
+     Batched operations will be executed by the server in the order the client
+     sends them. The server will send the results in the order the statements
+     were executed. The server usually begins executing the batch before all commands
+     in the batch are queued and the end of batch command is sent. If any
+     statement encounters an error the server aborts the current transaction and
+     skips processing the rest of the batch. Query processing resumes after the
+     end of the failed batch.
+    </para>
+
+    <para>
+     It's fine for one operation to depend on the results of a
+     prior one. One query may define a table that the next query in the same
+     batch uses; similarly, an application may create a named prepared statement
+     then execute it with later statements in the same batch.
+    </para>
+
+   </sect3>
+
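As an illustration of the dispatch side described above, the following hedged sketch queues two parameterized INSERTs and then marks the end of the batch with PQsendEndBatch. The table mytab and the parameter values are made up for this example, and the nonzero-on-success return convention is an assumption carried over from the standard asynchronous functions.

/* Sketch only: queueing work in batch mode as described above.
 * 'conn' is already connected, non-blocking and in batch mode.
 * The table mytab is hypothetical. */
#include <libpq-fe.h>

static int dispatch_batch(PGconn *conn)
{
    const char *paramValues[1];

    /* Queue two statements without waiting for their results */
    paramValues[0] = "1";
    if (!PQsendQueryParams(conn, "INSERT INTO mytab(a) VALUES ($1)",
                           1, NULL, paramValues, NULL, NULL, 0))
        return 0;

    paramValues[0] = "2";
    if (!PQsendQueryParams(conn, "INSERT INTO mytab(a) VALUES ($1)",
                           1, NULL, paramValues, NULL, NULL, 0))
        return 0;

    /* Mark the end of the batch; results are collected separately,
     * see "Processing results" below */
    return PQsendEndBatch(conn);
}
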
+   <sect3 id="libpq-batch-results">
+    <title>Processing results</title>
+
+    <para>
+     The client <link linkend="libpq-batch-interleave">interleaves result
+     processing with sending batch queries</link>, or for small batches may
+     process all results after sending the whole batch.
+    </para>
+
+    <para>
+     To get the result of the first batch entry the client must call <link
+     linkend="libpq-pqgetnextquery"><function>PQgetNextQuery</function></link>. It must then call
+     <function>PQgetResult</function> and handle the results until
+     <function>PQgetResult</function> returns null (or would return null if
+     called). The result from the next batch entry may then be retrieved using
+     <function>PQgetNextQuery</function> and the cycle repeated.  The
+     application handles individual statement results as normal.
+    </para>
+
+    <para>
+     <function>PQgetResult</function> behaves the same as for normal asynchronous
+     processing except that it may contain the new <type>PGresult</type> types
+     <literal>PGRES_BATCH_END</literal> and <literal>PGRES_BATCH_ABORTED</literal>.
+     <literal>PGRES_BATCH_END</literal> is reported exactly once for each
+     <function>PQsendEndBatch</function> call at the corresponding point in
+     the result stream and at no other time. <literal>PGRES_BATCH_ABORTED</literal>
+     is emitted during error handling; see <link linkend="libpq-batch-errors">
+     error handling</link>.
+    </para>
+
+    <para>
+     <function>PQisBusy</function>, <function>PQconsumeInput</function>, etc
+     operate as normal when processing batch results.
+    </para>
+
+    <para>
+     <application>libpq</application> does not provide any information to the
+     application about the query currently being processed. The application
+     must keep track of the order in which it sent queries and the expected
+     results. Applications will typically use a state machine or a FIFO queue
+     for this.
+    </para>

The discussion of this patch can be found on the mailing list; see the URLs at the end of this article.

The PostgreSQL community works very rigorously: a patch may be discussed on the mailing list for months or even years and revised repeatedly based on feedback, so by the time it is merged into master it is already very mature. This is why PostgreSQL is renowned for its stability.

References

https://commitfest.postgresql.org/13/1024/

https://www.postgresql.org/message-id/attachment/44303/0001-Pipelining-batch-support-for-libpq.patch

https://github.com/2ndQuadrant/postgres/tree/dev/libpq-async-batch

https://www.postgresql.org/message-id/flat/CAMsr+YFUjJytRyV4J-16bEoiZyH=4nj+sQ7JP9ajwz=B4dMMZw@mail.gmail.com#CAMsr+YFUjJytRyV4J-16bEoiZyH=4nj+sQ7JP9ajwz=B4dMMZw@mail.gmail.com

