元旦技术大礼包 - 2017金秋将要发布的PostgreSQL 10.0已装备了哪些核武器?

标签

PostgreSQL , 10.0 , 金秋 , 元旦 , 大礼包 , commitfest


背景

早上送给大家的新年大礼包,一年一个大版本是PostgreSQL社区的传统,虽然发布时间通常为秋天,还有一段时间,但是已经迫不及待地想看看2017金秋将要发布的10.0版本已经装备了哪些核武器。

放心,还会有一波又一波的feature和增强搭上开往2017金秋的列车,本文提到的可能只是其中的某一节车厢沃,PGer是不是开始有一点振奋人心的感觉啦。

1. 并行计算专属动态共享内存区,(加速索引扫描\外部表并行的支持步伐)

https://git.postgresql.org/gitweb/?p=postgresql.git;a=commit;h=e13029a5ce353574516c64fd1ec9c50201e705fd

Provide a DSA area for all parallel queries.  

This will allow future parallel query code to dynamically allocate
storage shared by all participants.  

Thomas Munro, with assorted changes by me.

Parallel tuplesort (for parallel B-Tree index creation)

https://commitfest.postgresql.org/12/690/

Parallel bitmap heap scan

https://commitfest.postgresql.org/12/812/

Parallel Index Scans

https://commitfest.postgresql.org/12/849/

Parallel Merge Join

https://commitfest.postgresql.org/12/918/

Parallel Append implementation

https://commitfest.postgresql.org/12/929/

parallelize queries containing subplans

https://commitfest.postgresql.org/12/941/

2. 多副本新增 "任意节点、顺序" 两种模式自由选择

https://git.postgresql.org/gitweb/?p=postgresql.git;a=commit;h=3901fd70cc7ccacef1b0549a6835bb7d8dcaae43

Support quorum-based synchronous replication.  

This feature is also known as "quorum commit" especially in discussion
on pgsql-hackers.  

This commit adds the following new syntaxes into synchronous_standby_names
GUC. By using FIRST and ANY keywords, users can specify the method to
choose synchronous standbys from the listed servers.  

  FIRST num_sync (standby_name [, ...])
  ANY num_sync (standby_name [, ...])  

The keyword FIRST specifies a priority-based synchronous replication
which was available also in 9.6 or before. This method makes transaction
commits wait until their WAL records are replicated to num_sync
synchronous standbys chosen based on their priorities.  

The keyword ANY specifies a quorum-based synchronous replication
and makes transaction commits wait until their WAL records are
replicated to *at least* num_sync listed standbys. In this method,
the values of sync_state.pg_stat_replication for the listed standbys
are reported as "quorum". The priority is still assigned to each standby,
but not used in this method.  

The existing syntaxes having neither FIRST nor ANY keyword are still
supported. They are the same as new syntax with FIRST keyword, i.e.,
a priorirty-based synchronous replication.  

Author: Masahiko Sawada
Reviewed-By: Michael Paquier, Amit Kapila and me
Discussion: <CAD21AoAACi9NeC_ecm+Vahm+MMA6nYh=Kqs3KB3np+MBOS_gZg@mail.gmail.com>  

Many thanks to the various individuals who were involved in
discussing and developing this feature.

3. 添加会话级临时replication slots支持

(原来slot是为长时间使用设计,如STANDBY。 对于一些测试、针对性复制场景,不再需要担心忘记删除slot带来的问题了)

https://git.postgresql.org/gitweb/?p=postgresql.git;a=commit;h=a924c327e2793d2025b19e18de7917110dc8afd8

Add support for temporary replication slots  

This allows creating temporary replication slots that are removed
automatically at the end of the session or on error.  

From: Petr Jelinek <petr.jelinek@2ndquadrant.com>

4. 认证协议会更加安全(SCRAM)

https://git.postgresql.org/gitweb/?p=postgresql.git;a=commit;h=e7f051b8f9a6341f6d3bf80b29c1dbc1837be9ab

Refactor the code for verifying user's password.  

Split md5_crypt_verify() into three functions:
* get_role_password() to fetch user's password from pg_authid, and check
  its expiration.
* md5_crypt_verify() to check an MD5 authentication challenge
* plain_crypt_verify() to check a plaintext password.  

get_role_password() will be needed as a separate function by the upcoming
SCRAM authentication patch set. Most of the remaining functionality in
md5_crypt_verify() was different for MD5 and plaintext authentication, so
split that for readability.  

While we're at it, simplify the *_crypt_verify functions by using
stack-allocated buffers to hold the temporary MD5 hashes, instead of
pallocing.  

Reviewed by Michael Paquier.

https://git.postgresql.org/gitweb/?p=postgresql.git;a=commit;h=babe05bc2b781eb3eb84a18d7010d08277e2e399

Turn password_encryption GUC into an enum.  

This makes the parameter easier to extend, to support other password-based
authentication protocols than MD5. (SCRAM is being worked on.)  

The GUC still accepts on/off as aliases for "md5" and "plain", although
we may want to remove those once we actually add support for another
password hash type.  

Michael Paquier, reviewed by David Steele, with some further edits by me.  

Discussion: <CAB7nPqSMXU35g=W9X74HVeQp0uvgJxvYOuA4A-A3M+0wfEBv-w@mail.gmail.com>

https://postgrespro.com/docs/postgresproee/9.6/auth-methods.html#auth-password

21.3.2. Password Authentication  

The password-based authentication methods are md5, scram, and password. These methods operate similarly except for the way that the password is sent across the connection, namely MD5-hashed, SCRAM-SHA-256 and clear-text respectively.  

If you are at all concerned about password “sniffing” attacks then md5 is preferred. Plain password should always be avoided if possible. However, md5 cannot be used with the db_user_namespace feature. If the connection is protected by SSL encryption then password can be used safely (though SSL certificate authentication might be a better choice if one is depending on using SSL).  

scram has more advantages than md5 as it protects from cases where the hashed password is taken directly from pg_authid in which case a connection using only the stolen hash is possible without knowing the password behind it. It protects as well from password interception and data sniffing where the password data could be directly obtained from the network as well as man-in-the-middle (MITM) attacks. So it is strongly encouraged to use it over md5 for password-based deployments.  

Postgres Pro Enterprise database passwords are separate from operating system user passwords. The password for each database user is stored in the pg_authid system catalog. Passwords can be managed with the SQL commands CREATE USER and ALTER ROLE, e.g., CREATE USER foo WITH PASSWORD 'secret'. If no password has been set up for a user, the stored password is null and password authentication will always fail for that user.

5. Support for SCRAM-SHA-256

https://commitfest.postgresql.org/12/829/

6. 分区、多级分区支持

《PostgreSQL 10.0 内置分区表》

此前,我们可以使用pg_pathman插件来实现高性能多级分区

《PostgreSQL 9.5+ 高效分区表实现 - pg_pathman》

《PostgreSQL 9.6 sharding based on FDW & pg_pathman》

https://git.postgresql.org/gitweb/?p=postgresql.git;a=commit;h=f0e44751d7175fa3394da2c8f85e3ceb3cdbfe63

Implement table partitioning.  

Table partitioning is like table inheritance and reuses much of the
existing infrastructure, but there are some important differences.
The parent is called a partitioned table and is always empty; it may
not have indexes or non-inherited constraints, since those make no
sense for a relation with no data of its own.  The children are called
partitions and contain all of the actual data.  Each partition has an
implicit partitioning constraint.  Multiple inheritance is not
allowed, and partitioning and inheritance can't be mixed.  Partitions
can't have extra columns and may not allow nulls unless the parent
does.  Tuples inserted into the parent are automatically routed to the
correct partition, so tuple-routing ON INSERT triggers are not needed.
Tuple routing isn't yet supported for partitions which are foreign
tables, and it doesn't handle updates that cross partition boundaries.  

Currently, tables can be range-partitioned or list-partitioned.  List
partitioning is limited to a single column, but range partitioning can
involve multiple columns.  A partitioning "column" can be an
expression.  

Because table partitioning is less general than table inheritance, it
is hoped that it will be easier to reason about properties of
partitions, and therefore that this will serve as a better foundation
for a variety of possible optimizations, including query planner
optimizations.  The tuple routing based which this patch does based on
the implicit partitioning constraints is an example of this, but it
seems likely that many other useful optimizations are also possible.  

Amit Langote, reviewed and tested by Robert Haas, Ashutosh Bapat,
Amit Kapila, Rajkumar Raghuwanshi, Corey Huinker, Jaime Casanova,
Rushabh Lathia, Erik Rijkers, among others.  Minor revisions by me.  

Discussion: https://www.postgresql.org/message-id/3029e460-d47c-710e-507e-d8ba759d7cbb@iki.fi

7. 聚合运算减少context切换

https://git.postgresql.org/gitweb/?p=postgresql.git;a=commit;h=8ed3f11bb045ad7a3607690be668dbd5b3cc31d7

Perform one only projection to compute agg arguments.  

Previously we did a ExecProject() for each individual aggregate
argument. That turned out to be a performance bottleneck in queries with
multiple aggregates.  

Doing all the argument computations in one ExecProject() is quite a bit
cheaper because ExecProject's fastpath can do the work at once in a
relatively tight loop, and because it can get all the required columns
with a single slot_getsomeattr and save some other redundant setup
costs.  

Author: Andres Freund
Reviewed-By: Heikki Linnakangas
Discussion: https://postgr.es/m/20161103110721.h5i5t5saxfk5eeik@alap3.anarazel.de

8. hash index增强

https://git.postgresql.org/gitweb/?p=postgresql.git;a=commit;h=6d46f4783efe457f74816a75173eb23ed8930020

Improve hash index bucket split behavior.  

Previously, the right to split a bucket was represented by a
heavyweight lock on the page number of the primary bucket page.
Unfortunately, this meant that every scan needed to take a heavyweight
lock on that bucket also, which was bad for concurrency.  Instead, use
a cleanup lock on the primary bucket page to indicate the right to
begin a split, so that scans only need to retain a pin on that page,
which is they would have to acquire anyway, and which is also much
cheaper.  

In addition to reducing the locking cost, this also avoids locking out
scans and inserts for the entire lifetime of the split: while the new
bucket is being populated with copies of the appropriate tuples from
the old bucket, scans and inserts can happen in parallel.  There are
minor concurrency improvements for vacuum operations as well, though
the situation there is still far from ideal.  

This patch also removes the unworldly assumption that a split will
never be interrupted.  With the new code, a split is done in a series
of small steps and the system can pick up where it left off if it is
interrupted prior to completion.  While this patch does not itself add
write-ahead logging for hash indexes, it is clearly a necessary first
step, since one of the things that could interrupt a split is the
removal of electrical power from the machine performing it.  

Amit Kapila.  I wrote the original design on which this patch is
based, and did a good bit of work on the comments and README through
multiple rounds of review, but all of the code is Amit's.  Also
reviewed by Jesper Pedersen, Jeff Janes, and others.  

Discussion: http://postgr.es/m/CAA4eK1LfzcZYxLoXS874Ad0+S-ZM60U9bwcyiUZx9mHZ-KCWhw@mail.gmail.com

https://git.postgresql.org/gitweb/?p=postgresql.git;a=commit;h=f0e72a25b05d4c29d0102fa0b892782ff193a00e

Improve handling of dead tuples in hash indexes.  

When squeezing a bucket during vacuum, it's not necessary to retain
any tuples already marked as dead, so ignore them when deciding which
tuples must be moved in order to empty a bucket page.  Similarly, when
splitting a bucket, relocating dead tuples to the new bucket is a
waste of effort; instead, just ignore them.  

Amit Kapila, reviewed by me.  Testing help provided by Ashutosh
Sharma.

9. 支持进程级条件变量,简化sleep\wakeup设计

https://git.postgresql.org/gitweb/?p=postgresql.git;a=commit;h=e8ac886c24776295dd9b025386a821061da8e4d1

Support condition variables.  

Condition variables provide a flexible way to sleep until a
cooperating process causes an arbitrary condition to become true.  In
simple cases, this can be accomplished with a WaitLatch/ResetLatch
loop; the cooperating process can call SetLatch after performing work
that might cause the condition to be satisfied, and the waiting
process can recheck the condition each time.  However, if the process
performing the work doesn't have an easy way to identify which
processes might be waiting, this doesn't work, because it can't
identify which latches to set.  Condition variables solve that problem
by internally maintaining a list of waiters; a process that may have
caused some waiter's condition to be satisfied must "signal" or
"broadcast" on the condition variable.  

Robert Haas and Thomas Munro

10. 支持聚合运算下推至sharding节点(postgres_fdw增强)

https://git.postgresql.org/gitweb/?p=postgresql.git;a=commit;h=7012b132d07c2b4ea15b0b3cb1ea9f3278801d98

postgres_fdw: Push down aggregates to remote servers.  

Now that the upper planner uses paths, and now that we have proper hooks
to inject paths into the upper planning process, it's possible for
foreign data wrappers to arrange to push aggregates to the remote side
instead of fetching all of the rows and aggregating them locally.  This
figures to be a massive win for performance, so teach postgres_fdw to
do it.  

Jeevan Chalke and Ashutosh Bapat.  Reviewed by Ashutosh Bapat with
additional testing by Prabhat Sahu.  Various mostly cosmetic changes
by me.

Push down more full joins in postgres_fdw

https://commitfest.postgresql.org/12/727/

11. 支持流式备份时,同时备份数据文件与REDO文件

https://git.postgresql.org/gitweb/?p=postgresql.git;a=commit;h=56c7d8d4552180fd66fe48423bb2a9bb767c2d87

Allow pg_basebackup to stream transaction log in tar mode  

This will write the received transaction log into a file called
pg_wal.tar(.gz) next to the other tarfiles instead of writing it to
base.tar. When using fetch mode, the transaction log is still written to
base.tar like before, and when used against a pre-10 server, the file
is named pg_xlog.tar.  

To do this, implement a new concept of a "walmethod", which is
responsible for writing the WAL. Two implementations exist, one that
writes to a plain directory (which is also used by pg_receivexlog) and
one that writes to a tar file with optional compression.  

Reviewed by Michael Paquier

12. 分布式事务

https://commitfest.postgresql.org/12/853/

13. Twophase transactions on slave, take 2

https://commitfest.postgresql.org/12/915/

14. Scan key push down to heap

https://commitfest.postgresql.org/12/850/

15. 间接索引

indirect indexes

https://commitfest.postgresql.org/12/874/

16. Logical Replication

https://commitfest.postgresql.org/12/836/

实际上9.4开始就已经支持了,通过外部插件+逻辑流复制来实现,但是没有整合到内核中

值得期待的10.0,一起等风来 - 2017金秋

参考

1. https://commitfest.postgresql.org/

2. http://git.postgresql.org/

3. https://git.postgresql.org/gitweb/?p=postgresql.git;a=summary

时间: 2024-10-31 00:32:10

元旦技术大礼包 - 2017金秋将要发布的PostgreSQL 10.0已装备了哪些核武器?的相关文章

支付宝(AR虚拟现实)红包 玩法与技术背景 - GIS(LBS)、图像识别与秒杀技术的完美结合

背景 作为搞IT的小伙伴们,对虚拟现实(AR)应该并不陌生,这次支付宝带来了一个很有趣的功能,虚拟现实与藏红包结合起来. 简单介绍一下这个业务,然后我们再来思考背后的技术. 支付宝(AR虚拟现实)藏红包.找红包 玩法介绍 想象一下,你可以把你的红包藏在世界各地的任意角落,让全世界的人来找红包,是不是一件很有趣的事情. 比如你可以把红包藏在你家的小狗身上,又或者你可以把它藏在某个建筑物.某个房间.某一本书中...... 藏的方法很简单,首先需要打开支付宝,然后选择红包->AR红包 然后你可以选择藏

数据库选型十八摸 之 PostgreSQL - 致 架构师、开发者

标签 PostgreSQL , 数据库特性 , 数据库应用场景分析 , 数据库选型 背景 数据库对于一家企业来说,相比其他基础组件占据比较核心的位置. 有很多企业由于最初数据库选型问题,导致一错再错,甚至还有为此付出沉痛代价的. 数据库的选型一定要慎重,但是这么多数据库,该如何选择呢? 我前段时间写过一篇关于数据库选型的文章,可以参考如下 <数据库选型思考> 另外,PostgreSQL这个数据库这些年的发展非常的迅猛,虽然国内还跟不上国外的节奏,但是相信国人逐渐会融合进去. 所以我专门针对Po

ApsaraDB的左右互搏术 - 解决企业痛处 TP+AP混合需求 - 无须再唱《爱你痛到不知痛》

标签 PostgreSQL , HybridDB , HTAP , OLTP , OLAP , 混合场景 , Oracle , 企业痛点 , 数据库痛点 背景 随着IT行业在更多的传统行业渗透,我们正逐步的在进入DT时代,让数据发挥价值是企业的真正需求,否则就是一堆废的并且还持续消耗企业人力,财力的数据. 传统企业可能并不像互联网企业一样,有大量的开发人员.有大量的技术储备,通常还是以购买IT软件,或者以外包的形式在存在. 数据的核心 - 数据库,很多传统的行业还在使用传统的数据库. 但是随着I

想挑战AlphaGO吗?先和PostgreSQL玩一玩?? PostgreSQL与人工智能(AI)

标签 PostgreSQL , AI , 人工智能 , 神经网络 , 机器学习 , MADlib , Greenplum , 物联网 , IoT , pivotalR , R 背景 想必很多朋友这几天被铺天盖地的AI战胜60位围棋高手的新闻,我们不得不承认,人工智能真的越来越强了. http://wallstreetcn.com/node/283152 1月4日晚,随着古力认输,Master对人类顶尖高手的战绩停留在60胜0负1和,而令人尴尬的是这唯一一场和棋还是因为棋手掉线系统自动判和,并不是

PostgreSQL 10 新特性, 流式接收端在线压缩redo

标签 PostgreSQL , redo 在线压缩 , wal 在线压缩 , 接收端压缩 , pg_receivexlog , pg_basebackup , 断点续传 背景 虽然现在磁盘已经很廉价,大多数时候不需要压缩了. 但是在一些嵌入式系统,或者一些未扩容的产品环境中,压缩还是非常有必要的. 特别是数据变化大.或者数据增量大的场景,例如物联网(IoT),每天单库可能新增TB级的增量. 比如 <PostgreSQL 如何潇洒的处理每天上百TB的数据增量> 指的就是IoT场景. 这样的场景,

RSA 2017 花甲科技发布端安全“全能战士”S-TEE

本文讲的是RSA 2017 花甲科技发布端安全"全能战士"S-TEE,2017年2月14日,北京花甲科技在RSA2017首次海外发布了下一代端安全产品S-TEE,该产品颠覆了传统移动安全技术,重新定义了包括移动端和物联网的安全,可以称之为端安全的全能战士. 这是花甲科技继去年在美国RSA上发布公司第一代安全产品Securitystack之后,又一次在全球瞩目的安全盛会上发布自己的新产品.这表明了这家企业对技术的不断追求和对自家产品的充分自信. S-TEE是为应用端的核心逻辑和数据提供灵

【硅谷连线】苹果获批“类光场相机”技术专利 友达光电发布新4K

中云网每天连线硅谷,呈现最新鲜资讯!这里的"硅谷"指的是国外具有典型性和创新性企业代表. 1. 创始人承认Snapchat创意来自大学同学 <http://tech.sina.com.cn/i/2013-11-27/08038952811.shtml> "阅后即焚"照片分享服务Snapchat创始人埃文·斯皮格尔(Evan Spiegel)在庭审中承认,其斯坦福大学同学弗兰克·雷金纳德·布朗(Frank Reginald Brown)在Snapchat的

红帽全球客户技术展望2017报告出炉 数字化转型、云和DevOps成关注重点

近日,红帽公司针对红帽客户在2017年包括主要举措.云部署战略及创新技术等在内的计划进行了调查.其中,调查对象分别来自13个国家.6大行业中的268家大型企业.<财富>和全球500强企业以及小型企业. 去年的调研指出了需要持续关注的几大战略领域(包括私有云.DevOps.大数据.移动和物联网),并且强调了业界对于利用IT以更少投入获得更高产出的迫切需求.红帽全球客户技术展望2017基于这些调研结果而编制,并且着重指出受访者对于应对或者创造自身颠覆的理念变化--不仅采用数字化能力创建新的业务模式

英特尔澄清:第一款10nm产品2017年定发布

几天之前我们曾报道称,英特尔的 10 纳米工艺可能将再一次跳票至 2018 年.因为英特尔官网一份有关上月 21 日"资本分析(Capital Analyst)"会议的内容显示,位于以色列迦特镇的 Fab 28 晶圆厂将在"大约两年内投产 10纳米",并且该预案经常将作为量产最新 10 纳米工艺的龙头工厂.因此不免让人认为,英特尔 10 纳米工艺还可以等至 2018 年 1月份才开始量产. 有意思的是,在有关"英特尔 10 纳米量产推迟的至 2018年&q