PostgreSQL 9.6 引领开源数据库攻克多核并行计算难题

PostgreSQL 9.6 引领开源数据库攻克多核并行计算难题

作者

digoal

日期

2016-10-01

标签

PostgreSQL , 9.6 , 并行计算 , 多核计算


背景

经过多年的酝酿(从支持work process到支持动态fork共享内存,再到内核层面支持并行计算),PostgreSQL 的多核并行计算功能终于在2016年发布的9.6版本中正式上线,为PG的scale up能力再次拔高一个台阶,标志着开源数据库已经攻克了并行计算的难题。

相信有很多小伙伴已经开始测试了。

在32物理核的机器上进行了测试,重计算的场景,性能程线性提升。

目前并行计算支持全表扫描,JOIN,聚合。

一、快速安装PostgreSQL 9.6

为了让大伙能够快速用上9.6,以下是一个简单的安装说明。

OS 准备

# yum -y install coreutils glib2 lrzsz sysstat e4fsprogs xfsprogs ntp readline-devel zlib zlib-devel openssl openssl-devel pam-devel libxml2-devel libxslt-devel python-devel tcl-devel gcc make smartmontools flex bison perl perl-devel perl-ExtUtils* openldap openldap-devel

# vi /etc/sysctl.conf
# add by digoal.zhou
fs.aio-max-nr = 1048576
fs.file-max = 76724600
kernel.core_pattern= /data01/corefiles/core_%e_%u_%t_%s.%p
# /data01/corefiles事先建好,权限777
kernel.sem = 4096 2147483647 2147483646 512000
# 信号量, ipcs -l 或 -u 查看,每16个进程一组,每组信号量需要17个信号量。
kernel.shmall = 107374182
# 所有共享内存段相加大小限制(建议内存的80%)
kernel.shmmax = 274877906944
# 最大单个共享内存段大小(建议为内存一半), >9.2的版本已大幅降低共享内存的使用
kernel.shmmni = 819200
# 一共能生成多少共享内存段,每个PG数据库集群至少2个共享内存段
net.core.netdev_max_backlog = 10000
net.core.rmem_default = 262144
# The default setting of the socket receive buffer in bytes.
net.core.rmem_max = 4194304
# The maximum receive socket buffer size in bytes
net.core.wmem_default = 262144
# The default setting (in bytes) of the socket send buffer.
net.core.wmem_max = 4194304
# The maximum send socket buffer size in bytes.
net.core.somaxconn = 4096
net.ipv4.tcp_max_syn_backlog = 4096
net.ipv4.tcp_keepalive_intvl = 20
net.ipv4.tcp_keepalive_probes = 3
net.ipv4.tcp_keepalive_time = 60
net.ipv4.tcp_mem = 8388608 12582912 16777216
net.ipv4.tcp_fin_timeout = 5
net.ipv4.tcp_synack_retries = 2
net.ipv4.tcp_syncookies = 1
# 开启SYN Cookies。当出现SYN等待队列溢出时,启用cookie来处理,可防范少量的SYN攻击
net.ipv4.tcp_timestamps = 1
# 减少time_wait
net.ipv4.tcp_tw_recycle = 0
# 如果=1则开启TCP连接中TIME-WAIT套接字的快速回收,但是NAT环境可能导致连接失败,建议服务端关闭它
net.ipv4.tcp_tw_reuse = 1
# 开启重用。允许将TIME-WAIT套接字重新用于新的TCP连接
net.ipv4.tcp_max_tw_buckets = 262144
net.ipv4.tcp_rmem = 8192 87380 16777216
net.ipv4.tcp_wmem = 8192 65536 16777216
net.nf_conntrack_max = 1200000
net.netfilter.nf_conntrack_max = 1200000
vm.dirty_background_bytes = 409600000
#  系统脏页到达这个值,系统后台刷脏页调度进程 pdflush(或其他) 自动将(dirty_expire_centisecs/100)秒前的脏页刷到磁盘
vm.dirty_expire_centisecs = 3000
#  比这个值老的脏页,将被刷到磁盘。3000表示30秒。
vm.dirty_ratio = 95
#  如果系统进程刷脏页太慢,使得系统脏页超过内存 95 % 时,则用户进程如果有写磁盘的操作(如fsync, fdatasync等调用),则需要主动把系统脏页刷出。
#  有效防止用户进程刷脏页,在单机多实例,并且使用CGROUP限制单实例IOPS的情况下非常有效。
vm.dirty_writeback_centisecs = 100
#  pdflush(或其他)后台刷脏页进程的唤醒间隔, 100表示1秒。
vm.extra_free_kbytes = 4096000
vm.min_free_kbytes = 2097152
vm.mmap_min_addr = 65536
vm.overcommit_memory = 0
#  在分配内存时,允许少量over malloc, 如果设置为 1, 则认为总是有足够的内存,内存较少的测试环境可以使用 1 .
vm.overcommit_ratio = 90
#  当overcommit_memory = 2 时,用于参与计算允许指派的内存大小。
vm.swappiness = 0
#  关闭交换分区
vm.zone_reclaim_mode = 0
# 禁用 numa, 或者在vmlinux中禁止.
net.ipv4.ip_local_port_range = 40000 65535
# 本地自动分配的TCP, UDP端口号范围
#  vm.nr_hugepages = 102352
#  建议shared buffer设置超过64GB时 使用大页,页大小 /proc/meminfo Hugepagesize

# sysctl -p

# vi /etc/security/limits.conf
* soft    nofile  1024000
* hard    nofile  1024000
* soft    nproc   unlimited
* hard    nproc   unlimited
* soft    core    unlimited
* hard    core    unlimited
* soft    memlock unlimited
* hard    memlock unlimited

# rm -f /etc/security/limits.d/*

安装

$ wget https://ftp.postgresql.org/pub/source/v9.6.0/postgresql-9.6.0.tar.bz2

$ tar -jxvf postgresql-9.6.0.tar.bz2

$ cd postgresql-9.6.0

$ ./configure --prefix=/home/digoal/pgsql9.6.0

$ make world -j 32
$ make install-world -j 32

$ vi ~/.bash_profile
export PS1="$USER@`/bin/hostname -s`-> "
export PGPORT=5281
export PGDATA=/u02/digoal/pg_root$PGPORT
export LANG=en_US.utf8
export PGHOME=/home/digoal/pgsql9.6.0
export LD_LIBRARY_PATH=$PGHOME/lib:/lib64:/usr/lib64:/usr/local/lib64:/lib:/usr/lib:/usr/local/lib:$LD_LIBRARY_PATH
export DATE=`date +"%Y%m%d%H%M"`
export PATH=$PGHOME/bin:$PATH:.
export MANPATH=$PGHOME/share/man:$MANPATH
export PGHOST=$PGDATA
export PGUSER=postgres
export PGDATABASE=postgres
alias rm='rm -i'
alias ll='ls -lh'
unalias vi

$ . ~/.bash_profile 

$ df -h
/dev/mapper/vgdata01-lv03
                      4.0T  1.3T  2.8T  32% /u01
/dev/mapper/vgdata01-lv04
                      7.7T  899G  6.8T  12% /u02

初始化集群

$ initdb -D $PGDATA -E UTF8 --locale=C -U postgres -X /u01/digoal/pg_xlog$PGPORT

配置数据库参数

$ cd $PGDATA

$ vi postgresql.conf
listen_addresses = '0.0.0.0'
port = 5281
max_connections = 800
superuser_reserved_connections = 13
unix_socket_directories = '.'
unix_socket_permissions = 0700
tcp_keepalives_idle = 60
tcp_keepalives_interval = 10
tcp_keepalives_count = 10
shared_buffers = 128GB
huge_pages = try
maintenance_work_mem = 2GB
dynamic_shared_memory_type = sysv
vacuum_cost_delay = 0
bgwriter_delay = 10ms
bgwriter_lru_maxpages = 1000
bgwriter_lru_multiplier = 10.0
bgwriter_flush_after = 256
max_worker_processes = 128
max_parallel_workers_per_gather = 16
old_snapshot_threshold = 8h
backend_flush_after = 256
synchronous_commit = off
full_page_writes = off
wal_buffers = 128MB
wal_writer_delay = 10ms
wal_writer_flush_after = 4MB
checkpoint_timeout = 55min
max_wal_size = 256GB
checkpoint_flush_after = 1MB
random_page_cost = 1.0
effective_cache_size = 512GB
constraint_exclusion = on
log_destination = 'csvlog'
logging_collector = on
log_checkpoints = on
log_connections = on
log_disconnections = on
log_error_verbosity = verbose
log_timezone = 'PRC'
autovacuum = on
log_autovacuum_min_duration = 0
autovacuum_max_workers = 8
autovacuum_naptime = 10s
autovacuum_vacuum_scale_factor = 0.02
autovacuum_analyze_scale_factor = 0.01
statement_timeout = 0
lock_timeout = 0
idle_in_transaction_session_timeout = 0
gin_fuzzy_search_limit = 0
gin_pending_list_limit = 4MB
datestyle = 'iso, mdy'
timezone = 'PRC'
lc_messages = 'C'
lc_monetary = 'C'
lc_numeric = 'C'
lc_time = 'C'
default_text_search_config = 'pg_catalog.english'
deadlock_timeout = 1s

$ vi pg_hba.conf
local   all             all                                     trust
host    all             all             127.0.0.1/32            trust
host    all             all             ::1/128                 trust
host all all 0.0.0.0/0 md5

启动数据库

$ pg_ctl start

二、多核并行计算相关参数与用法

1. 控制整个数据库集群同时能开启多少个work process,必须设置。

max_worker_processes = 128              # (change requires restart)

2. 控制一个并行的EXEC NODE最多能开启多少个并行处理单元,同时还需要参考表级参数parallel_workers,或者PG内核内置的算法,根据表的大小计算需要开启多少和并行处理单元。
实际取小的。

max_parallel_workers_per_gather = 16    # taken from max_worker_processes

3. 计算并行处理的成本,如果成本高于非并行,则不会开启并行处理。

#parallel_tuple_cost = 0.1              # same scale as above
#parallel_setup_cost = 1000.0   # same scale as above

4. 小于这个值的表,不会开启并行。

#min_parallel_relation_size = 8MB

5. 告诉优化器,强制开启并行。

#force_parallel_mode = off

6. 表级参数,不通过表的大小计算并行度,而是直接告诉优化器这个表需要开启多少个并行计算单元。

parallel_workers (integer)

This sets the number of workers that should be used to assist a parallel scan of this table.
If not set, the system will determine a value based on the relation size.
The actual number of workers chosen by the planner may be less, for example due to the setting of max_worker_processes.

三、测试场景描述

在标签系统中,通常会有多个属性,每个属性使用一个标签标示,最简单的标签是用0和1来表示,代表true和false。

我们可以把所有的标签转换成比特位,例如系统中一共有200个标签,5000万用户。

那么我们可以通过标签的位运算来圈定特定的人群。

这样就会涉及BIT位的运算。

那么我们来看看PostgreSQL位运算的性能如何?

四、测试1 (数据量大于shared buffer)

创建一张测试表,包含一个比特位字段,后面用于测试。

postgres=# create unlogged table t_bit2 (id bit(200)) with (autovacuum_enabled=off, parallel_workers=128);
CREATE TABLE

并行插入32亿记录

for ((i=1;i<=64;i++)) ; do psql -c "insert into t_bit2 select B'10101010101010101010101010101010101010101010101010101010101010101010101010101010101010101010101010101010101010101010101010101010101010101010101010101010101010101010101010101010101010101010101010101010' from generate_series(1,50000000);" & done

单表32亿,180GB

postgres=# \dt+
                     List of relations
 Schema |  Name  | Type  |  Owner   |  Size  | Description
--------+--------+-------+----------+--------+-------------
 public | t_bit2 | table | postgres | 180 GB |

全表扫描测试

非并行模式

postgres=# set force_parallel_mode =off;
SET
postgres=# set max_parallel_workers_per_gather =0;
SET
postgres=# \timing
Timing is on.

执行计划  

postgres=# explain (verbose,costs) select * from t_bit2 where bitand(id, '10101010101010101010101010101010101010101010101010101010101010101010101010101010101010101010101010101010101010101010101010101010101010101010101010101010101010101010101010101010101010101010101010101010')=B'10101010101010101010101010101010101010101010101010101010101010101010101010101010101010101010101010101010101010101010101010101010101010101010101010101010101010101010101010101010101010101010101010101011';
  QUERY PLAN
 Seq Scan on public.t_bit2  (cost=0.00..71529415.52 rows=16000001 width=32)
   Output: id
   Filter: (bitand(t_bit2.id, B'10101010101010101010101010101010101010101010101010101010101010101010101010101010101010101010101010101010101010101010101010101010101010101010101010101010101010101010101010101010101010101010101010101010'::"bit") = B'10101010101010101010101010101010101010101010101010101010101010101010101010101010101010101010101010101010101010101010101010101010101010101010101010101010101010101010101010101010101010101010101010101011'::"bit")
(3 rows)

取测试三轮后的结果,排除CACHE影响。

postgres=# explain (analyze,verbose,costs) select * from t_bit2 where bitand(id, '10101010101010101010101010101010101010101010101010101010101010101010101010101010101010101010101010101010101010101010101010101010101010101010101010101010101010101010101010101010101010101010101010101010')=B'10101010101010101010101010101010101010101010101010101010101010101010101010101010101010101010101010101010101010101010101010101010101010101010101010101010101010101010101010101010101010101010101010101011';

 Seq Scan on public.t_bit2  (cost=0.00..71529415.52 rows=16000001 width=32) (actual time=0.033..1135403.694 rows=3200000000 loops=1)
   Output: id
   Filter: (bitand(t_bit2.id, B'10101010101010101010101010101010101010101010101010101010101010101010101010101010101010101010101010101010101010101010101010101010101010101010101010101010101010101010101010101010101010101010101010101010'::"bit") = B'10101010101010101010101010101010101010101010101010101010101010101010101010101010101010101010101010101010101010101010101010101010101010101010101010101010101010101010101010101010101010101010101010101011'::"bit")
 Planning time: 0.576 ms
 Execution time: 1285437.199 ms
(5 rows)

Time: 1285438.195 ms

并行模式

postgres=# set force_parallel_mode =on;
postgres=# set max_parallel_workers_per_gather = 64;

取测试三轮后的结果,排除CACHE影响。

postgres=# explain (analyze,verbose,costs) select * from t_bit2 where bitand(id, '10101010101010101010101010101010101010101010101010101010101010101010101010101010101010101010101010101010101010101010101010101010101010101010101010101010101010101010101010101010101010101010101010101010')=B'10101010101010101010101010101010101010101010101010101010101010101010101010101010101010101010101010101010101010101010101010101010101010101010101010101010101010101010101010101010101010101010101010101011';

 Gather  (cost=1000.00..26630413.18 rows=16000001 width=32) (actual time=30946.103..30946.103 rows=0 loops=1)
   Output: id
   Workers Planned: 32
   Workers Launched: 32
   ->  Parallel Seq Scan on public.t_bit2  (cost=0.00..25029413.08 rows=500000 width=32) (actual time=30941.191..30941.191 rows=0 loops=33)
         Output: id
         Filter: (bitand(t_bit2.id, B'10101010101010101010101010101010101010101010101010101010101010101010101010101010101010101010101010101010101010101010101010101010101010101010101010101010101010101010101010101010101010101010101010101010'::"bit") = B'10101010101010101010101010101010101010101010101010101010101010101010101010101010101010101010101010101010101010101010101010101010101010101010101010101010101010101010101010101010101010101010101010101011'::"bit")
         Rows Removed by Filter: 96969697
         Worker 0: actual time=30938.594..30938.594 rows=0 loops=1
         Worker 1: actual time=30939.353..30939.353 rows=0 loops=1
         Worker 2: actual time=30939.419..30939.419 rows=0 loops=1
         Worker 3: actual time=30939.574..30939.574 rows=0 loops=1
         Worker 4: actual time=30939.692..30939.692 rows=0 loops=1
         Worker 5: actual time=30939.825..30939.825 rows=0 loops=1
         Worker 6: actual time=30939.850..30939.850 rows=0 loops=1
         Worker 7: actual time=30940.028..30940.028 rows=0 loops=1
         Worker 8: actual time=30940.287..30940.287 rows=0 loops=1
         Worker 9: actual time=30940.466..30940.466 rows=0 loops=1
         Worker 10: actual time=30940.436..30940.436 rows=0 loops=1
         Worker 11: actual time=30940.649..30940.649 rows=0 loops=1
         Worker 12: actual time=30940.733..30940.733 rows=0 loops=1
         Worker 13: actual time=30940.818..30940.818 rows=0 loops=1
         Worker 14: actual time=30941.083..30941.083 rows=0 loops=1
         Worker 15: actual time=30941.086..30941.086 rows=0 loops=1
         Worker 16: actual time=30940.612..30940.612 rows=0 loops=1
         Worker 17: actual time=30941.342..30941.342 rows=0 loops=1
         Worker 18: actual time=30941.617..30941.617 rows=0 loops=1
         Worker 19: actual time=30941.667..30941.667 rows=0 loops=1
         Worker 20: actual time=30941.730..30941.730 rows=0 loops=1
         Worker 21: actual time=30941.207..30941.207 rows=0 loops=1
         Worker 22: actual time=30942.115..30942.115 rows=0 loops=1
         Worker 23: actual time=30942.049..30942.049 rows=0 loops=1
         Worker 24: actual time=30941.440..30941.440 rows=0 loops=1
         Worker 25: actual time=30942.361..30942.361 rows=0 loops=1
         Worker 26: actual time=30942.562..30942.562 rows=0 loops=1
         Worker 27: actual time=30942.430..30942.430 rows=0 loops=1
         Worker 28: actual time=30942.697..30942.697 rows=0 loops=1
         Worker 29: actual time=30942.577..30942.577 rows=0 loops=1
         Worker 30: actual time=30942.985..30942.985 rows=0 loops=1
         Worker 31: actual time=30942.356..30942.356 rows=0 loops=1
 Planning time: 0.061 ms
 Execution time: 32566.303 ms
(42 rows)

聚合测试

非并行模式

postgres=# set force_parallel_mode =off;
postgres=# set max_parallel_workers_per_gather = 0;

postgres=# select count(*) from t_bit2 where bitand(id, '10101010101010101010101010101010101010101010101010101010101010101010101010101010101010101010101010101010101010101010101010101010101010101010101010101010101010101010101010101010101010101010101010101010')=B'10101010101010101010101010101010101010101010101010101010101010101010101010101010101010101010101010101010101010101010101010101010101010101010101010101010101010101010101010101010101010101010101010101011';
 count
-------
     0
(1 row)
Time: 810115.643 ms

并行模式

postgres=# set force_parallel_mode =on;
postgres=# set max_parallel_workers_per_gather = 32;

postgres=# select count(*) from t_bit2 where bitand(id, '10101010101010101010101010101010101010101010101010101010101010101010101010101010101010101010101010101010101010101010101010101010101010101010101010101010101010101010101010101010101010101010101010101010')=B'10101010101010101010101010101010101010101010101010101010101010101010101010101010101010101010101010101010101010101010101010101010101010101010101010101010101010101010101010101010101010101010101010101011';
 count
-------
     0
(1 row)
Time: 31805.820 ms

五、测试2 (数据量小于shared buffer)

创建一张测试表,包含一个比特位字段,后面用于测试。

postgres=# create unlogged table t_bit1 (id bit(200)) with (autovacuum_enabled=off, parallel_workers=128);
CREATE TABLE

并行插入10亿记录

for ((i=1;i<=50;i++)) ; do psql -c "insert into t_bit1 select B'10101010101010101010101010101010101010101010101010101010101010101010101010101010101010101010101010101010101010101010101010101010101010101010101010101010101010101010101010101010101010101010101010101010' from generate_series(1,20000000);" & done

单表10亿,56GB

postgres=# \dt+
                     List of relations
 Schema |  Name  | Type  |  Owner   |  Size  | Description
--------+--------+-------+----------+--------+-------------
 public | t_bit1 | table | postgres | 56 GB  |

聚合测试

非并行模式

postgres=# set force_parallel_mode =off;
postgres=# set max_parallel_workers_per_gather = 0;

postgres=# select count(*) from t_bit1 where bitand(id, '10101010101010101010101010101010101010101010101010101010101010101010101010101010101010101010101010101010101010101010101010101010101010101010101010101010101010101010101010101010101010101010101010101010')=B'10101010101010101010101010101010101010101010101010101010101010101010101010101010101010101010101010101010101010101010101010101010101010101010101010101010101010101010101010101010101010101010101010101011';

Time: 261679.060 ms

并行模式

postgres=# set force_parallel_mode = on;
postgres=# set max_parallel_workers_per_gather = 32;

postgres=# select count(*) from t_bit1 where bitand(id, '10101010101010101010101010101010101010101010101010101010101010101010101010101010101010101010101010101010101010101010101010101010101010101010101010101010101010101010101010101010101010101010101010101010')=B'10101010101010101010101010101010101010101010101010101010101010101010101010101010101010101010101010101010101010101010101010101010101010101010101010101010101010101010101010101010101010101010101010101011';

Time: 9704.983 ms

hash JOIN测试

1亿 JOIN 5000万

create unlogged table t1(id int, info text) with (autovacuum_enabled=off, parallel_workers=128);
create unlogged table t2(id int, info text) with (autovacuum_enabled=off, parallel_workers=128);
insert into t1 select generate_series(1,100000000);
insert into t2 select generate_series(1,50000000);

非并行模式

postgres=# set force_parallel_mode =off;
postgres=# set max_parallel_workers_per_gather = 0;
postgres=# set enable_merjoin=off;

postgres=# explain verbose select count(*) from t1 join t2 using(id);
                                      QUERY PLAN
---------------------------------------------------------------------------------------
 Aggregate  (cost=296050602904.73..296050602904.74 rows=1 width=8)
   Output: count(*)
   ->  Hash Join  (cost=963185.44..276314071764.57 rows=7894612456066 width=0)
         Hash Cond: (t1.id = t2.id)
         ->  Seq Scan on public.t1  (cost=0.00..1004425.06 rows=56194706 width=4)
               Output: t1.id
         ->  Hash  (cost=502212.53..502212.53 rows=28097353 width=4)
               Output: t2.id
               ->  Seq Scan on public.t2  (cost=0.00..502212.53 rows=28097353 width=4)
                     Output: t2.id
(10 rows)

postgres=# select count(*) from t1 join t2 using(id);
  count
----------
 50000000
(1 row)
Time: 60630.148 ms

并行模式

postgres=# set force_parallel_mode = on;
postgres=# set max_parallel_workers_per_gather = 32;

postgres=# explain verbose select count(*) from t1 join t2 using(id);
                                             QUERY PLAN
-----------------------------------------------------------------------------------------------------
 Finalize Aggregate  (cost=28372817100.45..28372817100.46 rows=1 width=8)
   Output: count(*)
   ->  Gather  (cost=28372817097.16..28372817100.37 rows=32 width=8)
         Output: (PARTIAL count(*))
         Workers Planned: 32
         ->  Partial Aggregate  (cost=28372816097.16..28372816097.17 rows=1 width=8)
               Output: PARTIAL count(*)
               ->  Hash Join  (cost=963185.44..8636284956.99 rows=7894612456066 width=0)
                     Hash Cond: (t1.id = t2.id)
                     ->  Parallel Seq Scan on public.t1  (cost=0.00..460038.85 rows=1756085 width=4)
                           Output: t1.id
                     ->  Hash  (cost=502212.53..502212.53 rows=28097353 width=4)
                           Output: t2.id
                           ->  Seq Scan on public.t2  (cost=0.00..502212.53 rows=28097353 width=4)
                                 Output: t2.id
(15 rows)

select count(*) from t1 join t2 using(id);
 Execution time: 50958.985 ms

postgres=# set max_parallel_workers_per_gather = 4;
select count(*) from t1 join t2 using(id);
Time: 39386.647 ms

建议JOIN不要设置太大的并行度。

六、如何设置并行度以及源码分析

GUC变量
1. 控制整个数据库集群同时能开启多少个work process,必须设置。

max_worker_processes = 128              # (change requires restart)

2. 控制一个并行的EXEC NODE最多能开启多少个并行处理单元,同时还需要参考表级参数parallel_workers,或者PG内核内置的算法,根据表的大小计算需要开启多少和并行处理单元。
实际取小的。

max_parallel_workers_per_gather = 16    # taken from max_worker_processes

如果同时还设置了表的并行度parallel_workers,则最终并行度取min(max_parallel_degree , parallel_degree )

                /*
                 * Use the table parallel_degree, but don't go further than
                 * max_parallel_degree.
                 */
                parallel_degree = Min(rel->rel_parallel_degree, max_parallel_degree);

如果表没有设置并行度parallel_workers ,则根据表的大小 和 parallel_threshold 这个硬编码值决定,计算得出(见函数create_plain_partial_paths)

依旧受到max_parallel_workers_per_gather 参数的限制,不能大于它,取小的,前面已经交代了。

代码如下(release后可能有些许修改)

src/backend/optimizer/util/plancat.c

void
get_relation_info(PlannerInfo *root, Oid relationObjectId, bool inhparent,
                                  RelOptInfo *rel)
{
...
        /* Retrive the parallel_degree reloption, if set. */
        rel->rel_parallel_degree = RelationGetParallelDegree(relation, -1);
...

src/include/utils/rel.h

/*
 * RelationGetParallelDegree
 *              Returns the relation's parallel_degree.  Note multiple eval of argument!
 */
#define RelationGetParallelDegree(relation, defaultpd) \
        ((relation)->rd_options ? \
         ((StdRdOptions *) (relation)->rd_options)->parallel_degree : (defaultpd))

src/backend/optimizer/path/allpaths.c

/*
 * create_plain_partial_paths
 *        Build partial access paths for parallel scan of a plain relation
 */
static void
create_plain_partial_paths(PlannerInfo *root, RelOptInfo *rel)
{
        int                     parallel_degree = 1;

        /*
         * If the user has set the parallel_degree reloption, we decide what to do
         * based on the value of that option.  Otherwise, we estimate a value.
         */
        if (rel->rel_parallel_degree != -1)
        {
                /*
                 * If parallel_degree = 0 is set for this relation, bail out.  The
                 * user does not want a parallel path for this relation.
                 */
                if (rel->rel_parallel_degree == 0)
                        return;

                /*
                 * Use the table parallel_degree, but don't go further than
                 * max_parallel_degree.
                 */
                parallel_degree = Min(rel->rel_parallel_degree, max_parallel_degree);
        }
        else
        {
                int                     parallel_threshold = 1000;

                /*
                 * If this relation is too small to be worth a parallel scan, just
                 * return without doing anything ... unless it's an inheritance child.
                 * In that case, we want to generate a parallel path here anyway.  It
                 * might not be worthwhile just for this relation, but when combined
                 * with all of its inheritance siblings it may well pay off.
                 */
                if (rel->pages < parallel_threshold &&
                        rel->reloptkind == RELOPT_BASEREL)
                        return;
// 表级并行度没有设置时,通过表的大小和parallel_threshold 计算并行度
                /*
                 * Limit the degree of parallelism logarithmically based on the size
                 * of the relation.  This probably needs to be a good deal more
                 * sophisticated, but we need something here for now.
                 */
                while (rel->pages > parallel_threshold * 3 &&
                           parallel_degree < max_parallel_degree)
                {
                        parallel_degree++;
                        parallel_threshold *= 3;
                        if (parallel_threshold >= PG_INT32_MAX / 3)
                                break;
                }
        }

        /* Add an unordered partial path based on a parallel sequential scan. */
        add_partial_path(rel, create_seqscan_path(root, rel, NULL, parallel_degree));
}

3. 计算并行处理的成本,如果成本高于非并行,则不会开启并行处理。

#parallel_tuple_cost = 0.1              # same scale as above
#parallel_setup_cost = 1000.0   # same scale as above

4. 小于这个值的表,不会开启并行。

#min_parallel_relation_size = 8MB

5. 告诉优化器,强制开启并行。

#force_parallel_mode = off

表级参数
6. 不通过表的大小计算并行度,而是直接告诉优化器这个表需要开启多少个并行计算单元。

parallel_workers (integer)

This sets the number of workers that should be used to assist a parallel scan of this table.
If not set, the system will determine a value based on the relation size.
The actual number of workers chosen by the planner may be less, for example due to the setting of max_worker_processes.

七、参考信息

1. http://www.postgresql.org/docs/9.6/static/sql-createtable.html

parallel_workers (integer)

This sets the number of workers that should be used to assist a parallel scan of this table.
If not set, the system will determine a value based on the relation size.
The actual number of workers chosen by the planner may be less, for example due to the setting of max_worker_processes.

2. http://www.postgresql.org/docs/9.6/static/runtime-config-query.html#RUNTIME-CONFIG-QUERY-OTHER

force_parallel_mode (enum)
Allows the use of parallel queries for testing purposes even in cases where no performance benefit is expected.
The allowed values of force_parallel_mode are off (use parallel mode only when it is expected to improve performance),
on (force parallel query for all queries for which it is thought to be safe),
and regress (like on, but with additional behavior changes as explained below).

More specifically, setting this value to on will add a Gather node to the top of any query plan for which this appears to be safe,
so that the query runs inside of a parallel worker. Even when a parallel worker is not available or cannot be used,
operations such as starting a subtransaction that would be prohibited in a parallel query context will be prohibited unless the planner believes that this will cause the query to fail.
If failures or unexpected results occur when this option is set, some functions used by the query may need to be marked PARALLEL UNSAFE (or, possibly, PARALLEL RESTRICTED).

Setting this value to regress has all of the same effects as setting it to on plus some additional effects that are intended to facilitate automated regression testing. Normally,
messages from a parallel worker include a context line indicating that, but a setting of regress suppresses this line so that the output is the same as in non-parallel execution.
Also, the Gather nodes added to plans by this setting are hidden in EXPLAIN output so that the output matches what would be obtained if this setting were turned off.

3. http://www.postgresql.org/docs/9.6/static/runtime-config-resource.html#RUNTIME-CONFIG-RESOURCE-ASYNC-BEHAVIOR

max_worker_processes (integer)
Sets the maximum number of background processes that the system can support. This parameter can only be set at server start. The default is 8.

When running a standby server, you must set this parameter to the same or higher value than on the master server. Otherwise, queries will not be allowed in the standby server.

max_parallel_workers_per_gather (integer)
Sets the maximum number of workers that can be started by a single Gather node. Parallel workers are taken from the pool of processes established by max_worker_processes.
Note that the requested number of workers may not actually be available at run time. If this occurs, the plan will run with fewer workers than expected,
which may be inefficient. Setting this value to 0, which is the default, disables parallel query execution.

Note that parallel queries may consume very substantially more resources than non-parallel queries,
because each worker process is a completely separate process which has roughly the same impact on the system as an additional user session.
This should be taken into account when choosing a value for this setting, as well as when configuring other settings that control resource utilization, such as work_mem.
Resource limits such as work_mem are applied individually to each worker, which means the total utilization may be much higher across all processes than it would normally be for any single process.
For example, a parallel query using 4 workers may use up to 5 times as much CPU time, memory, I/O bandwidth, and so forth as a query which uses no workers at all.

For more information on parallel query, see Chapter 15.

4. http://www.postgresql.org/docs/9.6/static/runtime-config-query.html#RUNTIME-CONFIG-QUERY-CONSTANTS

parallel_setup_cost (floating point)
Sets the planner's estimate of the cost of launching parallel worker processes. The default is 1000.

parallel_tuple_cost (floating point)
Sets the planner's estimate of the cost of transferring one tuple from a parallel worker process to another process.
The default is 0.1.

min_parallel_relation_size (integer)
Sets the minimum size of relations to be considered for parallel scan. The default is 8 megabytes (8MB).

Count

时间: 2024-11-03 18:00:13

PostgreSQL 9.6 引领开源数据库攻克多核并行计算难题的相关文章

开源数据库PostgreSQL攻克并行计算难题

经过多年的酝酿(从支持work process到支持动态fork共享内存,再到内核层面支持并行计算),PostgreSQL 的并行计算功能终于来了,为PG的scale up能力再次拔高一个台阶,标志着开源数据库已经攻克了并行计算的难题. 相信有很多小伙伴已经开始测试了,我也测试了一个场景是标签系统类应用的比特位运算,昨天测试发现性能相比非并行已经提升了7倍. 调整并行度,在32个核的虚拟机上测试,性能提升了约10多倍.但是实际上并没有到32倍,不考虑内存和IO的瓶颈,是有优化空间.注意不同的并行

为什么PostgreSQL是&quot;最先进的开源数据库&quot;

背景 打开PostgreSQL网站,你会看到这样的flag. The world's most advanced open source database. 那么它从何而来呢? 从个人对数据库的角度,在几个方面来分析一下 一.功能方面 1 数据处理能力 - 高级SQL特性 1.窗口查询 2.聚合查询,支持WITHIN GROUP.grouping sets, rollup, cube等 3.CTE, 递归查询 4.merge,upsert (on conflict)语法 5.继承 6.重写规则

MySQL与PostgreSQL:该选择哪个开源数据库?哪一个更好?

Naresh Kumar是一位软件工程师与热情的博主,对编程与新事物充满了激情和兴趣.近日,Naresh撰写了一篇博文,对开源世界最常见的两种数据库MySQL与PostgreSQL的特点进行了详尽的分析和比对. 如果打算为项目选择一款免费.开源的数据库,那么你可能会在MySQL与PostgreSQL之间犹豫不定.MySQL与PostgreSQL都是免费.开源.强大.且功能丰富的数据库.你主要的问题可能是:哪一个才是最好的开源数据库,MySQL还是PostgreSQL呢?该选择哪一个开源数据库呢?

在 Ubuntu 上安装世界上最先进的开源数据库 PostgreSQL 9.4 和 phpPgAdmin

在 Ubuntu 上安装世界上最先进的开源数据库 PostgreSQL 9.4 和 phpPgAdmin 简介 PostgreSQL 是一款强大的,开源的,对象关系型数据库系统.它支持所有的主流操作系统,包括 Linux.Unix(AIX.BSD.HP-UX,SGI IRIX.Mac OS.Solaris.Tru64) 以及 Windows 操作系统. 下面是 Ubuntu 发起者 Mark Shuttleworth 对 PostgreSQL 的一段评价. PostgreSQL 是一款极赞的数据

新PostgreSQL开源数据库将目标锁定在NoSQL市场

新版PostgreSQL开源数据库内置了被广泛使用的JSON数据交换格式,并将目标锁定为以http://www.aliyun.com/zixun/aggregation/13461.html">MongoDB为代表的非关系型数据存储中的NoSQL市场. PostgreSQL在周四发布了PostgreSQL 9.4首个测试版.该测试版包括有大量针对快速增长的Web应用的新功能.在这些Web应用中,许多都需求快速存储和检索海量用户数据.用户通常会选择一些专门针对工作负载之类的NoSQL数据库.

一篇文章,掌握所有开源数据库的现状

  数据库作为业务的核心,在整个基础软件栈中是非常重要的一环.近几年社区也是新的方案和思想层出不穷,接下来我将总结一下近几年一些主流的开源数据库方案,其背后的设计思想以及适用场景.本人才疏学浅如有遗漏或者错误请见谅.本次分享聚焦于数据库既结构化数据存储 OLTP 及 NoSQL 领域,不会涉及 OLAP.对象存储.分布式文件系统. 1 开源RDBMS与互联网的崛起 很长时间以来,关系型数据库一直是大公司的专利,市场被 Oracle / DB2 等企业数据库牢牢把持.但是随着互联网的崛起.开源社区

开源数据库该怎么玩?

9月13日,北京,巨杉数据库主办的"极客Cool" 沙龙第一期圆满结束.沙龙上,开源数据库的三个重要代表SequoiaDB巨杉数据库CTO王涛(NoSQL),阿里云RDS for PostgreSQL/PPAS 专家 萧少聪(PostgreSQL),PingCAP创始人刘奇(NewSQL),分别作了进行了分享,最后也针对新一代的数据库的应用.发展进行了圆桌讨论. 本次活动也特别感谢开源中国对本次活动的支持. SequoiaDB核心架构与应用场景 王涛的分享,首先介绍了SequoiaDB

盘点2013:21款最优秀的开源数据库

作为一名软件开发人员或DBA,其中一份必不可少的工作就是与数据库打交道,比如MS SQL服务器.MySQL.Oracle.PostgreSQL.MongoDB等等.众所周知,其中MySQL是目前使用最广泛最好的免费开源数据库,此外,还有一些你不知道或者没用过但又非常出色的开源数据库,例如 PostgreSQL. MongoDB.HBase.Cassandra.Couchbase.Neo4j.Riak.Redis.Firebird 等. 在本文,作者总共总结了21个最优秀的开源数据库,不妨让我们一

那些容易被黑产盯上的开源数据库软件

  近期全球范围内大量MongoDB因为配置不当导致公网匿名可访问,被黑产大量利用,删除业务数据并索要赎金后才恢复数据,本文从攻防角度讲述下那些容易被黑产盯上的开源数据库软件. 动机 只要攻击成本低于收益,就有动机.互联网公司大量使用各种开源的数据库保存重要的业务数据,一旦数据被销毁且无法恢复,会造成较大损失,这类互联网公司有相当一部分安全防护薄弱,攻击成本低,也有一定能力支付较小金融的赎金,于是这波人就成了黑产理想的攻击目标. 攻击过程 这 类攻击往往非常简单粗暴但有效,以MongoDB为例.