官人要杯咖啡吗? - PostgreSQL实时监测PLAN tree的执行进度

标签

PostgreSQL , long query , plan tree , SQL进度条


背景

当我们在数据库中执行一些比较大的查询,或者执行比较复杂的函数时,如果要知道执行到哪里了,预计还要多久。

怎么办呢? 有方法得到吗?当然有。

目前PostgreSQL支持的是传统的SQL执行方法,将(nonutility)语句parser后,根据parser tree生成plan tree, 然后根据plan tree去执行里面的每一个NODE。

也就是说,我们实际上是可以随时查看plan tree,以及每个NODE的执行情况的(包括一些LOOP,例如 NEST LOOP JOIN).

比如

postgres=# explain select * from a t1 join a t2 on (t1.id=t2.id) where t2.id between 1 and 10;
                                QUERY PLAN
---------------------------------------------------------------------------
 Nested Loop  (cost=0.55..17.50 rows=10 width=74)
   ->  Index Scan using a_pkey on a t2  (cost=0.28..2.48 rows=10 width=37)
         Index Cond: ((id >= 1) AND (id <= 10))
   ->  Index Scan using a_pkey on a t1  (cost=0.28..1.49 rows=1 width=37)
         Index Cond: (id = t2.id)
(5 rows)

这个QUERY包含了3个NODE,分别是index scan和nestloop.

得到进度条的原理如下

1. 生成执行计划(plan tree)

2. 按部就班的执行

3. 在执行过程中,我们可以跟踪执行的NODE,输出统计信息,打印QUERY的执行进度。

pg_query_state插件

Oleg 的Postgrespro公司开源的一个插件pg_query_state就是用来干这个事情的。

可以观察SQL执行过程中动态变化的信息,包括hit, run tim, loop, memory开销等等。

用法举例

编译安装

git clone https://github.com/postgrespro/pg_query_state

cd pg_query_state
git checkout PGPRO9_6

cd xx/postgresql-9.6.1
patch -p1 < ../pg_query_state/custom_signals.patch
patch -p1 < ../pg_query_state/runtime_explain.patch
make && make install

cd pg_query_state
git checkout PGPRO9_6
make USE_PGXS=1
make USE_PGXS=1 install

cd $PGDATA
vi postgresql.conf
shared_preload_libraries = 'pg_query_state'
pg_query_state.enable = on
pg_query_state.enable_buffers = on
pg_query_state.enable_timing = on

pg_ctl restart -m fast

观察进度例子1

postgres=# create extension pg_query_state;

session a
postgres=# do language plpgsql
$$
declare
begin
for i in 1..100 loop
  perform 1 from pg_class;
  perform pg_sleep(10);
  perform t1.relname from pg_class t1 join pg_class t2 on (t1.oid=t2.oid) where t1.oid::int8 <100000;
end loop;
end;
$$;

session b
postgres=# select pid,query from pg_stat_activity ;
  pid  |                                                query
-------+------------------------------------------------------------------------------------------------------
 36918 | do language plpgsql                                                                                 +
       | $$                                                                                                  +
       | declare                                                                                             +
       | begin                                                                                               +
       | for i in 1..100 loop                                                                                +
       |   perform 1 from pg_class;                                                                          +
       |   perform pg_sleep(10);                                                                             +
       |   select t1.relname from pg_class t1 join pg_class t2 on (t1.oid=t2.oid) where t1.oid::int8 <100000;+
       | end loop;                                                                                           +
       | end;                                                                                                +
       | $$;
 37000 | select pid,query from pg_stat_activity ;
(2 rows)

多次调用pg_query_state,可以看到running time在不断的变化。
postgres=# select * from pg_query_state(36918,true,true,true,true,true);
  pid  | frame_number |     query_text      |                                                    plan                                                     | leader_pid
-------+--------------+---------------------+-------------------------------------------------------------------------------------------------------------+------------
 36918 |            0 | SELECT pg_sleep(10) | Result  (cost=0.00..0.01 rows=1 width=4) (Current loop: running time=6362.586 actual rows=0, loop number=1)+|
       |              |                     |   Output: pg_sleep('10'::double precision)                                                                  |
(1 row)

postgres=# select * from pg_query_state(36918,true,true,true,true,true);
  pid  | frame_number |     query_text      |                                                    plan                                                     | leader_pid
-------+--------------+---------------------+-------------------------------------------------------------------------------------------------------------+------------
 36918 |            0 | SELECT pg_sleep(10) | Result  (cost=0.00..0.01 rows=1 width=4) (Current loop: running time=8155.538 actual rows=0, loop number=1)+|
       |              |                     |   Output: pg_sleep('10'::double precision)                                                                  |
(1 row)

再来个复杂一点的

观察进度例子2

session a
postgres=# \dt+
                     List of relations
 Schema |  Name  | Type  |  Owner   |  Size   | Description
--------+--------+-------+----------+---------+-------------
 public | sbtest | table | postgres | 3284 MB | 

postgres=# \d sbtest
                             Table "public.sbtest"
 Column |      Type      |                      Modifiers
--------+----------------+-----------------------------------------------------
 id     | integer        | not null default nextval('sbtest_id_seq'::regclass)
 k      | integer        | not null default 0
 c      | character(120) | not null default ''::bpchar
 pad    | character(60)  | not null default ''::bpchar
Indexes:
    "pk_sbtest_id" PRIMARY KEY, btree (id) WITH (fillfactor='100')

postgres=# select * from sbtest t1 join sbtest t2 on (t1.id=t2.id);

session b
可以看到,shared hit, runtiming,等都在变化
\x
postgres=# select * from pg_query_state(36918,true,true,true,true,true);
-[ RECORD 1 ]+-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
pid          | 36918
frame_number | 0
query_text   | select * from sbtest t1 join sbtest t2 on (t1.id=t2.id);
plan         | Merge Join  (cost=0.87..1649962.89 rows=15535467 width=380) (Current loop: actual time=0.014..1785.903 rows=2046680, loop number=1)                                           +
             |   Output: t1.id, t1.k, t1.c, t1.pad, t2.id, t2.k, t2.c, t2.pad                                                                                                                +
             |   Merge Cond: (t1.id = t2.id)                                                                                                                                                 +
             |   Buffers: shared hit=263742                                                                                                                                                  +
             |   ->  Index Scan using pk_sbtest_id on public.sbtest t1  (cost=0.43..708465.44 rows=15535467 width=190) (Current loop: actual time=0.006..305.380 rows=2046680, loop number=1)+
             |         Output: t1.id, t1.k, t1.c, t1.pad                                                                                                                                     +
             |         Buffers: shared hit=131871                                                                                                                                            +
             |   ->  Index Scan using pk_sbtest_id on public.sbtest t2  (cost=0.43..708465.44 rows=15535467 width=190) (Current loop: actual time=0.004..469.854 rows=2046681, loop number=1)+
             |         Output: t2.id, t2.k, t2.c, t2.pad                                                                                                                                     +
             |         Buffers: shared hit=131872
leader_pid   | 

postgres=# select * from pg_query_state(36918,true,true,true,true,true);
-[ RECORD 1 ]+-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
pid          | 36918
frame_number | 0
query_text   | select * from sbtest t1 join sbtest t2 on (t1.id=t2.id);
plan         | Merge Join  (cost=0.87..1649962.89 rows=15535467 width=380) (Current loop: actual time=0.014..2648.177 rows=2882712, loop number=1)                                           +
             |   Output: t1.id, t1.k, t1.c, t1.pad, t2.id, t2.k, t2.c, t2.pad                                                                                                                +
             |   Merge Cond: (t1.id = t2.id)                                                                                                                                                 +
             |   Buffers: shared hit=643548                                                                                                                                                  +
             |   ->  Index Scan using pk_sbtest_id on public.sbtest t1  (cost=0.43..708465.44 rows=15535467 width=190) (Current loop: actual time=0.006..453.747 rows=2882712, loop number=1)+
             |         Output: t1.id, t1.k, t1.c, t1.pad                                                                                                                                     +
             |         Buffers: shared hit=321774                                                                                                                                            +
             |   ->  Index Scan using pk_sbtest_id on public.sbtest t2  (cost=0.43..708465.44 rows=15535467 width=190) (Current loop: actual time=0.004..723.293 rows=2882712, loop number=1)+
             |         Output: t2.id, t2.k, t2.c, t2.pad                                                                                                                                     +
             |         Buffers: shared hit=321774
leader_pid   | 

postgres=# select * from pg_query_state(36918,true,true,true,true,true);
-[ RECORD 1 ]+--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
pid          | 36918
frame_number | 0
query_text   | select * from sbtest t1 join sbtest t2 on (t1.id=t2.id);
plan         | Merge Join  (cost=0.87..1649962.89 rows=15535467 width=380) (Current loop: actual time=0.014..4218.101 rows=4407527, loop number=1)                                            +
             |   Output: t1.id, t1.k, t1.c, t1.pad, t2.id, t2.k, t2.c, t2.pad                                                                                                                 +
             |   Merge Cond: (t1.id = t2.id)                                                                                                                                                  +
             |   Buffers: shared hit=1288884                                                                                                                                                  +
             |   ->  Index Scan using pk_sbtest_id on public.sbtest t1  (cost=0.43..708465.44 rows=15535467 width=190) (Current loop: actual time=0.006..719.731 rows=4407527, loop number=1) +
             |         Output: t1.id, t1.k, t1.c, t1.pad                                                                                                                                      +
             |         Buffers: shared hit=644442                                                                                                                                             +
             |   ->  Index Scan using pk_sbtest_id on public.sbtest t2  (cost=0.43..708465.44 rows=15535467 width=190) (Current loop: actual time=0.004..1183.861 rows=4407527, loop number=1)+
             |         Output: t2.id, t2.k, t2.c, t2.pad                                                                                                                                      +
             |         Buffers: shared hit=644442
leader_pid   | 

查询结束,结果返回中,这时显示的是backend is idle
postgres=# \set VERBOSITY verbose
postgres=# select * from pg_query_state(36918,true,true,true,true,true);
INFO:  00000: state of backend is idle
LOCATION:  pg_query_state, pg_query_state.c:552
(0 rows)

观察进度例子3

来一个嵌套循环

postgres=# set enable_mergejoin=off;
SET
postgres=# set enable_hashjoin=off;
SET

session a
postgres=# select * from sbtest t1 join sbtest t2 on (t1.id=t2.id);

session b
postgres=# select * from pg_query_state(36918,true,true,true,true,true);
-[ RECORD 1 ]+---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
pid          | 36918
frame_number | 0
query_text   | select * from sbtest t1 join sbtest t2 on (t1.id=t2.id);
plan         | Nested Loop  (cost=0.43..8236166.16 rows=15535467 width=380) (Current loop: actual time=0.019..1297.191 rows=520855, loop number=1)                                                                           +
             |   Output: t1.id, t1.k, t1.c, t1.pad, t2.id, t2.k, t2.c, t2.pad                                                                                                                                                +
             |   Buffers: shared hit=2101162 read=6                                                                                                                                                                          +
             |   ->  Seq Scan on public.sbtest t1  (cost=0.00..575579.67 rows=15535467 width=190) (Current loop: actual time=0.005..58.232 rows=520855, loop number=1)                                                       +
             |         Output: t1.id, t1.k, t1.c, t1.pad                                                                                                                                                                     +
             |         Buffers: shared hit=14078                                                                                                                                                                             +
             |   ->  Index Scan using pk_sbtest_id on public.sbtest t2  (cost=0.43..0.48 rows=1 width=190) (actual time=0.002..0.002 rows=1 loops=520854) (Current loop: actual time=0.004..0.004 rows=1, loop number=520855)+
             |         Output: t2.id, t2.k, t2.c, t2.pad                                                                                                                                                                     +
             |         Index Cond: (t2.id = t1.id)                                                                                                                                                                           +
             |         Buffers: shared hit=2087084 read=6
leader_pid   | 

postgres=# select * from pg_query_state(36918,true,true,true,true,true);
-[ RECORD 1 ]+-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
pid          | 36918
frame_number | 0
query_text   | select * from sbtest t1 join sbtest t2 on (t1.id=t2.id);
plan         | Nested Loop  (cost=0.43..8236166.16 rows=15535467 width=380) (Current loop: actual time=0.019..2879.993 rows=1280640, loop number=1)                                                                            +
             |   Output: t1.id, t1.k, t1.c, t1.pad, t2.id, t2.k, t2.c, t2.pad                                                                                                                                                  +
             |   Buffers: shared hit=5166184 read=15                                                                                                                                                                           +
             |   ->  Seq Scan on public.sbtest t1  (cost=0.00..575579.67 rows=15535467 width=190) (Current loop: actual time=0.005..131.568 rows=1280640, loop number=1)                                                       +
             |         Output: t1.id, t1.k, t1.c, t1.pad                                                                                                                                                                       +
             |         Buffers: shared hit=34612                                                                                                                                                                               +
             |   ->  Index Scan using pk_sbtest_id on public.sbtest t2  (cost=0.43..0.48 rows=1 width=190) (actual time=0.001..0.002 rows=1 loops=1280639) (Current loop: actual time=0.001..0.001 rows=1, loop number=1280640)+
             |         Output: t2.id, t2.k, t2.c, t2.pad                                                                                                                                                                       +
             |         Index Cond: (t2.id = t1.id)                                                                                                                                                                             +
             |         Buffers: shared hit=5131572 read=15
leader_pid   |

观察进度例子4

来一个并行查询

session a
postgres=# set max_parallel_workers_per_gather =4;
SET
postgres=# set force_parallel_mode =on;
SET
postgres=# select * from sbtest t1 join sbtest t2 on (t1.id=t2.id);

session b
postgres=# select * from pg_query_state(36918,true,true,true,true,true);
-[ RECORD 1 ]+---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
pid          | 36918
frame_number | 0
query_text   | select * from sbtest t1 join sbtest t2 on (t1.id=t2.id);
plan         | Nested Loop  (cost=0.43..8236166.16 rows=15535467 width=380) (Current loop: actual time=0.019..1297.191 rows=520855, loop number=1)                                                                           +
             |   Output: t1.id, t1.k, t1.c, t1.pad, t2.id, t2.k, t2.c, t2.pad                                                                                                                                                +
             |   Buffers: shared hit=2101162 read=6                                                                                                                                                                          +
             |   ->  Seq Scan on public.sbtest t1  (cost=0.00..575579.67 rows=15535467 width=190) (Current loop: actual time=0.005..58.232 rows=520855, loop number=1)                                                       +
             |         Output: t1.id, t1.k, t1.c, t1.pad                                                                                                                                                                     +
             |         Buffers: shared hit=14078                                                                                                                                                                             +
             |   ->  Index Scan using pk_sbtest_id on public.sbtest t2  (cost=0.43..0.48 rows=1 width=190) (actual time=0.002..0.002 rows=1 loops=520854) (Current loop: actual time=0.004..0.004 rows=1, loop number=520855)+
             |         Output: t2.id, t2.k, t2.c, t2.pad                                                                                                                                                                     +
             |         Index Cond: (t2.id = t1.id)                                                                                                                                                                           +
             |         Buffers: shared hit=2087084 read=6
leader_pid   | 

postgres=# select * from pg_query_state(36918,true,true,true,true,true);
-[ RECORD 1 ]+-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
pid          | 36918
frame_number | 0
query_text   | select * from sbtest t1 join sbtest t2 on (t1.id=t2.id);
plan         | Nested Loop  (cost=0.43..8236166.16 rows=15535467 width=380) (Current loop: actual time=0.019..2879.993 rows=1280640, loop number=1)                                                                            +
             |   Output: t1.id, t1.k, t1.c, t1.pad, t2.id, t2.k, t2.c, t2.pad                                                                                                                                                  +
             |   Buffers: shared hit=5166184 read=15                                                                                                                                                                           +
             |   ->  Seq Scan on public.sbtest t1  (cost=0.00..575579.67 rows=15535467 width=190) (Current loop: actual time=0.005..131.568 rows=1280640, loop number=1)                                                       +
             |         Output: t1.id, t1.k, t1.c, t1.pad                                                                                                                                                                       +
             |         Buffers: shared hit=34612                                                                                                                                                                               +
             |   ->  Index Scan using pk_sbtest_id on public.sbtest t2  (cost=0.43..0.48 rows=1 width=190) (actual time=0.001..0.002 rows=1 loops=1280639) (Current loop: actual time=0.001..0.001 rows=1, loop number=1280640)+
             |         Output: t2.id, t2.k, t2.c, t2.pad                                                                                                                                                                       +
             |         Index Cond: (t2.id = t1.id)                                                                                                                                                                             +
             |         Buffers: shared hit=5131572 read=15
leader_pid   | 

postgres=# select * from pg_query_state(36918,true,true,true,true,true);
-[ RECORD 1 ]+----------------------------------------------------------------------------------------------------------------------------------------------------------
pid          | 36918
frame_number | 0
query_text   | select * from sbtest t1 join sbtest t2 on (t1.id=t2.id);
plan         | Nested Loop  (cost=0.43..8236166.16 rows=15535467 width=380) (Current loop: actual time=0.019..4281.219 rows=1948809, loop number=1)                     +
             |   Output: t1.id, t1.k, t1.c, t1.pad, t2.id, t2.k, t2.c, t2.pad                                                                                           +
             |   Buffers: shared hit=7861798 read=24                                                                                                                    +
             |   ->  Seq Scan on public.sbtest t1  (cost=0.00..575579.67 rows=15535467 width=190) (Current loop: actual time=0.005..196.114 rows=1948810, loop number=1)+
             |         Output: t1.id, t1.k, t1.c, t1.pad                                                                                                                +
             |         Buffers: shared hit=52671                                                                                                                        +
             |   ->  Index Scan using pk_sbtest_id on public.sbtest t2  (cost=0.43..0.48 rows=1 width=190) (actual time=0.001..0.002 rows=1 loops=1948809)              +
             |         Output: t2.id, t2.k, t2.c, t2.pad                                                                                                                +
             |         Index Cond: (t2.id = t1.id)                                                                                                                      +
             |         Buffers: shared hit=7809127 read=24
leader_pid   | 

postgres=# select * from pg_query_state(36918,true,true,true,true,true);
-[ RECORD 1 ]+-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
pid          | 36918
frame_number | 0
query_text   | select * from sbtest t1 join sbtest t2 on (t1.id=t2.id);
plan         | Gather  (cost=0.43..2374210.41 rows=15535467 width=380) (Current loop: actual time=0.247..422.093 rows=1214899, loop number=1)                                                                                  +
             |   Output: t1.id, t1.k, t1.c, t1.pad, t2.id, t2.k, t2.c, t2.pad                                                                                                                                                  +
             |   Workers Planned: 4                                                                                                                                                                                            +
             |   Workers Launched: 4                                                                                                                                                                                           +
             |   Buffers: shared hit=5074                                                                                                                                                                                      +
             |   ->  Nested Loop  (cost=0.43..2374210.41 rows=15535467 width=380) (Current loop: actual time=0.016..2.697 rows=1258, loop number=1)                                                                            +
             |         Output: t1.id, t1.k, t1.c, t1.pad, t2.id, t2.k, t2.c, t2.pad                                                                                                                                            +
             |         Buffers: shared hit=5074                                                                                                                                                                                +
             |         ->  Parallel Seq Scan on public.sbtest t1  (cost=0.00..459063.67 rows=3883867 width=190) (Current loop: actual time=0.003..0.132 rows=1258, loop number=1)                                              +
             |               Output: t1.id, t1.k, t1.c, t1.pad                                                                                                                                                                 +
             |               Buffers: shared hit=34                                                                                                                                                                            +
             |         ->  Index Scan using pk_sbtest_id on public.sbtest t2  (cost=0.43..0.48 rows=1 width=190) (actual time=0.001..0.002 rows=1 loops=1257) (Current loop: actual time=0.002..0.002 rows=1, loop number=1258)+
             |               Output: t2.id, t2.k, t2.c, t2.pad                                                                                                                                                                 +
             |               Index Cond: (t2.id = t1.id)                                                                                                                                                                       +
             |               Buffers: shared hit=5040
leader_pid   |
-[ RECORD 2 ]+-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
pid          | 37973
frame_number | 0
query_text   | <parallel query>
plan         | Nested Loop  (cost=0.43..2374210.41 rows=15535467 width=380) (Current loop: actual time=0.015..734.275 rows=316572, loop number=1)                                                                              +
             |   Output: t1.id, t1.k, t1.c, t1.pad, t2.id, t2.k, t2.c, t2.pad                                                                                                                                                  +
             |   Buffers: shared hit=1277149                                                                                                                                                                                   +
             |   ->  Parallel Seq Scan on public.sbtest t1  (cost=0.00..459063.67 rows=3883867 width=190) (Current loop: actual time=0.004..32.660 rows=316573, loop number=1)                                                 +
             |         Output: t1.id, t1.k, t1.c, t1.pad                                                                                                                                                                       +
             |         Buffers: shared hit=8557                                                                                                                                                                                +
             |   ->  Index Scan using pk_sbtest_id on public.sbtest t2  (cost=0.43..0.48 rows=1 width=190) (actual time=0.002..0.002 rows=1 loops=316572)                                                                      +
             |         Output: t2.id, t2.k, t2.c, t2.pad                                                                                                                                                                       +
             |         Index Cond: (t2.id = t1.id)                                                                                                                                                                             +
             |         Buffers: shared hit=1268593
leader_pid   | 36918
-[ RECORD 3 ]+-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
pid          | 37972
frame_number | 0
query_text   | <parallel query>
plan         | Nested Loop  (cost=0.43..2374210.41 rows=15535467 width=380) (Current loop: actual time=0.015..726.460 rows=307861, loop number=1)                                                                              +
             |   Output: t1.id, t1.k, t1.c, t1.pad, t2.id, t2.k, t2.c, t2.pad                                                                                                                                                  +
             |   Buffers: shared hit=1241821                                                                                                                                                                                   +
             |   ->  Parallel Seq Scan on public.sbtest t1  (cost=0.00..459063.67 rows=3883867 width=190) (Current loop: actual time=0.004..32.622 rows=307861, loop number=1)                                                 +
             |         Output: t1.id, t1.k, t1.c, t1.pad                                                                                                                                                                       +
             |         Buffers: shared hit=8321                                                                                                                                                                                +
             |   ->  Index Scan using pk_sbtest_id on public.sbtest t2  (cost=0.43..0.48 rows=1 width=190) (actual time=0.002..0.002 rows=1 loops=307860) (Current loop: actual time=0.002..0.002 rows=1, loop number=307861)  +
             |         Output: t2.id, t2.k, t2.c, t2.pad                                                                                                                                                                       +
             |         Index Cond: (t2.id = t1.id)                                                                                                                                                                             +
             |         Buffers: shared hit=1233500
leader_pid   | 36918
-[ RECORD 4 ]+-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
pid          | 37971
frame_number | 0
query_text   | <parallel query>
plan         | Nested Loop  (cost=0.43..2374210.41 rows=15535467 width=380) (Current loop: actual time=0.015..719.893 rows=311324, loop number=1)                                                                              +
             |   Output: t1.id, t1.k, t1.c, t1.pad, t2.id, t2.k, t2.c, t2.pad                                                                                                                                                  +
             |   Buffers: shared hit=1255903                                                                                                                                                                                   +
             |   ->  Parallel Seq Scan on public.sbtest t1  (cost=0.00..459063.67 rows=3883867 width=190) (Current loop: actual time=0.005..32.527 rows=311324, loop number=1)                                                 +
             |         Output: t1.id, t1.k, t1.c, t1.pad                                                                                                                                                                       +
             |         Buffers: shared hit=8415                                                                                                                                                                                +
             |   ->  Index Scan using pk_sbtest_id on public.sbtest t2  (cost=0.43..0.48 rows=1 width=190) (actual time=0.002..0.002 rows=1 loops=311323) (Current loop: actual time=0.001..0.001 rows=1, loop number=311324)  +
             |         Output: t2.id, t2.k, t2.c, t2.pad                                                                                                                                                                       +
             |         Index Cond: (t2.id = t1.id)                                                                                                                                                                             +
             |         Buffers: shared hit=1247488
leader_pid   | 36918
-[ RECORD 5 ]+-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
pid          | 37970
frame_number | 0
query_text   | <parallel query>
plan         | Nested Loop  (cost=0.43..2374210.41 rows=15535467 width=380) (Current loop: actual time=0.024..748.139 rows=278393, loop number=1)                                                                              +
             |   Output: t1.id, t1.k, t1.c, t1.pad, t2.id, t2.k, t2.c, t2.pad                                                                                                                                                  +
             |   Buffers: shared hit=1123109                                                                                                                                                                                   +
             |   ->  Parallel Seq Scan on public.sbtest t1  (cost=0.00..459063.67 rows=3883867 width=190) (Current loop: actual time=0.007..32.654 rows=278393, loop number=1)                                                 +
             |         Output: t1.id, t1.k, t1.c, t1.pad                                                                                                                                                                       +
             |         Buffers: shared hit=7525                                                                                                                                                                                +
             |   ->  Index Scan using pk_sbtest_id on public.sbtest t2  (cost=0.43..0.48 rows=1 width=190) (actual time=0.002..0.002 rows=1 loops=278392) (Current loop: actual time=0.004..0.004 rows=1, loop number=278393)  +
             |         Output: t2.id, t2.k, t2.c, t2.pad                                                                                                                                                                       +
             |         Index Cond: (t2.id = t1.id)                                                                                                                                                                             +
             |         Buffers: shared hit=1115584
leader_pid   | 36918

postgres=# select * from pg_query_state(36918,true,true,true,true,true);
-[ RECORD 1 ]+-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
pid          | 36918
frame_number | 0
query_text   | select * from sbtest t1 join sbtest t2 on (t1.id=t2.id);
plan         | Gather  (cost=0.43..2374210.41 rows=15535467 width=380) (Current loop: actual time=0.247..1465.785 rows=4177376, loop number=1)                                                                                 +
             |   Output: t1.id, t1.k, t1.c, t1.pad, t2.id, t2.k, t2.c, t2.pad                                                                                                                                                  +
             |   Workers Planned: 4                                                                                                                                                                                            +
             |   Workers Launched: 4                                                                                                                                                                                           +
             |   Buffers: shared hit=5074                                                                                                                                                                                      +
             |   ->  Nested Loop  (cost=0.43..2374210.41 rows=15535467 width=380) (Current loop: actual time=0.016..2.697 rows=1258, loop number=1)                                                                            +
             |         Output: t1.id, t1.k, t1.c, t1.pad, t2.id, t2.k, t2.c, t2.pad                                                                                                                                            +
             |         Buffers: shared hit=5074                                                                                                                                                                                +
             |         ->  Parallel Seq Scan on public.sbtest t1  (cost=0.00..459063.67 rows=3883867 width=190) (Current loop: actual time=0.003..0.132 rows=1258, loop number=1)                                              +
             |               Output: t1.id, t1.k, t1.c, t1.pad                                                                                                                                                                 +
             |               Buffers: shared hit=34                                                                                                                                                                            +
             |         ->  Index Scan using pk_sbtest_id on public.sbtest t2  (cost=0.43..0.48 rows=1 width=190) (actual time=0.001..0.002 rows=1 loops=1257) (Current loop: actual time=0.002..0.002 rows=1, loop number=1258)+
             |               Output: t2.id, t2.k, t2.c, t2.pad                                                                                                                                                                 +
             |               Index Cond: (t2.id = t1.id)                                                                                                                                                                       +
             |               Buffers: shared hit=5040
leader_pid   |
-[ RECORD 2 ]+-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
pid          | 37973
frame_number | 0
query_text   | <parallel query>
plan         | Nested Loop  (cost=0.43..2374210.41 rows=15535467 width=380) (Current loop: actual time=0.015..2558.920 rows=1065636, loop number=1)                                                                            +
             |   Output: t1.id, t1.k, t1.c, t1.pad, t2.id, t2.k, t2.c, t2.pad                                                                                                                                                  +
             |   Buffers: shared hit=4299076 read=3                                                                                                                                                                            +
             |   ->  Parallel Seq Scan on public.sbtest t1  (cost=0.00..459063.67 rows=3883867 width=190) (Current loop: actual time=0.004..110.952 rows=1065636, loop number=1)                                               +
             |         Output: t1.id, t1.k, t1.c, t1.pad                                                                                                                                                                       +
             |         Buffers: shared hit=28801                                                                                                                                                                               +
             |   ->  Index Scan using pk_sbtest_id on public.sbtest t2  (cost=0.43..0.48 rows=1 width=190) (actual time=0.002..0.002 rows=1 loops=1065635) (Current loop: actual time=0.002..0.002 rows=1, loop number=1065636)+
             |         Output: t2.id, t2.k, t2.c, t2.pad                                                                                                                                                                       +
             |         Index Cond: (t2.id = t1.id)                                                                                                                                                                             +
             |         Buffers: shared hit=4270207 read=3
leader_pid   | 36918
-[ RECORD 3 ]+-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
pid          | 37972
frame_number | 0
query_text   | <parallel query>
plan         | Nested Loop  (cost=0.43..2374210.41 rows=15535467 width=380) (Current loop: actual time=0.015..2539.481 rows=1041373, loop number=1)                                                                            +
             |   Output: t1.id, t1.k, t1.c, t1.pad, t2.id, t2.k, t2.c, t2.pad                                                                                                                                                  +
             |   Buffers: shared hit=4200840 read=2                                                                                                                                                                            +
             |   ->  Parallel Seq Scan on public.sbtest t1  (cost=0.00..459063.67 rows=3883867 width=190) (Current loop: actual time=0.004..111.095 rows=1041373, loop number=1)                                               +
             |         Output: t1.id, t1.k, t1.c, t1.pad                                                                                                                                                                       +
             |         Buffers: shared hit=28149                                                                                                                                                                               +
             |   ->  Index Scan using pk_sbtest_id on public.sbtest t2  (cost=0.43..0.48 rows=1 width=190) (actual time=0.002..0.002 rows=1 loops=1041372) (Current loop: actual time=0.003..0.003 rows=1, loop number=1041373)+
             |         Output: t2.id, t2.k, t2.c, t2.pad                                                                                                                                                                       +
             |         Index Cond: (t2.id = t1.id)                                                                                                                                                                             +
             |         Buffers: shared hit=4172691 read=2
leader_pid   | 36918
-[ RECORD 4 ]+-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
pid          | 37971
frame_number | 0
query_text   | <parallel query>
plan         | Nested Loop  (cost=0.43..2374210.41 rows=15535467 width=380) (Current loop: actual time=0.015..2489.942 rows=1035703, loop number=1)                                                                            +
             |   Output: t1.id, t1.k, t1.c, t1.pad, t2.id, t2.k, t2.c, t2.pad                                                                                                                                                  +
             |   Buffers: shared hit=4178122 read=1                                                                                                                                                                            +
             |   ->  Parallel Seq Scan on public.sbtest t1  (cost=0.00..459063.67 rows=3883867 width=190) (Current loop: actual time=0.005..108.320 rows=1035703, loop number=1)                                               +
             |         Output: t1.id, t1.k, t1.c, t1.pad                                                                                                                                                                       +
             |         Buffers: shared hit=27993                                                                                                                                                                               +
             |   ->  Index Scan using pk_sbtest_id on public.sbtest t2  (cost=0.43..0.48 rows=1 width=190) (actual time=0.002..0.002 rows=1 loops=1035702) (Current loop: actual time=0.003..0.003 rows=1, loop number=1035703)+
             |         Output: t2.id, t2.k, t2.c, t2.pad                                                                                                                                                                       +
             |         Index Cond: (t2.id = t1.id)                                                                                                                                                                             +
             |         Buffers: shared hit=4150129 read=1
leader_pid   | 36918
-[ RECORD 5 ]+-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
pid          | 37970
frame_number | 0
query_text   | <parallel query>
plan         | Nested Loop  (cost=0.43..2374210.41 rows=15535467 width=380) (Current loop: actual time=0.024..2562.090 rows=1033870, loop number=1)                                                                            +
             |   Output: t1.id, t1.k, t1.c, t1.pad, t2.id, t2.k, t2.c, t2.pad                                                                                                                                                  +
             |   Buffers: shared hit=4170779 read=1                                                                                                                                                                            +
             |   ->  Parallel Seq Scan on public.sbtest t1  (cost=0.00..459063.67 rows=3883867 width=190) (Current loop: actual time=0.007..110.571 rows=1033870, loop number=1)                                               +
             |         Output: t1.id, t1.k, t1.c, t1.pad                                                                                                                                                                       +
             |         Buffers: shared hit=27946                                                                                                                                                                               +
             |   ->  Index Scan using pk_sbtest_id on public.sbtest t2  (cost=0.43..0.48 rows=1 width=190) (actual time=0.002..0.002 rows=1 loops=1033869) (Current loop: actual time=0.003..0.003 rows=1, loop number=1033870)+
             |         Output: t2.id, t2.k, t2.c, t2.pad                                                                                                                                                                       +
             |         Index Cond: (t2.id = t1.id)                                                                                                                                                                             +
             |         Buffers: shared hit=4142833 read=1
leader_pid   | 36918

手册

pg_query_state

The pg_query_state module provides facility to know the current state of query execution on working backend. To enable this extension you have to patch the latest stable version of PostgreSQL. Different branches are intended for different version numbers of PostgreSQL, e.g., branch PG9_5 corresponds to PostgreSQL 9.5.

Overview

Each complex query statement (SELECT/INSERT/UPDATE/DELETE) after optimization/planning stage is translated into plan tree wich is kind of imperative representation of declarative SQL query. EXPLAIN ANALYZE request allows to demonstrate execution statistics gathered from each node of plan tree (full time of execution, number rows emitted to upper nodes, etc). But this statistics is collected after execution of query. This module allows to show actual statistics of query running on external backend. At that, format of resulting output is almost identical to ordinal EXPLAIN ANALYZE. Thus users are able to track of query execution in progress.

In fact, this module is able to explore external backend and determine its actual state. Particularly it's helpful when backend executes a heavy query or gets stuck.

Use cases

Using this module there can help in the following things:

  • detect a long query (along with other monitoring tools)
  • overwatch the query execution (example)

Installation

To install pg_query_state, please apply patches custom_signal.patch, executor_hooks.patch and runtime_explain.patch to the latest stable version of PostgreSQL and rebuild PostgreSQL.

Correspondence branch names to PostgreSQL version numbers:

  • PG9_5 --- PostgreSQL 9.5
  • PGPRO9_5 --- PostgresPro 9.5
  • master --- development version for PostgreSQL 10devel

Then execute this in the module's directory:

make install USE_PGXS=1

Add module name to the shared_preload_libraries parameter in postgresql.conf:

shared_preload_libraries = 'pg_query_state'

It is essential to restart the PostgreSQL instance. After that, execute the following query in psql:

CREATE EXTENSION pg_query_state;

Done!

Tests

Tests using parallel sessions using python 2.7 script:

   python tests/pg_qs_test_runner.py [OPTION]...

prerequisite packages:

  • psycopg2 version 2.6 or later
  • PyYAML version 3.11 or later

options:

  • - -host --- postgres server host, default value is localhost
  • - -port --- postgres server port, default value is 5432
  • - -database --- database name, default value is postgres
  • - -user --- user name, default value is postgres
  • - -password --- user's password, default value is empty

Function pg_query_state

pg_query_state(integer  pid,
               verbose  boolean DEFAULT FALSE,
               costs    boolean DEFAULT FALSE,
               timing   boolean DEFAULT FALSE,
               buffers  boolean DEFAULT FALSE,
               triggers boolean DEFAULT FALSE,
               format   text    DEFAULT 'text')

Extract current query state from backend with specified pid. Since parallel query can spawn workers and function call causes nested subqueries so that state of execution may be viewed as stack of running queries, return value of pg_query_state has type TABLE (pid integer, frame_number integer, query_text text, plan text, leader_pid integer). It represents tree structure consisting of leader process and its spawned workers. Each worker refers to leader through leader_pid column. For leader process the value of this column isnull. For each process the stack frames are specified as correspondence between frame_number, query_text and plan columns.

Thus, user can see the states of main query and queries generated from function calls for leader process and all workers spawned from it.

In process of execution some nodes of plan tree can take loops of full execution. Therefore statistics for each node consists of two parts: average statistics for previous loops just like in EXPLAIN ANALYZE output and statistics for current loop if node have not finished.

Optional arguments:

  • verbose --- use EXPLAIN VERBOSE for plan printing;
  • costs --- add costs for each node;
  • timing --- print timing data for each node, if collecting of timing statistics is turned off on called side resulting output will contain WARNING message timing statistics disabled;
  • buffers --- print buffers usage, if collecting of buffers statistics is turned off on called side resulting output will contain WARNING message buffers statistics disabled;
  • triggers --- include triggers statistics in result plan trees;
  • format --- EXPLAIN format to be used for plans printing, posible values: {text, xml, json, yaml}.

If callable backend is not executing any query the function prints INFO message about backend's state taken from pg_stat_activity view if it exists there.

Calling role have to be superuser or member of the role whose backend is being called. Othrewise function prints ERROR message permission denied.

Configuration settings

There are several user-accessible GUC variables designed to toggle the whole module and the collecting of specific statistic parameters while query is running:

  • pg_query_state.enable --- disable (or enable) pg_query_state completely, default value is true
  • pg_query_state.enable_timing --- collect timing data for each node, default value is false
  • pg_query_state.enable_buffers --- collect buffers usage, default value is false

This parameters is set on called side before running any queries whose states are attempted to extract. Warning: if pg_query_state.enable_timing is turned off the calling side cannot get time statistics, similarly for pg_query_state.enable_buffers parameter.

Examples

Assume one backend with pid = 20102 performs a simple query:

postgres=# select pg_backend_pid();
 pg_backend_pid
 ----------------
          20102
(1 row)
postgres=# select count(*) from foo join bar on foo.c1=bar.c1;

Other backend can extract intermediate state of execution that query:

postgres=# \x
postgres=# select * from pg_query_state(20102);
-[ RECORD 1 ]-----------------------------------------------------------------------------------
query_text | select count(*) from foo join bar on foo.c1=bar.c1;
plan       | Aggregate (Current loop: actual rows=0, loop number=1)                                                 +
           |   ->  Nested Loop (Current loop: actual rows=6, loop number=1)                                         +
           |         Join Filter: (foo.c1 = bar.c1)                                                                 +
           |         Rows Removed by Join Filter: 0                                                                 +
           |         ->  Seq Scan on bar (Current loop: actual rows=6, loop number=1)                               +
           |         ->  Materialize (actual rows=1000001 loops=5) (Current loop: actual rows=742878, loop number=6)+
           |               ->  Seq Scan on foo (Current loop: actual rows=1000001, loop number=1)

In example above Materialize node has statistics on passed loops (average number of rows delivered to Nested Loop and number of passed loops are shown) and statistics on current loop. Other nodes has statistics only for current loop as this loop is first (loop number = 1).

Assume first backend executes some function:

postgres=# select n_join_foo_bar();

Other backend can get the follow output:

postgres=# select * from pg_query_state(20102);
-[ RECORD 1 ]---------------------------------------------------------------------------------------------------------------
query_text | select n_join_foo_bar();
plan       | Result (Current loop: actual rows=0, loop number=1)
-[ RECORD 2 ]---------------------------------------------------------------------------------------------------------------
query_text | SELECT (select count(*) from foo join bar on foo.c1=bar.c1)
plan       | Result (Current loop: actual rows=0, loop number=1)                                                            +
           |   InitPlan 1 (returns $0)                                                                                      +
           |     ->  Aggregate (Current loop: actual rows=0, loop number=1)                                                 +
           |           ->  Nested Loop (Current loop: actual rows=8, loop number=1)                                         +
           |                 Join Filter: (foo.c1 = bar.c1)                                                                 +
           |                 Rows Removed by Join Filter: 0                                                                 +
           |                 ->  Seq Scan on bar (Current loop: actual rows=9, loop number=1)                               +
           |                 ->  Materialize (actual rows=1000001 loops=8) (Current loop: actual rows=665090, loop number=9)+
           |                       ->  Seq Scan on foo (Current loop: actual rows=1000001, loop number=1)

First row corresponds to function call, second - to query which is in the body of t

时间: 2024-08-31 11:16:29

官人要杯咖啡吗? - PostgreSQL实时监测PLAN tree的执行进度的相关文章

PostgreSQL 10.0 preview 功能增强 - QUERY进度显示

标签 PostgreSQL , 10.0 , query进度 背景 query进度的显示,有一个插件pg_query_state支持. PostgreSQL 10.0将在内核层面增加一个这样的支持. 首先会在analye命令上尝试,但是它会设计为通用的视图,支持其他命令,诸如CREATE INDEX, VAUUM, CLUSTER等. Hello Hackers, Following is a proposal for reporting the progress of ANALYZE comm

PostgreSQL 异步消息实践 - 亿级/分钟 FEED系统实时监测

标签 PostgreSQL , 异步消息 , 触发器 , 规则 , insert on conflict , 实时分析 背景 在很多业务系统中,为了定位问题.运营需要.分析需要或者其他需求,会在业务中设置埋点,记录用户的行为在业务系统中产生的日志,也叫FEED日志. 比如订单系统.在业务系统中环环相扣,从购物车.下单.付款.发货,收货(还有纠纷.退款等等),一笔订单通常会产生若干相关联的记录. 每个环节产生的属性可能是不一样的,有可能有新的属性产生,也有可能变更已有的属性值. 为了便于分析,通常

医疗大健康行业案例(老人健康实时监测和预警) - 阿里云RDS PostgreSQL最佳实践

标签 PostgreSQL , pipelineDB , 流式计算 , 独立事件相关性 , 舆情分析 , 实时状态分析 , 递归查询 , 时序数据 背景 人的身体和机器差不多,随着年龄的增长,器官逐渐老化,毛病也会越来越多,注意保养是一方面,另一方面也需要注意实时的监测和发出预警,在问题萌芽状态就解决掉. 以往我们检查身体得去医院或专业的体检机构,很麻烦,随着科技的进步,一些健康指标的监测变得更加方便,例如手环也是一个普及很快的监控检测终端(目前已能够检测心跳.温度.运动等各项指标),未来这种终

[收藏学习]Linux的实时监测命令

十八.  和系统运行状况相关的Shell命令:     1. Linux的实时监测命令(watch):     watch 是一个非常实用的命令,可以帮你实时监测一个命令的运行结果,省得一遍又一遍的手动运行.该命令最为常用的两个选项是-d和-n,其中-n表示间隔多少秒 执行一次"command",-d表示高亮发生变化的位置.下面列举几个在watch中常用的实时监视命令:     /> watch -d -n 1 'who'   #每隔一秒执行一次who命令,以监视服务器当前用户登

进程监控-关于windows下非子进程之间的运行状态实时监测

问题描述 关于windows下非子进程之间的运行状态实时监测 情况是这么回事 我在做个windows平台的东西,需要一个管理进程,管理着一堆服务进程,然后我需要在服务进程异常关闭的时候实时通知管理进程. 因为实时性要求的原因,所以心跳的方式直接被我无视了. 因为主要是想达到服务进程异常关闭直接宕掉的情况,因此我找来找去也没啥好办法. 唯一可以借鉴的方法是socket的方式,当一对socket连接中的服务器和客户端,服务器用recv阻塞住,当客户端异常关闭的时候服务器recv直接返回SOCKETE

android studio不能实时监测代码拼写错误的问题

问题描述 android studio不能实时监测代码拼写错误的问题 为什么我的代码拼写错误,studio不显示任何提示或警告呢? 就像eclipse中的红曲线一样 解决方案 android studio是会提示代码错误的,除非你的代码没有错 解决方案二: android studio代码格式化问题 解决方案三: 我找到错了,我是没有选上POWER SAVE MODE··· 解决方案四: 我也有一样的问题,不知道是不是跟我更新了有关···请问楼主找到问题了吗 解决方案五: 这个博客上有Andro

安装apachetop工具实时监测apache运行情况

我们经常会需要知道服务器的实时监测服务器的运行状况,比如哪些 URL 的访问量最大,服务器每秒的请求数,哪个搜索引擎正在抓取我们网站?面对这些问题,我们虽然可以去分析查看统计访问日志文件,但是却不能让我们实时统计,不能给我们直观的统计数据.现在好了,apachetop这个工具就可以实时的跟踪log的变化,能查看访问者正在查看哪些文件,和访问者IP等信息. 1. apachetop安装 # rpm -ivh http://mirrors.hustunique.com/epel/5/i386/epe

麦考林再与第三方物流结盟,实现配送信息实时监测

此前向同行开放物流平台后,麦考林又与多家第三方物流结盟,从而让消费者实现了配送信息实时监测. 据了解,麦考林自主开发的TMS物流管理系统与合作的第三方物流公司对接完成后,消费者可实时查询包裹从出库.城际运输.到"最后一公里"的每一个配送状态,还可查询快递站点名称和电话等信息. 麦考林相关负责人表示,目前80多家与麦考林合作的COD(货到付款)公司全部接入TMS系统,除邮政系统外的配送都实现了可追踪,该系统能通过EMS的网站查询. 近年来,随着国内电子商务迅速成长,电商企业纷纷开放物流平

tawkon:实时监测手机辐射量

tawkon随着移动互联的迅猛发展,手机的使用频率越来越高.有些人甚至如厕的时候都要随身携带手机.想必大家都知道 手机辐射对人体有害.虽然知道有害,但是 普通的手机很难告知实时又准确的辐射量.今天推荐给大家的tawkon就是一款能够实时监测手机辐射量的移动应用.目前适用于Android. Blackberry.iPhone.Windows Mobile和Palm等智能手机.iOS用户需要越狱后在cydia搜索安装.目前官方正在进行联名上书申请上架App store.tawkon android