PostgreSQL 10.0 preview: performance enhancement - hash and nestloop join optimization (this is how a smart optimizer behaves)


PostgreSQL , 10.0 , nestloop , hash join



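When the join column of the inner table is provably unique, a nested loop join can stop scanning the inner table for the current outer row as soon as it finds one match. The same applies to hash joins.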


postgres=# create table intbl(id int);
postgres=# create unique index idx_intbl on intbl(id);
postgres=# insert into intbl select generate_series(1,1000000);
INSERT 0 1000000
postgres=# create table out(id int);
postgres=# insert into out select 1 from generate_series(1,1000);
INSERT 0 1000
-- The outer table holds 1000 rows with the value 1; for each outer row, the inner scan can stop as soon as it has found one match
postgres=# set enable_hashjoin =off;
postgres=# set enable_mergejoin =off;
postgres=# set enable_material =off;


-- Plan without the optimization (the output has no "Inner Unique" line)
postgres=# explain (analyze,verbose,timing,costs,buffers) select * from out,intbl where out.id=intbl.id;
                                                              QUERY PLAN
 Nested Loop  (cost=0.42..2736.00 rows=1000 width=8) (actual time=0.033..1.965 rows=1000 loops=1)
   Output: "out".id, intbl.id
   Buffers: shared hit=4005
   ->  Seq Scan on public."out"  (cost=0.00..15.00 rows=1000 width=4) (actual time=0.013..0.101 rows=1000 loops=1)
         Output: "out".id
         Buffers: shared hit=5
   ->  Index Only Scan using idx_intbl on public.intbl  (cost=0.42..2.71 rows=1 width=4) (actual time=0.001..0.002 rows=1 loops=1000)
         Output: intbl.id
         Index Cond: (intbl.id = "out".id)
         Heap Fetches: 1000
         Buffers: shared hit=4000
 Planning time: 0.109 ms
 Execution time: 2.048 ms
(13 rows)


-- Plan with the optimization (note the "Inner Unique: true" line)
postgres=# explain (analyze,verbose,timing,costs,buffers) select * from out,intbl where out.id=intbl.id;
                                                              QUERY PLAN
 Nested Loop  (cost=0.42..2202.50 rows=1000 width=8) (actual time=0.035..1.803 rows=1000 loops=1)
   Output: "out".id, intbl.id
   Inner Unique: true
   Buffers: shared hit=4005
   ->  Seq Scan on public."out"  (cost=0.00..15.00 rows=1000 width=4) (actual time=0.013..0.106 rows=1000 loops=1)
         Output: "out".id
         Buffers: shared hit=5
   ->  Index Only Scan using idx_intbl on public.intbl  (cost=0.42..2.19 rows=1 width=4) (actual time=0.001..0.001 rows=1 loops=1000)
         Output: intbl.id
         Index Cond: (intbl.id = "out".id)
         Heap Fetches: 1000
         Buffers: shared hit=4000
 Planning time: 0.122 ms
 Execution time: 1.887 ms
(14 rows)
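
The test above forces a nested loop. A minimal sketch for trying the same check with a hash join follows. The table and index names (demo_inner, demo_outer, idx_demo_inner) are hypothetical and not part of the original test, and the planner remains free to choose which side it hashes. In text format, EXPLAIN VERBOSE prints "Inner Unique: true" only when the side chosen as inner was proven unique, so the line may be absent if the planner hashes the non-unique table instead.

```sql
-- Hypothetical demo tables; the names are illustrative only.
create table demo_inner(id int);
create unique index idx_demo_inner on demo_inner(id);
insert into demo_inner select generate_series(1,1000);

create table demo_outer(id int);
insert into demo_outer select 1 from generate_series(1,1000000);

analyze demo_inner;
analyze demo_outer;

-- Discourage the other join methods so that a hash join is chosen.
set enable_nestloop = off;
set enable_mergejoin = off;
set enable_hashjoin = on;

-- Look for "Inner Unique: true" under the Hash Join node.
explain (verbose, costs off)
select * from demo_outer o join demo_inner i on o.id = i.id;
```

The small unique table is the natural candidate for the hashed (inner) side here, which is why the sketch makes the unique table the small one.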


Optimize joins when the inner relation can be proven unique.  

author	Tom Lane
Sat, 8 Apr 2017 10:20:03 +0800 (22:20 -0400)
committer	Tom Lane
Sat, 8 Apr 2017 10:20:13 +0800 (22:20 -0400)
commit	9c7f5229ad68d7e0e4dd149e3f80257893e404d4
tree	0a167d403952550f43941b01b24ed5e7526c5351
parent	f13a9121f9822eafe05cc3178bf046155a248173

If there can certainly be no more than one matching inner row for a given
outer row, then the executor can move on to the next outer row as soon as
it's found one match; there's no need to continue scanning the inner
relation for this outer row.  This saves useless scanning in nestloop
and hash joins.  In merge joins, it offers the opportunity to skip
mark/restore processing, because we know we have not advanced past the
first possible match for the next outer row.  

Of course, the devil is in the details: the proof of uniqueness must
depend only on joinquals (not otherquals), and if we want to skip
mergejoin mark/restore then it must depend only on merge clauses.
To avoid adding more planning overhead than absolutely necessary,
the present patch errs in the conservative direction: there are cases
where inner_unique or skip_mark_restore processing could be used, but
it will not do so because it's not sure that the uniqueness proof
depended only on "safe" clauses.  This could be improved later.  

David Rowley, reviewed and rather heavily editorialized on by me  
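
As the commit message describes, the optimization depends on the planner being able to prove that the inner relation is unique. A rough way to observe this is to run the same join once with a plain index and once with a unique index on the join column. This is only a sketch: the names u_inner, u_outer and idx_u_inner are hypothetical, and the second plan shows the line only if the planner picks u_inner as the inner side.

```sql
-- Hypothetical tables for illustration only.
create table u_inner(id int);
create table u_outer(id int);
insert into u_inner select generate_series(1,1000000);
insert into u_outer select 1 from generate_series(1,1000);
analyze u_inner;
analyze u_outer;

-- Same settings as the original test: steer the planner toward a nested loop.
set enable_hashjoin = off;
set enable_mergejoin = off;
set enable_material = off;

-- Case 1: plain (non-unique) index. Neither side can be proven unique,
-- so the text-format plan contains no "Inner Unique" line.
create index idx_u_inner on u_inner(id);
explain (verbose, costs off)
select * from u_outer o join u_inner i on o.id = i.id;

-- Case 2: unique index on the join column. If u_inner is the inner side,
-- the Nested Loop node should report "Inner Unique: true".
drop index idx_u_inner;
create unique index idx_u_inner on u_inner(id);
explain (verbose, costs off)
select * from u_outer o join u_inner i on o.id = i.id;
```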





Time: 2024-09-13 01:19:47

