prepare statement的目的是让同样的SQL(无参数)或者类似的SQL(有参数)仅需要一次parse analyse, 因此减少了parse analyse的次数.
同时使用 prepare statement 还有一个功效是cache plan, 也就是说plan的次数也可以减少 .
本文介绍一下PostgreSQL的几种prepare statement的方法, 仅供参考.
首先是SPI, 如SPI_prepare, 语法如下 :
SPIPlanPtr SPI_prepare(const char * command, int nargs, Oid * argtypes)
const char * command
command string
int nargs
number of input parameters ($1, $2, etc.)
Oid * argtypes
pointer to an array containing the OIDs of the data types of the parameters
注意SPI_prepare返回结果类型为SPIPlanPtr(一个指针), 用于存储plan的数据.
作为SPI_execute_plan或者SPI_execute_plan_with_paramlist 的参数传入 :
因为SPI_prepare可以带参数(nargs,argtypes), 也可以不带参数. SPI_execute_plan建立generic plan 对带参数和不带参数的处理方式略有不同, 见下面的解释.
1, 无参数的情况.
If no parameters are defined, a generic plan will be created at the first use of SPI_execute_plan, and used for all subsequent executions as well.
2, 有参数的情况.
If there are parameters, the first few uses of SPI_execute_plan will generate custom plans that are specific to the supplied parameter values.
After enough uses of the same prepared statement, SPI_execute_plan will build a generic plan, and if that is not too much more expensive than the custom plans, it will start using the generic plan instead of re-planning each time.
另外就是使用如果使用prepared 游标, SPI_prepare_cursor, 可以指定flag. 如下 :
If this default behavior is unsuitable, you can alter it by passing the CURSOR_OPT_GENERIC_PLAN or CURSOR_OPT_CUSTOM_PLAN flag to SPI_prepare_cursor, to force use of generic or custom plans respectively.
这些flag来自src/include/nodes/parsenodes.h
/* ----------------------
* Declare Cursor Statement
*
* Note: the "query" field of DeclareCursorStmt is only used in the raw grammar
* output. After parse analysis it's set to null, and the Query points to the
* DeclareCursorStmt, not vice versa.
* ----------------------
*/
#define CURSOR_OPT_BINARY 0x0001 /* BINARY */
#define CURSOR_OPT_SCROLL 0x0002 /* SCROLL explicitly given */
#define CURSOR_OPT_NO_SCROLL 0x0004 /* NO SCROLL explicitly given */
#define CURSOR_OPT_INSENSITIVE 0x0008 /* INSENSITIVE */
#define CURSOR_OPT_HOLD 0x0010 /* WITH HOLD */
/* these planner-control flags do not correspond to any SQL grammar: */
#define CURSOR_OPT_FAST_PLAN 0x0020 /* prefer fast-start plan */
#define CURSOR_OPT_GENERIC_PLAN 0x0040 /* force use of generic plan */
#define CURSOR_OPT_CUSTOM_PLAN 0x0080 /* force use of custom plan */
SPI_prepare_cursor 用法 如下 :
SPIPlanPtr SPI_prepare_cursor(const char * command, int nargs,
Oid * argtypes, int cursorOptions)
PL/pgsql cache plan 实际上都是调用的SPI. 所以处理cache plan也分两种情况 (带参数和不带参数).
例如如果以下SQL在PL/pgSQL中执行 :
1. select id,info from test where id=$1; 这属于带参数的SQL语句.
所以SPI_prepare这条SQL, 当第一次用SPI_execute_plan执行后不会马上建立generic plan. 而需要有足够多次使用到同一个prepare statement后才会建立generic plan.
2. select id,info from test where id=1; 这属于不带参数的SQL语句.
所以SPI_prepare这条SQL, 当第一次用SPI_execute_plan执行后将会建立generic plan.
选择使用generic plan还是重新plan(custom plan), 用到这个函数 :
src/backend/utils/cache/plancache.c
00840 /*
00841 * choose_custom_plan: choose whether to use custom or generic plan
00842 *
00843 * This defines the policy followed by GetCachedPlan.
00844 */
00845 static bool
00846 choose_custom_plan(CachedPlanSource *plansource, ParamListInfo boundParams)
00847 {
00848 double avg_custom_cost;
00849
00850 /* Never any point in a custom plan if there's no parameters */
00851 if (boundParams == NULL)
00852 return false;
00853
00854 /* See if caller wants to force the decision */
00855 if (plansource->cursor_options & CURSOR_OPT_GENERIC_PLAN)
00856 return false;
00857 if (plansource->cursor_options & CURSOR_OPT_CUSTOM_PLAN)
00858 return true;
00859
00860 /* Generate custom plans until we have done at least 5 (arbitrary) */
00861 if (plansource->num_custom_plans < 5)
00862 return true;
00863
00864 avg_custom_cost = plansource->total_custom_cost / plansource->num_custom_plans;
00865
00866 /*
00867 * Prefer generic plan if it's less than 10% more expensive than average
00868 * custom plan. This threshold is a bit arbitrary; it'd be better if we
00869 * had some means of comparing planning time to the estimated runtime cost
00870 * differential.
00871 *
00872 * Note that if generic_cost is -1 (indicating we've not yet determined
00873 * the generic plan cost), we'll always prefer generic at this point.
00874 */
00875 if (plansource->generic_cost < avg_custom_cost * 1.1)
00876 return false;
00877
00878 return true;
00879 }
从函数内容来看,
首先无绑定参数的情况使用generic plan, 因此函数返回false.
然后判断cursor_option是否包含CURSOR_OPT_GENERIC_PLAN和CURSOR_OPT_CUSTOM_PLAN. 直接返回true或false
然后判断num_custom_plans, 如果大于等于5则使用generic plan. 小于5则返回true.
最后如果上面都没有返回, 那么进入cost的判断. 如果plansource->generic_cost < avg_custom_cost * 1.1 则选择generic plan.
generic_cost , total_custom_cost 和 num_custom_plans都记录在CachedPlanSource这个数据结构里面. 因此custom plan不是每次都产生, 而是通过以上函数来选择的. 每使用一次custom plan都会更新total_custom_cost 和 num_custom_plans的数据, 用作下次判断的依据.
下面举例说明一下第二种情况, 也就是有参数的情况 :
首先生成测试数据 :
测试表 :
digoal=> create table test(id int, info text);
CREATE TABLE
测试数据 :
digoal=> insert into test select 1,repeat(random()::text,10) from generate_series(1,500000);
INSERT 0 500000
digoal=> insert into test values (2,'test');
INSERT 0 1
digoal=> select id,count(*) from test group by id;
id | count
----+--------
1 | 500000
2 | 1
(2 rows)
索引 :
digoal=> create index idx_test_1 on test(id);
CREATE INDEX
接下来使用PL/pgsql 的cache plan情况测试 :
PL/pgSQL的cache plan用法细节参见http://www.postgresql.org/docs/9.2/static/plpgsql-implementation.html
创建函数 :
CREATE OR REPLACE FUNCTION digoal.f_immutable(i_id integer)
RETURNS bigint
LANGUAGE plpgsql
STRICT IMMUTABLE
AS $function$
declare
result bigint;
begin
select count(*) into result from digoal.test where id=i_id;
return result;
end;
$function$;
测试 :
digoal=> \timing
Timing is on.
注意这些SQL的执行耗时. 很明显的区分索引扫描和全表扫描.
使用PL/pgsql 封装的 prepared statement 在第一次执行后SPI_execute_plan并没有马上产生 generic plan.
digoal=> select * from f_immutable(1);
f_immutable
-------------
500000
(1 row)
Time: 148.080 ms
因此第二次执行f_immutable(2) 依然进行了plan , 也就是用了custom plan.
因为1走了全表扫描, 2走了索引扫描.
digoal=> select * from f_immutable(2);
f_immutable
-------------
1
(1 row)
Time: 0.736 ms
执行计划如下 :
digoal=> explain analyze select count(*) from test where id=1;
QUERY PLAN
--------------------------------------------------------------------------------------------------------------------
Aggregate (cost=20637.01..20637.02 rows=1 width=0) (actual time=245.336..245.336 rows=1 loops=1)
-> Seq Scan on test (cost=0.00..19387.01 rows=500001 width=0) (actual time=0.012..166.642 rows=500000 loops=1)
Filter: (id = 1)
Rows Removed by Filter: 1
Total runtime: 245.384 ms
(5 rows)
Time: 246.075 ms
digoal=> explain analyze select count(*) from test where id=2;
QUERY PLAN
----------------------------------------------------------------------------------------------------------------------------
Aggregate (cost=1.41..1.42 rows=1 width=0) (actual time=0.031..0.031 rows=1 loops=1)
-> Index Only Scan using idx_test_1 on test (cost=0.00..1.40 rows=1 width=0) (actual time=0.023..0.024 rows=1 loops=1)
Index Cond: (id = 2)
Heap Fetches: 1
Total runtime: 0.083 ms
(5 rows)
Time: 0.772 ms
那么如SPI_prepare所述, 同一个prepare statement被SPI_execute_plan多次使用后, 将会生成generic plan.
所以接下来我多次(这里选择5次) 调用 select * from f_immutable(1);
根据前面提到的choose_custom_plan函数, 5次后将会进入plansource->generic_cost < avg_custom_cost * 1.1 的判断.
因为前面5次的cost都一样, 所以存储在plansource里的custom cost相关数值得出的custom cost平均值与generic_cost没有相差10%. 所以会走generic plan. 那就可以解释为什么第六次的SQL: select * from f_immutable(2); 未走索引了.
digoal=> \timing
Timing is on.
digoal=> select * from f_immutable(1);
f_immutable
-------------
500000
(1 row)
Time: 242.742 ms
digoal=> select * from f_immutable(1);
f_immutable
-------------
500000
(1 row)
Time: 179.910 ms
digoal=> select * from f_immutable(1);
f_immutable
-------------
500000
(1 row)
Time: 180.052 ms
digoal=> select * from f_immutable(1);
f_immutable
-------------
500000
(1 row)
Time: 180.027 ms
digoal=> select * from f_immutable(1);
f_immutable
-------------
500000
(1 row)
Time: 179.758 ms
经过以上5次调用select * from f_immutable(1);后,
所以下面把参数改成2, 也不走索引扫描了. 执行时间139毫秒.
digoal=> select * from f_immutable(2);
f_immutable
-------------
1
(1 row)
Time: 139.941 ms
digoal=> select * from f_immutable(2);
f_immutable
-------------
1
(1 row)
Time: 139.994 ms
将函数的strict改成stable和volatile测试的结果与上面一致, 因为和函数的volatile无关.
digoal=> alter function f_immutable(int) strict volatile;
ALTER FUNCTION
Time: 0.490 ms
digoal=> alter function f_immutable(int) strict stable;
ALTER FUNCTION
Time: 0.451 ms
测试结果略.
接下来使用prepare SQL COMMAND进行测试 :
结果和PL/pgsql一致.
digoal=> \timing
Timing is on.
digoal=> prepare p_test(int) as select count(*) from test where id=$1;
PREPARE
Time: 1.154 ms
digoal=> explain analyze execute p_test(1);
QUERY PLAN
--------------------------------------------------------------------------------------------------------------------
Aggregate (cost=20637.01..20637.02 rows=1 width=0) (actual time=298.463..298.463 rows=1 loops=1)
-> Seq Scan on test (cost=0.00..19387.01 rows=500001 width=0) (actual time=0.034..220.116 rows=500000 loops=1)
Filter: (id = 1)
Rows Removed by Filter: 1
Total runtime: 298.580 ms
(5 rows)
Time: 299.951 ms
digoal=> explain analyze execute p_test(1);
QUERY PLAN
--------------------------------------------------------------------------------------------------------------------
Aggregate (cost=20637.01..20637.02 rows=1 width=0) (actual time=243.990..243.990 rows=1 loops=1)
-> Seq Scan on test (cost=0.00..19387.01 rows=500001 width=0) (actual time=0.012..166.291 rows=500000 loops=1)
Filter: (id = 1)
Rows Removed by Filter: 1
Total runtime: 244.040 ms
(5 rows)
Time: 244.800 ms
digoal=> explain analyze execute p_test(1);
QUERY PLAN
--------------------------------------------------------------------------------------------------------------------
Aggregate (cost=20637.01..20637.02 rows=1 width=0) (actual time=244.184..244.184 rows=1 loops=1)
-> Seq Scan on test (cost=0.00..19387.01 rows=500001 width=0) (actual time=0.012..166.416 rows=500000 loops=1)
Filter: (id = 1)
Rows Removed by Filter: 1
Total runtime: 244.235 ms
(5 rows)
Time: 244.817 ms
digoal=> explain analyze execute p_test(1);
QUERY PLAN
--------------------------------------------------------------------------------------------------------------------
Aggregate (cost=20637.01..20637.02 rows=1 width=0) (actual time=244.380..244.380 rows=1 loops=1)
-> Seq Scan on test (cost=0.00..19387.01 rows=500001 width=0) (actual time=0.012..166.055 rows=500000 loops=1)
Filter: (id = 1)
Rows Removed by Filter: 1
Total runtime: 244.432 ms
(5 rows)
Time: 245.028 ms
digoal=> explain analyze execute p_test(1);
QUERY PLAN
--------------------------------------------------------------------------------------------------------------------
Aggregate (cost=20637.01..20637.02 rows=1 width=0) (actual time=244.029..244.029 rows=1 loops=1)
-> Seq Scan on test (cost=0.00..19387.01 rows=500001 width=0) (actual time=0.011..166.128 rows=500000 loops=1)
Filter: (id = 1)
Rows Removed by Filter: 1
Total runtime: 244.081 ms
(5 rows)
Time: 244.701 ms
传入参数2, 使用了generic plan, 而没有执行custom plan.(本来参数2应该走索引.)
digoal=> explain analyze execute p_test(2);
QUERY PLAN
-----------------------------------------------------------------------------------------------------------------
Aggregate (cost=20637.01..20637.02 rows=1 width=0) (actual time=115.265..115.265 rows=1 loops=1)
-> Seq Scan on test (cost=0.00..19387.01 rows=500001 width=0) (actual time=115.257..115.257 rows=1 loops=1)
Filter: (id = $1)
Rows Removed by Filter: 500000
Total runtime: 115.317 ms
(5 rows)
Time: 116.046 ms
下面把前5次的参数改成2, 主要说明choose_custom_plan的 plansource->generic_cost < avg_custom_cost * 1.1 比较过程.
如下 :
ocz@db-172-16-3-150-> psql digoal digoal
psql (9.2.1)
Type "help" for help.
digoal=> prepare p_test(int) as select count(*) from test where id=$1;
PREPARE
digoal=> explain analyze execute p_test(2);
QUERY PLAN
----------------------------------------------------------------------------------------------------------------------------
Aggregate (cost=1.41..1.42 rows=1 width=0) (actual time=0.083..0.084 rows=1 loops=1)
-> Index Only Scan using idx_test_1 on test (cost=0.00..1.40 rows=1 width=0) (actual time=0.074..0.075 rows=1 loops=1)
Index Cond: (id = 2)
Heap Fetches: 1
Total runtime: 0.200 ms
(5 rows)
digoal=> explain analyze execute p_test(2);
QUERY PLAN
----------------------------------------------------------------------------------------------------------------------------
Aggregate (cost=1.41..1.42 rows=1 width=0) (actual time=0.026..0.026 rows=1 loops=1)
-> Index Only Scan using idx_test_1 on test (cost=0.00..1.40 rows=1 width=0) (actual time=0.021..0.022 rows=1 loops=1)
Index Cond: (id = 2)
Heap Fetches: 1
Total runtime: 0.074 ms
(5 rows)
digoal=> explain analyze execute p_test(2);
QUERY PLAN
----------------------------------------------------------------------------------------------------------------------------
Aggregate (cost=1.41..1.42 rows=1 width=0) (actual time=0.021..0.021 rows=1 loops=1)
-> Index Only Scan using idx_test_1 on test (cost=0.00..1.40 rows=1 width=0) (actual time=0.017..0.018 rows=1 loops=1)
Index Cond: (id = 2)
Heap Fetches: 1
Total runtime: 0.065 ms
(5 rows)
digoal=> explain analyze execute p_test(2);
QUERY PLAN
----------------------------------------------------------------------------------------------------------------------------
Aggregate (cost=1.41..1.42 rows=1 width=0) (actual time=0.020..0.020 rows=1 loops=1)
-> Index Only Scan using idx_test_1 on test (cost=0.00..1.40 rows=1 width=0) (actual time=0.016..0.016 rows=1 loops=1)
Index Cond: (id = 2)
Heap Fetches: 1
Total runtime: 0.063 ms
(5 rows)
digoal=> explain analyze execute p_test(2);
QUERY PLAN
----------------------------------------------------------------------------------------------------------------------------
Aggregate (cost=1.41..1.42 rows=1 width=0) (actual time=0.020..0.020 rows=1 loops=1)
-> Index Only Scan using idx_test_1 on test (cost=0.00..1.40 rows=1 width=0) (actual time=0.016..0.017 rows=1 loops=1)
Index Cond: (id = 2)
Heap Fetches: 1
Total runtime: 0.061 ms
(5 rows)
第六次传入参数1, 走的是custom plan.
digoal=> explain analyze execute p_test(1);
QUERY PLAN
--------------------------------------------------------------------------------------------------------------------
Aggregate (cost=20637.01..20637.02 rows=1 width=0) (actual time=343.385..343.385 rows=1 loops=1)
-> Seq Scan on test (cost=0.00..19387.01 rows=500001 width=0) (actual time=0.053..253.714 rows=500000 loops=1)
Filter: (id = 1)
Rows Removed by Filter: 1
Total runtime: 343.458 ms
(5 rows)
下面来解释一下原因 :
1. 第6次执行时,
num_custom_plans = 5
total_custom_cost = 7.1 /* (1.42*5) */
generic_cost等于多少呢? 还是-1(初始值) , 从后面使用gdb跟踪的结果可以看到.
因此choose_custom_plan的执行结果为true. 也就是选择custom plan, 因此需要额外的plan 优化选择过程.
到第7次执行时会变成20637.
GDB跟踪举例 :
首先确保编译PostgreSQL使用了enable-debug :
ocz@db-172-16-3-150-> pg_config --configure
'--prefix=/home/ocz/pgsql9.2.1' '--with-pgport=9201' '--with-perl' '--with-python' '--with-tcl' '--with-openssl' '--with-pam' '--without-ldap' '--with-libxml' '--with-libxslt' '--enable-thread-safety' '--with-wal-blocksize=16' '--enable-debug' '--with-ossp-uuid' '--with-libs=/opt/uuid-1.6.2/lib'
开始测试 :
终端1 :
digoal=> prepare p_test(int) as select count(*) from test where id=$1;
PREPARE
digoal=> explain analyze execute p_test(2);
QUERY PLAN
----------------------------------------------------------------------------------------------------------------------------
Aggregate (cost=1.41..1.42 rows=1 width=0) (actual time=0.113..0.113 rows=1 loops=1)
-> Index Only Scan using idx_test_1 on test (cost=0.00..1.40 rows=1 width=0) (actual time=0.102..0.103 rows=1 loops=1)
Index Cond: (id = 2)
Heap Fetches: 1
Total runtime: 0.269 ms
(5 rows)
digoal=> select pg_backend_pid();
pg_backend_pid
----------------
10921
(1 row)
终端2 :
ocz@db-172-16-3-150-> gdb
GNU gdb (GDB) Red Hat Enterprise Linux (7.0.1-37.el5)
Copyright (C) 2009 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law. Type "show copying"
and "show warranty" for details.
This GDB was configured as "x86_64-redhat-linux-gnu".
For bug reporting instructions, please see:
<http://www.gnu.org/software/gdb/bugs/>.
绑定进程号.
(gdb) attach 10921
Attaching to process 10921
设置断点
(gdb) break choose_custom_plan
Breakpoint 1 at 0x6ee730: file plancache.c, line 850.
执行到断点位置
(gdb) continue
Continuing.
终端1 :
digoal=> explain analyze execute p_test(2);
终端2 :
到达断点,
Breakpoint 1, choose_custom_plan (plansource=0x14477350, boundParams=0x144b1678) at plancache.c:850
850 if (boundParams == NULL)
进入单步模式
(gdb) step
854 if (plansource->cursor_options & CURSOR_OPT_GENERIC_PLAN)
(gdb)
856 if (plansource->cursor_options & CURSOR_OPT_CUSTOM_PLAN)
(gdb)
860 if (plansource->num_custom_plans < 5)
(gdb)
choose_custom_plan (plansource=0x14477350, boundParams=0x144b1678) at plancache.c:878
878 }
到了choose_custom_plan 后, 打印一下plansource->total_custom_cost
(gdb) print plansource->total_custom_cost
$1 = 1.4174
继续下一轮
(gdb) continue
Continuing.
终端1 :
digoal=> explain analyze execute p_test(2);
终端2 :
Breakpoint 1, choose_custom_plan (plansource=0x14477350, boundParams=0x144b1678) at plancache.c:850
850 if (boundParams == NULL)
(gdb) step
854 if (plansource->cursor_options & CURSOR_OPT_GENERIC_PLAN)
(gdb)
856 if (plansource->cursor_options & CURSOR_OPT_CUSTOM_PLAN)
(gdb)
860 if (plansource->num_custom_plans < 5)
(gdb)
choose_custom_plan (plansource=0x14477350, boundParams=0x144b1678) at plancache.c:878
878 }
注意此时total_custom_cost增长了.
(gdb) print plansource->total_custom_cost
$2 = 2.8348
此时还未生成generic plan, 所以generic_cost=-1
(gdb) print plansource->generic_cost
$3 = -1
(gdb) continue
Continuing.
终端1 :
digoal=> explain analyze execute p_test(2);
略去中间几步, 下面是第6次和第7次执行SQL时跟踪到的结果, 说明第6次执行完后生成了generic plan.
终端2 :
Breakpoint 1, choose_custom_plan (plansource=0x143998c0, boundParams=0x1439aff8) at plancache.c:850
850 if (boundParams == NULL)
(gdb) step
854 if (plansource->cursor_options & CURSOR_OPT_GENERIC_PLAN)
(gdb)
856 if (plansource->cursor_options & CURSOR_OPT_CUSTOM_PLAN)
(gdb)
860 if (plansource->num_custom_plans < 5)
(gdb)
GetCachedPlan (plansource=0x143998c0, boundParams=0x1439aff8, useResOwner=1 '\001') at plancache.c:941
941 if (!customplan)
(gdb) print plansource->generic_cost
$4 = -1
(gdb) print plansource->total_custom_cost
$5 = 7.0869999999999997
(gdb) print plansource->num_custom_plans
$6 = 5
(gdb) continue
Continuing.
Breakpoint 1, choose_custom_plan (plansource=0x143998c0, boundParams=0x1439aff8) at plancache.c:850
850 if (boundParams == NULL)
(gdb) continue
Continuing.
Breakpoint 1, choose_custom_plan (plansource=0x143998c0, boundParams=0x1439abe8) at plancache.c:850
850 if (boundParams == NULL)
(gdb) step
854 if (plansource->cursor_options & CURSOR_OPT_GENERIC_PLAN)
(gdb)
856 if (plansource->cursor_options & CURSOR_OPT_CUSTOM_PLAN)
(gdb)
860 if (plansource->num_custom_plans < 5)
(gdb)
GetCachedPlan (plansource=0x143998c0, boundParams=0x1439abe8, useResOwner=1 '\001') at plancache.c:941
941 if (!customplan)
(gdb) print plansource->generic_cost
$7 = 20637.024999999998
(gdb) print plansource->total_custom_cost
$8 = 8.5044000000000004
(gdb) print plansource->num_custom_plans
$9 = 6
generic_cost = 20637.024999999998 对应id=1的执行计划得到的cost, 如下 :
digoal=> explain analyze execute p_test(1);
QUERY PLAN
--------------------------------------------------------------------------------------------------------------------
Aggregate (cost=20637.01..20637.02 rows=1 width=0) (actual time=433.100..433.100 rows=1 loops=1)
-> Seq Scan on test (cost=0.00..19387.01 rows=500001 width=0) (actual time=0.133..344.716 rows=500000 loops=1)
Filter: (id = 1)
Rows Removed by Filter: 1
为什么id=1的执行计划作为generic plan了. 因为id=1的值在test表的出现频率最高.
通过pg_stats视图可以看到. 如下 :
digoal=> \d pg_stats
View "pg_catalog.pg_stats"
Column | Type | Modifiers
------------------------+----------+-----------
schemaname | name |
tablename | name |
attname | name |
inherited | boolean |
null_frac | real |
avg_width | integer |
n_distinct | real |
most_common_vals | anyarray |
most_common_freqs | real[] |
histogram_bounds | anyarray |
correlation | real |
most_common_elems | anyarray |
most_common_elem_freqs | real[] |
elem_count_histogram | real[] |
digoal=> \x
Expanded display is on.
digoal=> select * from pg_stats where schemaname='digoal' and tablename='test' and attname='id';
-[ RECORD 1 ]----------+-------
schemaname | digoal
tablename | test
attname | id
inherited | f
null_frac | 0
avg_width | 4
n_distinct | 1
most_common_vals | {1}
most_common_freqs | {1}
histogram_bounds |
correlation | 1
most_common_elems |
most_common_elem_freqs |
elem_count_histogram |
注意这两项 :
most_common_vals | {1}
most_common_freqs | {1}
采样出来它的出现频率是100%
【其他】
1. 使用SPI_prepare_cursor, 通过设置flag [CURSOR_OPT_GENERIC_PLAN or CURSOR_OPT_CUSTOM_PLAN] 可以强制指定使用custom plan还是generic plan.
2. generic plan的生命周期 :
The statement returned by SPI_prepare can be used only in the current invocation of the procedure, since SPI_finish frees memory allocated for such a statement.
But the statement can be saved for longer using the functions SPI_keepplan or SPI_saveplan. 一般不推荐使用SPI_saveplan, 因为数据会重复.
SPI_finish释放gplan :
/*
* ReleaseGenericPlan: release a CachedPlanSource's generic plan, if any.
*/
static void
ReleaseGenericPlan(CachedPlanSource *plansource)
{
/* Be paranoid about the possibility that ReleaseCachedPlan fails */
if (plansource->gplan)
{
CachedPlan *plan = plansource->gplan;
Assert(plan->magic == CACHEDPLAN_MAGIC);
plansource->gplan = NULL;
ReleaseCachedPlan(plan, false);
}
}
3. 最后补充以下, 数据分布非常倾斜, 就像本例id=1的有50000条, id=2的只有1条的情况. 当传入id=2的值时可以考虑强制custom plan.
当然如果经常要这样做, 不推荐使用prepare statement. 因为generic plan是通过出现最频繁的值产生的执行计划, 至少从gdb跟踪出来的是这样的.
【参考】
1. http://www.postgresql.org/docs/9.2/static/spi-spi-prepare.html
2. http://www.postgresql.org/docs/9.2/static/spi-spi-execute-plan.html
3. http://www.postgresql.org/docs/9.2/static/plpgsql-implementation.html
4. http://www.postgresql.org/docs/9.2/static/xfunc-volatility.html
5. http://www.postgresql.org/docs/9.2/static/sql-prepare.html
6. http://www.postgresql.org/docs/9.2/static/sql-execute.html
7. http://www.postgresql.org/docs/9.2/static/sql-deallocate.html
8. http://www.postgresql.org/docs/9.2/static/spi-spi-keepplan.html
9. http://www.postgresql.org/docs/9.2/static/spi-spi-saveplan.html
10. http://blog.163.com/digoal@126/blog/static/16387704020121015265329/
11. http://blog.163.com/digoal@126/blog/static/16387704020124219333824/
12. http://blog.163.com/digoal@126/blog/static/16387704020111168855258/
13. http://blog.163.com/digoal@126/blog/static/16387704020109286417600/
14. src/backend/executor/spi.c
15. src/include/executor/spi_priv.h
16. src/backend/utils/cache/plancache.c
17. src/backend/commands/prepare.c
18. src/include/utils/plancache.h
00024 /*
00025 * CachedPlanSource (which might better have been called CachedQuery)
00026 * represents a SQL query that we expect to use multiple times. It stores
00027 * the query source text, the raw parse tree, and the analyzed-and-rewritten
00028 * query tree, as well as adjunct data. Cache invalidation can happen as a
00029 * result of DDL affecting objects used by the query. In that case we discard
00030 * the analyzed-and-rewritten query tree, and rebuild it when next needed.
00031 *
00032 * An actual execution plan, represented by CachedPlan, is derived from the
00033 * CachedPlanSource when we need to execute the query. The plan could be
00034 * either generic (usable with any set of plan parameters) or custom (for a
00035 * specific set of parameters). plancache.c contains the logic that decides
00036 * which way to do it for any particular execution. If we are using a generic
00037 * cached plan then it is meant to be re-used across multiple executions, so
00038 * callers must always treat CachedPlans as read-only.
00039 *
00040 * Once successfully built and "saved", CachedPlanSources typically live
00041 * for the life of the backend, although they can be dropped explicitly.
00042 * CachedPlans are reference-counted and go away automatically when the last
00043 * reference is dropped. A CachedPlan can outlive the CachedPlanSource it
00044 * was created from.
00045 *
00046 * An "unsaved" CachedPlanSource can be used for generating plans, but it
00047 * lives in transient storage and will not be updated in response to sinval
00048 * events.
00049 *
00050 * CachedPlans made from saved CachedPlanSources are likewise in permanent
00051 * storage, so to avoid memory leaks, the reference-counted references to them
00052 * must be held in permanent data structures or ResourceOwners. CachedPlans
00053 * made from unsaved CachedPlanSources are in children of the caller's
00054 * memory context, so references to them should not be longer-lived than
00055 * that context. (Reference counting is somewhat pro forma in that case,
00056 * though it may be useful if the CachedPlan can be discarded early.)
00057 *
00058 * A CachedPlanSource has two associated memory contexts: one that holds the
00059 * struct itself, the query source text and the raw parse tree, and another
00060 * context that holds the rewritten query tree and associated data. This
00061 * allows the query tree to be discarded easily when it is invalidated.
00062 *
00063 * Note: the string referenced by commandTag is not subsidiary storage;
00064 * it is assumed to be a compile-time-constant string. As with portals,
00065 * commandTag shall be NULL if and only if the original query string (before
00066 * rewriting) was an empty string.
00067 */
00068 typedef struct CachedPlanSource
00069 {
00070 int magic; /* should equal CACHEDPLANSOURCE_MAGIC */
00071 Node *raw_parse_tree; /* output of raw_parser() */
00072 char *query_string; /* source text of query */
00073 const char *commandTag; /* command tag (a constant!), or NULL */
00074 Oid *param_types; /* array of parameter type OIDs, or NULL */
00075 int num_params; /* length of param_types array */
00076 ParserSetupHook parserSetup; /* alternative parameter spec method */
00077 void *parserSetupArg;
00078 int cursor_options; /* cursor options used for planning */
00079 bool fixed_result; /* disallow change in result tupdesc? */
00080 TupleDesc resultDesc; /* result type; NULL = doesn't return tuples */
00081 struct OverrideSearchPath *search_path; /* saved search_path */
00082 MemoryContext context; /* memory context holding all above */
00083 /* These fields describe the current analyzed-and-rewritten query tree: */
00084 List *query_list; /* list of Query nodes, or NIL if not valid */
00085 List *relationOids; /* OIDs of relations the queries depend on */
00086 List *invalItems; /* other dependencies, as PlanInvalItems */
00087 MemoryContext query_context; /* context holding the above, or NULL */
00088 /* If we have a generic plan, this is a reference-counted link to it: */
00089 struct CachedPlan *gplan; /* generic plan, or NULL if not valid */
00090 /* Some state flags: */
00091 bool is_complete; /* has CompleteCachedPlan been done? */
00092 bool is_saved; /* has CachedPlanSource been "saved"? */
00093 bool is_valid; /* is the query_list currently valid? */
00094 int generation; /* increments each time we create a plan */
00095 /* If CachedPlanSource has been saved, it is a member of a global list */
00096 struct CachedPlanSource *next_saved; /* list link, if so */
00097 /* State kept to help decide whether to use custom or generic plans: */
00098 double generic_cost; /* cost of generic plan, or -1 if not known */
00099 double total_custom_cost; /* total cost of custom plans so far */
00100 int num_custom_plans; /* number of plans included in total */
00101 } CachedPlanSource;
00102 generic plan 存储在以下数据结构中:
00103 /*
00104 * CachedPlan represents an execution plan derived from a CachedPlanSource.
00105 * The reference count includes both the link from the parent CachedPlanSource
00106 * (if any), and any active plan executions, so the plan can be discarded
00107 * exactly when refcount goes to zero. Both the struct itself and the
00108 * subsidiary data live in the context denoted by the context field.
00109 * This makes it easy to free a no-longer-needed cached plan.
00110 */
00111 typedef struct CachedPlan
00112 {
00113 int magic; /* should equal CACHEDPLAN_MAGIC */
00114 List *stmt_list; /* list of statement nodes (PlannedStmts and
00115 * bare utility statements) */
00116 bool is_saved; /* is CachedPlan in a long-lived context? */
00117 bool is_valid; /* is the stmt_list currently valid? */
00118 TransactionId saved_xmin; /* if valid, replan when TransactionXmin
00119 * changes from this value */
00120 int generation; /* parent's generation number for this plan */
00121 int refcount; /* count of live references to this struct */
00122 MemoryContext context; /* context containing this CachedPlan */
00123 } CachedPlan;