PostgreSQL 统计信息pg_statistic格式及导入导出dump_stat - 兼容Oracle

背景

《PostgreSQL 规格评估 - 微观、宏观、精准多视角估算数据库性能(选型、做预算不求人)》

EXPLAIN是PG数据库用于输出SQL执行计划的语法，

1、生成的执行计划中包含COST一项。

如果校准了成本因子，COST可以和SQL实际执行时间对其。因子校对的方法如下，实际上每一种硬件，我们只需要校对一遍即可。

《优化器成本因子校对(disk,ssd,memory IO开销精算) - PostgreSQL real seq_page_cost & random_page_cost in disks,ssd,memory》

《优化器成本因子校对 - PostgreSQL explain cost constants alignment to timestamp》

校对因子如下：

#seq_page_cost = 1.0                    # measured on an arbitrary scale
random_page_cost = 1.2                  # same scale as above
#cpu_tuple_cost = 0.01                  # same scale as above
#cpu_index_tuple_cost = 0.005           # same scale as above
#cpu_operator_cost = 0.0025             # same scale as above    

# 以下不需要校对, 不考虑并行计算SQL
parallel_tuple_cost = 0.1               # same scale as above
parallel_setup_cost = 1000.0            # same scale as above
effective_cache_size = 10GB

2、评估COST还需要依赖统计信息柱状图：

涉及reltuples, relpages. 表示评估的记录数以及占用多少个数据块。注意源头的block_size可能和PG的不一致，占用多少个块需要转换一下。(show block_size可以查看数据块大小。)   

postgres=# \d pg_class
                     Table "pg_catalog.pg_class"
       Column        |     Type     | Collation | Nullable | Default
---------------------+--------------+-----------+----------+---------
 relname             | name         |           | not null |   -- 对象名
 relnamespace        | oid          |           | not null |   -- 对象所属的schema, 对应pg_namespace.oid
 reltype             | oid          |           | not null |
 reloftype           | oid          |           | not null |
 relowner            | oid          |           | not null |
 relam               | oid          |           | not null |
 relfilenode         | oid          |           | not null |
 reltablespace       | oid          |           | not null |
 relpages            | integer      |           | not null |   -- 评估的页数(单位为block_size)
 reltuples           | real         |           | not null |   -- 评估的记录数
 relallvisible       | integer      |           | not null |
 reltoastrelid       | oid          |           | not null |
 relhasindex         | boolean      |           | not null |
 relisshared         | boolean      |           | not null |
 relpersistence      | "char"       |           | not null |
 relkind             | "char"       |           | not null |
 relnatts            | smallint     |           | not null |
 relchecks           | smallint     |           | not null |
 relhasoids          | boolean      |           | not null |
 relhaspkey          | boolean      |           | not null |
 relhasrules         | boolean      |           | not null |
 relhastriggers      | boolean      |           | not null |
 relhassubclass      | boolean      |           | not null |
 relrowsecurity      | boolean      |           | not null |
 relforcerowsecurity | boolean      |           | not null |
 relispopulated      | boolean      |           | not null |
 relreplident        | "char"       |           | not null |
 relispartition      | boolean      |           | not null |
 relfrozenxid        | xid          |           | not null |
 relminmxid          | xid          |           | not null |
 relacl              | aclitem[]    |           |          |
 reloptions          | text[]       |           |          |
 relpartbound        | pg_node_tree |           |          |
Indexes:
    "pg_class_oid_index" UNIQUE, btree (oid)
    "pg_class_relname_nsp_index" UNIQUE, btree (relname, relnamespace)
    "pg_class_tblspc_relfilenode_index" btree (reltablespace, relfilenode)    

涉及   空值比例、平均列宽、唯一值比例或个数、高频值以及频率、柱状图分布、存储相关性、多值列（高频元素及比例、元素柱状图分布）。    

-- 这个是视图：  

postgres=# \d pg_stats
                     View "pg_catalog.pg_stats"
         Column         |   Type   | Collation | Nullable | Default
------------------------+----------+-----------+----------+---------
 schemaname             | name     |           |          |   -- 对象所属的schema
 tablename              | name     |           |          |   -- 对象名
 attname                | name     |           |          |   -- 列名
 inherited              | boolean  |           |          |   -- 是否为继承表的统计信息（false时表示当前表的统计信息，true时表示包含所有继承表的统计信息）
 null_frac              | real     |           |          |   -- 该列空值比例
 avg_width              | integer  |           |          |   -- 该列平均长度
 n_distinct             | real     |           |          |   -- 该列唯一值个数（-1表示唯一，小于1表示占比，大于等于1表示实际的唯一值个数）
 most_common_vals       | anyarray |           |          |   -- 该列高频词
 most_common_freqs      | real[]   |           |          |   -- 该列高频词对应的出现频率
 histogram_bounds       | anyarray |           |          |   -- 该列柱状图（表示隔出的每个BUCKET的记录数均等）
 correlation            | real     |           |          |   -- 该列存储相关性（-1到1的区间），绝对值越小，存储越离散。小于0表示反向相关，大于0表示正向相关
 most_common_elems      | anyarray |           |          |   -- 该列为多值类型（数组）时，多值元素的高频词
 most_common_elem_freqs | real[]   |           |          |   -- 多值元素高频词的出现频率
 elem_count_histogram   | real[]   |           |          |   -- 多值元素的柱状图中，每个区间的非空唯一元素个数  

-- 这个是实际存储的数据(也就是要导入的部分)：
-- https://www.postgresql.org/docs/10/static/catalog-pg-statistic.html  

postgres=# \d pg_statistic
             Table "pg_catalog.pg_statistic"
   Column    |   Type   | Collation | Nullable | Default
-------------+----------+-----------+----------+---------
 starelid    | oid      |           | not null |   -- 对象OID，对应pg_class.oid
 staattnum   | smallint |           | not null |   -- 该列在表中的位置序号，对应pg_attribute.attnum
 stainherit  | boolean  |           | not null |   -- 是否为继承表的统计信息（false时表示当前表的统计信息，true时表示包含所有继承表的统计信息）
 stanullfrac | real     |           | not null |   -- 空值比例
 stawidth    | integer  |           | not null |   -- 平均长度
 stadistinct | real     |           | not null |   -- 唯一值个数、比例
 stakind1    | smallint |           | not null |   -- 表示第1个SLOT的统计信息分类编号
 stakind2    | smallint |           | not null |   -- 表示第2个SLOT的统计信息分类编号
 stakind3    | smallint |           | not null |   -- 表示第3个SLOT的统计信息分类编号
 stakind4    | smallint |           | not null |   -- 表示第4个SLOT的统计信息分类编号
 stakind5    | smallint |           | not null |   -- 表示第5个SLOT的统计信息分类编号
 staop1      | oid      |           | not null |   -- 表示第1个SLOT的统计信息是用哪个operator生成的(例如统计柱状图边界，需要用到 "<" 这个操作符)
 staop2      | oid      |           | not null |   -- 表示第2个SLOT的统计信息是用哪个operator生成的(例如统计柱状图边界，需要用到 "<" 这个操作符)
 staop3      | oid      |           | not null |   -- 表示第3个SLOT的统计信息是用哪个operator生成的(例如统计柱状图边界，需要用到 "<" 这个操作符)
 staop4      | oid      |           | not null |   -- 表示第4个SLOT的统计信息是用哪个operator生成的(例如统计柱状图边界，需要用到 "<" 这个操作符)
 staop5      | oid      |           | not null |   -- 表示第5个SLOT的统计信息是用哪个operator生成的(例如统计柱状图边界，需要用到 "<" 这个操作符)
 stanumbers1 | real[]   |           |          |   -- 表示第1个SLOT的以numeric[]为结果的统计信息，NULL说明这个SLOT分类没有numeric的统计信息。
 stanumbers2 | real[]   |           |          |   -- 表示第2个SLOT的以numeric[]为结果的统计信息，NULL说明这个SLOT分类没有numeric的统计信息。
 stanumbers3 | real[]   |           |          |   -- 表示第3个SLOT的以numeric[]为结果的统计信息，NULL说明这个SLOT分类没有numeric的统计信息。
 stanumbers4 | real[]   |           |          |   -- 表示第4个SLOT的以numeric[]为结果的统计信息，NULL说明这个SLOT分类没有numeric的统计信息。
 stanumbers5 | real[]   |           |          |   -- 表示第5个SLOT的以numeric[]为结果的统计信息，NULL说明这个SLOT分类没有numeric的统计信息。
 stavalues1  | anyarray |           |          |   -- 表示第1个SLOT的以anyarray[]为结果的统计信息，NULL说明这个SLOT分类没有anyarray[]统计信息。数组类型为列的元素类型，或者列本身的类型。
 stavalues2  | anyarray |           |          |   -- 表示第2个SLOT的以anyarray[]为结果的统计信息，NULL说明这个SLOT分类没有anyarray[]统计信息。数组类型为列的元素类型，或者列本身的类型。
 stavalues3  | anyarray |           |          |   -- 表示第3个SLOT的以anyarray[]为结果的统计信息，NULL说明这个SLOT分类没有anyarray[]统计信息。数组类型为列的元素类型，或者列本身的类型。
 stavalues4  | anyarray |           |          |   -- 表示第4个SLOT的以anyarray[]为结果的统计信息，NULL说明这个SLOT分类没有anyarray[]统计信息。数组类型为列的元素类型，或者列本身的类型。
 stavalues5  | anyarray |           |          |   -- 表示第5个SLOT的以anyarray[]为结果的统计信息，NULL说明这个SLOT分类没有anyarray[]统计信息。数组类型为列的元素类型，或者列本身的类型。
Indexes:
    "pg_statistic_relid_att_inh_index" UNIQUE, btree (starelid, staattnum, stainherit)

statkind的定义

src/include/catalog/pg_statistic.h

/*
 * Currently, five statistical slot "kinds" are defined by core PostgreSQL,
 * as documented below.  Additional "kinds" will probably appear in
 * future to help cope with non-scalar datatypes.  Also, custom data types
 * can define their own "kind" codes by mutual agreement between a custom
 * typanalyze routine and the selectivity estimation functions of the type's
 * operators.
 *
 * Code reading the pg_statistic relation should not assume that a particular
 * data "kind" will appear in any particular slot.  Instead, search the
 * stakind fields to see if the desired data is available.  (The standard
 * function get_attstatsslot() may be used for this.)
 */  

/*
 * The present allocation of "kind" codes is:
 *
 *      1-99:           reserved for assignment by the core PostgreSQL project
 *                              (values in this range will be documented in this file)
 *      100-199:        reserved for assignment by the PostGIS project
 *                              (values to be documented in PostGIS documentation)
 *      200-299:        reserved for assignment by the ESRI ST_Geometry project
 *                              (values to be documented in ESRI ST_Geometry documentation)
 *      300-9999:       reserved for future public assignments
 *
 * For private use you may choose a "kind" code at random in the range
 * 10000-30000.  However, for code that is to be widely disseminated it is
 * better to obtain a publicly defined "kind" code by request from the
 * PostgreSQL Global Development Group.
 */  

/*
 * In a "most common values" slot, staop is the OID of the "=" operator
 * used to decide whether values are the same or not.  stavalues contains
 * the K most common non-null values appearing in the column, and stanumbers
 * contains their frequencies (fractions of total row count).  The values
 * shall be ordered in decreasing frequency.  Note that since the arrays are
 * variable-size, K may be chosen by the statistics collector.  Values should
 * not appear in MCV unless they have been observed to occur more than once;
 * a unique column will have no MCV slot.
 */
#define STATISTIC_KIND_MCV      1  

/*
 * A "histogram" slot describes the distribution of scalar data.  staop is
 * the OID of the "<" operator that describes the sort ordering.  (In theory,
 * more than one histogram could appear, if a datatype has more than one
 * useful sort operator.)  stavalues contains M (>=2) non-null values that
 * divide the non-null column data values into M-1 bins of approximately equal
 * population.  The first stavalues item is the MIN and the last is the MAX.
 * stanumbers is not used and should be NULL.  IMPORTANT POINT: if an MCV
 * slot is also provided, then the histogram describes the data distribution
 * *after removing the values listed in MCV* (thus, it's a "compressed
 * histogram" in the technical parlance).  This allows a more accurate
 * representation of the distribution of a column with some very-common
 * values.  In a column with only a few distinct values, it's possible that
 * the MCV list describes the entire data population; in this case the
 * histogram reduces to empty and should be omitted.
 */
#define STATISTIC_KIND_HISTOGRAM  2  

/*
 * A "correlation" slot describes the correlation between the physical order
 * of table tuples and the ordering of data values of this column, as seen
 * by the "<" operator identified by staop.  (As with the histogram, more
 * than one entry could theoretically appear.)  stavalues is not used and
 * should be NULL.  stanumbers contains a single entry, the correlation
 * coefficient between the sequence of data values and the sequence of
 * their actual tuple positions.  The coefficient ranges from +1 to -1.
 */
#define STATISTIC_KIND_CORRELATION      3  

/*
 * A "most common elements" slot is similar to a "most common values" slot,
 * except that it stores the most common non-null *elements* of the column
 * values.  This is useful when the column datatype is an array or some other
 * type with identifiable elements (for instance, tsvector).  staop contains
 * the equality operator appropriate to the element type.  stavalues contains
 * the most common element values, and stanumbers their frequencies.  Unlike
 * MCV slots, frequencies are measured as the fraction of non-null rows the
 * element value appears in, not the frequency of all rows.  Also unlike
 * MCV slots, the values are sorted into the element type's default order
 * (to support binary search for a particular value).  Since this puts the
 * minimum and maximum frequencies at unpredictable spots in stanumbers,
 * there are two extra members of stanumbers, holding copies of the minimum
 * and maximum frequencies.  Optionally, there can be a third extra member,
 * which holds the frequency of null elements (expressed in the same terms:
 * the fraction of non-null rows that contain at least one null element).  If
 * this member is omitted, the column is presumed to contain no null elements.
 *
 * Note: in current usage for tsvector columns, the stavalues elements are of
 * type text, even though their representation within tsvector is not
 * exactly text.
 */
#define STATISTIC_KIND_MCELEM  4  

/*
 * A "distinct elements count histogram" slot describes the distribution of
 * the number of distinct element values present in each row of an array-type
 * column.  Only non-null rows are considered, and only non-null elements.
 * staop contains the equality operator appropriate to the element type.
 * stavalues is not used and should be NULL.  The last member of stanumbers is
 * the average count of distinct element values over all non-null rows.  The
 * preceding M (>=2) members form a histogram that divides the population of
 * distinct-elements counts into M-1 bins of approximately equal population.
 * The first of these is the minimum observed count, and the last the maximum.
 */
#define STATISTIC_KIND_DECHIST  5  

/*
 * A "length histogram" slot describes the distribution of range lengths in
 * rows of a range-type column. stanumbers contains a single entry, the
 * fraction of empty ranges. stavalues is a histogram of non-empty lengths, in
 * a format similar to STATISTIC_KIND_HISTOGRAM: it contains M (>=2) range
 * values that divide the column data values into M-1 bins of approximately
 * equal population. The lengths are stored as float8s, as measured by the
 * range type's subdiff function. Only non-null rows are considered.
 */
#define STATISTIC_KIND_RANGE_LENGTH_HISTOGRAM  6  

/*
 * A "bounds histogram" slot is similar to STATISTIC_KIND_HISTOGRAM, but for
 * a range-type column.  stavalues contains M (>=2) range values that divide
 * the column data values into M-1 bins of approximately equal population.
 * Unlike a regular scalar histogram, this is actually two histograms combined
 * into a single array, with the lower bounds of each value forming a
 * histogram of lower bounds, and the upper bounds a histogram of upper
 * bounds.  Only non-NULL, non-empty ranges are included.
 */
#define STATISTIC_KIND_BOUNDS_HISTOGRAM  7

那么这些统计信息如何导入导出呢？

导出导入统计信息

https://postgrespro.com/docs/postgresproee/9.6/dump-stat.html

dump_stat这个插件是PostgreSQL pro推出的兼容9.6版本的导出统计信息的插件。

代码如下：

https://github.com/postgrespro/postgrespro/tree/PGPRO9_6

https://github.com/postgrespro/postgrespro/tree/PGPRO9_6/contrib/dump_stat

导出

通过dump_stat导出（导出的结构已经是SQL形式），然后通过SQL导入。

$ psql test -A
test=# \t
test=# \o dump_stat.sql
test=# select dump_statistic();

pg_statistic的每一条记录，产生一条如下SQL：

WITH upsert as (
  UPDATE pg_catalog.pg_statistic SET column_name = expression [, ...]
  WHERE to_schema_qualified_relation(starelid) = t_relname
    AND to_attname(t_relname, staattnum) = t_attname
    AND to_atttype(t_relname, staattnum) = t_atttype
    AND stainherit = t_stainherit
  RETURNING *)
ins as (
  SELECT expression [, ...]
  WHERE NOT EXISTS (SELECT * FROM upsert)
    AND to_attnum(t_relname, t_attname) IS NOT NULL
    AND to_atttype(t_relname, t_attname) = t_atttype)
INSERT INTO pg_catalog.pg_statistic SELECT * FROM ins;    

where expression can be one of:    

array_in(array_text, type_name::regtype::oid, -1)
value::type_name

stat导入的实际例子

(表示public.test表的info列的统计信息，如果存在则更新，不存在则插入。)

WITH
  upsert as (
    UPDATE pg_catalog.pg_statistic
    SET
      stanullfrac = 0, stawidth = 4, stadistinct = -1, stakind1 = 2, stakind2 = 3, stakind3 = 0, stakind4 = 0, stakind5 = 0, staop1 = 'pg_catalog.<(pg_catalog.text, pg_catalog.text)'::regoperator, staop2 = 'pg_catalog.<(pg_catalog.text, pg_catalog.text)'::regoperator, staop3 = '0'::regoperator, staop4 = '0'::regoperator, staop5 = '0'::regoperator, stanumbers1 = NULL::real[], stanumbers2 = '{-0.5}'::real[], stanumbers3 = NULL::real[], stanumbers4 = NULL::real[], stanumbers5 = NULL::real[], stavalues1 = array_in('{abc,cde,test}', 'pg_catalog.text'::regtype, -1)::anyarray, stavalues2 = NULL::anyarray, stavalues3 = NULL::anyarray, stavalues4 = NULL::anyarray, stavalues5 = NULL::anyarray
    WHERE to_schema_qualified_relation(starelid) = 'public.test' AND to_attname('public.test', staattnum) = 'info' AND to_atttype('public.test', staattnum) = 'pg_catalog.text' AND stainherit = false
    RETURNING *
  ),
  ins as (
    SELECT
      'public.test'::regclass,
      to_attnum('public.test', 'info'),
      'false'::boolean,
      0::real,
      4::integer,
      -1::real,
      2,  -- stakind=2 表示柱状图
      3,  -- stakind=3 表示相关性
      0,
      0,
      0,
      'pg_catalog.<(pg_catalog.text, pg_catalog.text)'::regoperator,
      'pg_catalog.<(pg_catalog.text, pg_catalog.text)'::regoperator,
      '0'::regoperator,
      '0'::regoperator,
      '0'::regoperator,
      NULL::real[],
      '{-0.5}'::real[],
      NULL::real[],
      NULL::real[],
      NULL::real[],
      array_in('{abc,cde,test}', 'pg_catalog.text'::regtype, -1)::anyarray,
      NULL::anyarray,
      NULL::anyarray,
      NULL::anyarray,
      NULL::anyarray
    WHERE NOT EXISTS (SELECT * FROM upsert) AND to_attnum('public.test', 'info') IS NOT NULL AND to_atttype('public.test', 'info') = 'pg_catalog.text'
  )
INSERT INTO pg_catalog.pg_statistic SELECT * FROM ins;

导入

1、修改postgresql.conf，允许修改系统表，重启数据库生效配置

vi postgresql.conf  

allow_system_table_mods=on  

pg_ctl restart -m fast

2、导入统计信息

-- 1 pg_class    

update pg_class set reltuples=?, relpages=? where oid=?;    

-- 2 pg_statistic    

WITH upsert as (
  UPDATE pg_catalog.pg_statistic SET column_name = expression [, ...]
  WHERE to_schema_qualified_relation(starelid) = t_relname
    AND to_attname(t_relname, staattnum) = t_attname
    AND to_atttype(t_relname, staattnum) = t_atttype
    AND stainherit = t_stainherit
  RETURNING *)
ins as (
  SELECT expression [, ...]
  WHERE NOT EXISTS (SELECT * FROM upsert)
    AND to_attnum(t_relname, t_attname) IS NOT NULL
    AND to_atttype(t_relname, t_attname) = t_atttype)
INSERT INTO pg_catalog.pg_statistic SELECT * FROM ins;    

where expression can be one of:    

array_in(array_text, type_name::regtype::oid, -1)
value::type_name

3、导入完成后，将allow_system_table_mods设置为off，重启数据库。

dump_statistic代码

CREATE FUNCTION dump_statistic(relid oid) RETURNS SETOF TEXT AS $$
        DECLARE
                result  text;  

                cmd             text;           -- main query
                in_args text;           -- args for insert
                up_args text;           -- args for upsert  

                fstaop  text := '%s::regoperator';
                arr_in  text := 'array_in(%s, %s::regtype, -1)::anyarray';  

                stacols text[] = ARRAY['stakind', 'staop',
                                                           'stanumbers', 'stavalues' ];  

                r               record;
                i               int;
                j               text;
                ncols   int := 26;      -- number of columns in pg_statistic  

                stanum  text[];         -- stanumbers{1, 5}
                staval  text[];         -- stavalues{1, 5}
                staop   text[];         -- staop{1, 5}  

                relname text;           -- quoted relation name
                attname text;           -- quoted attribute name
                atttype text;           -- quoted attribute type  

        BEGIN
                for r in
                                select * from pg_catalog.pg_statistic
                                where starelid = relid
                                        and get_namespace(starelid) != to_namespace('information_schema')
                                        and get_namespace(starelid) != to_namespace('pg_catalog') loop  

                        relname := to_schema_qualified_relation(r.starelid);
                        attname := quote_literal(to_attname(relname, r.staattnum));
                        atttype := quote_literal(to_atttype(relname, r.staattnum));
                        relname := quote_literal(relname); -- redefine relname  

                        in_args := '';
                        up_args = 'stanullfrac = %s, stawidth = %s, stadistinct = %s, ';  

                        cmd := 'WITH upsert as ( ' ||
                                                'UPDATE pg_catalog.pg_statistic SET %s ' ||
                                                'WHERE to_schema_qualified_relation(starelid) = ' || relname || ' '
                                                        'AND to_attname(' || relname || ', staattnum) = ' || attname || ' '
                                                        'AND to_atttype(' || relname || ', staattnum) = ' || atttype || ' '
                                                        'AND stainherit = ' || r.stainherit || ' ' ||
                                                'RETURNING *), ' ||
                                   'ins as ( ' ||
                                                'SELECT %s ' ||
                                                'WHERE NOT EXISTS (SELECT * FROM upsert) ' ||
                                                        'AND to_attnum(' || relname || ', ' || attname || ') IS NOT NULL '
                                                        'AND to_atttype(' || relname || ', ' || attname || ') = ' || atttype || ') '
                                   'INSERT INTO pg_catalog.pg_statistic SELECT * FROM ins;';  

                        for i in 1..ncols loop
                                in_args := in_args || '%s';  

                                if i != ncols then
                                        in_args := in_args || ', ';
                                end if;
                        end loop;  

                        for j in 1..4 loop
                                for i in 1..5 loop
                                        up_args := up_args || format('%s%s = %%s', stacols[j], i);  

                                        if i * j != 20 then
                                                up_args := up_args || ', ';
                                        end if;
                                end loop;
                        end loop;  

                        cmd := format(cmd, up_args, in_args);   --prepare template for main query  

                        staop := array[format(fstaop, quote_literal(to_schema_qualified_operator(r.staop1))),
                                                   format(fstaop, quote_literal(to_schema_qualified_operator(r.staop2))),
                                                   format(fstaop, quote_literal(to_schema_qualified_operator(r.staop3))),
                                                   format(fstaop, quote_literal(to_schema_qualified_operator(r.staop4))),
                                                   format(fstaop, quote_literal(to_schema_qualified_operator(r.staop5)))];  

                        stanum := array[r.stanumbers1::text,
                                                        r.stanumbers2::text,
                                                        r.stanumbers3::text,
                                                        r.stanumbers4::text,
                                                        r.stanumbers5::text];  

                        for i in 1..5 loop
                                if stanum[i] is null then
                                        stanum[i] := 'NULL::real[]';
                                else
                                        stanum[i] := '''' || stanum[i] || '''::real[]';
                                end if;
                        end loop;  

                        if r.stavalues1 is not null then
                                staval[1] := format(arr_in, quote_literal(r.stavalues1),
                                                                        quote_literal(
                                                                                to_schema_qualified_type(
                                                                                        anyarray_elemtype(r.stavalues1))));
                        else
                                staval[1] := 'NULL::anyarray';
                        end if;  

                        if r.stavalues2 is not null then
                                staval[2] := format(arr_in, quote_literal(r.stavalues2),
                                                                        quote_literal(
                                                                                to_schema_qualified_type(
                                                                                        anyarray_elemtype(r.stavalues2))));
                        else
                                staval[2] := 'NULL::anyarray';
                        end if;  

                        if r.stavalues3 is not null then
                                staval[3] := format(arr_in, quote_literal(r.stavalues3),
                                                                        quote_literal(
                                                                                to_schema_qualified_type(
                                                                                        anyarray_elemtype(r.stavalues3))));
                        else
                                staval[3] := 'NULL::anyarray';
                        end if;  

                        if r.stavalues4 is not null then
                                staval[4] := format(arr_in, quote_literal(r.stavalues4),
                                                                        quote_literal(
                                                                                to_schema_qualified_type(
                                                                                        anyarray_elemtype(r.stavalues4))));
                        else
                                staval[4] := 'NULL::anyarray';
                        end if;  

                        if r.stavalues5 is not null then
                                staval[5] := format(arr_in, quote_literal(r.stavalues5),
                                                                        quote_literal(
                                                                                to_schema_qualified_type(
                                                                                        anyarray_elemtype(r.stavalues5))));
                        else
                                staval[5] := 'NULL::anyarray';
                        end if;  

                        --DEBUG
                        --staop := array['{arr}', '{arr}', '{arr}', '{arr}', '{arr}'];
                        --stanum := array['{num}', '{num}', '{num}', '{num}', '{num}'];
                        --staval := array['{val}', '{val}', '{val}', '{val}', '{val}'];  

                        result := format(cmd,
                                                         r.stanullfrac,
                                                         r.stawidth,
                                                         r.stadistinct,
                                                         -- stakind
                                                         r.stakind1, r.stakind2, r.stakind3, r.stakind4, r.stakind5,
                                                         -- staop
                                                         staop[1], staop[2], staop[3], staop[4], staop[5],
                                                         -- stanumbers
                                                         stanum[1], stanum[2], stanum[3], stanum[4], stanum[5],
                                                         -- stavalues
                                                         staval[1], staval[2], staval[3], staval[4], staval[5],  

                                                         -- first 6 columns
                                                         format('%s::regclass', relname),
                                                         format('to_attnum(%s, %s)', relname, attname),
                                                         '''' || r.stainherit || '''::boolean',
                                                         r.stanullfrac || '::real',
                                                         r.stawidth || '::integer',
                                                         r.stadistinct || '::real',
                                                         -- stakind
                                                         r.stakind1, r.stakind2, r.stakind3, r.stakind4, r.stakind5,
                                                         -- staop
                                                         staop[1], staop[2], staop[3], staop[4], staop[5],
                                                         -- stanumbers
                                                         stanum[1], stanum[2], stanum[3], stanum[4], stanum[5],
                                                         -- stavalues
                                                         staval[1], staval[2], staval[3], staval[4], staval[5]);  

                        return next result;
                end loop;  

                return;
        END;
$$ LANGUAGE plpgsql;

我们甚至可以将Oracle数据库的统计信息，平移到PG数据库，对齐需要的元素即可：

记录数、占用多少个数据块。每列的空值比例、平均列宽、唯一值比例或个数、高频值以及频率、柱状图分布、存储相关性、多值列（高频元素及比例、元素柱状图分布）。

好处：在迁移ORACLE数据时，可以关闭autovacuumm（提高导入速度），通过这种方法来导入统计信息。（只要元素对应即可，当然有些元素可能是ORACLE中不采集的，比如多值列的统计信息）。

参考

https://github.com/postgrespro/postgrespro/tree/PGPRO9_6/contrib/dump_stat

时间： 2025-01-08 02:57:58

PostgreSQL 统计信息pg_statistic格式及导入导出dump_stat - 兼容Oracle的相关文章

PostgreSQL 统计信息混淆之处(scan,read,fetch,hit)源码解读

PostgreSQL 几个统计信息的解释难以理解,所以本文花一些时间从源码的角度来解释一下.让大家对这几个容易误解的统计值有更好的理解. 比较难理解的几个统计值为:pg_stat_all_indexes 的 idx_scan idx_tup_read idx_tup_fetch pg_statio_all_indexes 的 idx_blks_read idx_blks_hit pg_stat_all_tables 的 seq_scan seq_tup_read idx_scan idx_tup

oracle数据泵导入分区表统计信息报错（三）

前不久检查了一篇以往的BLOG,意外的发现这个bug居然被我忘记了,时隔一年继续解决这个问题. 根据上一篇文章所介绍的分析过程,基本上可以确认和这几张分区表的统计信息本身有关. 由于当前数据库是从920环境EXP导出,IMP导入到同版本的中间数据库,最终通过数据泵IMPDP导入到当前数据库的.而分区表由于无法解决表空间的转换问题,因此在中间数据库手工创建,在IMP导入的时候指定了IGNORE=Y参数. 而现在恰好问题出在这个用户的所有分区表上,难道问题和迁移的过程有关系.检查了当时的脚本,没有发

[20160713]impdp与统计信息导入.txt

[20160713]impdp与统计信息导入.txt --许多做迁移使用导入imdp加入EXCLUDE=STATISTICS参数,避免统计信息导入.具体原因我觉得可能人为重新组织数据后,统计信息不准确. --希望系统自己重新分析,实际上加入这个参数还导致另外的一种情况出现,测试看看. 1.环境: SCOTT@book> @ &r/ver1 PORT_STRING VERSION BANNER ------------------- --------------

Oracle的统计信息简介

1.什么是统计信息统计信息主要是描述数据库中表,索引的大小,规模,数据分布状况等的一类信息.比如,表的行数,块数,平均每行的大小,索引的leaf blocks,索引字段的行数,不同值的大小等,都属于统计信息.CBO正是根据这些统计信息数据,计算出不同访问路径下,不同join 方式下,各种计划的成本,最后选择出成本最小的计划. 统计信息是存放在数据字段表中的,如tab$.一般我们从数据字段视图中察看统计信息状况,如DBA_TABLES,DBA_INDEXES,DBA_TAB_COL_STATIS

Oracle数据导入导出imp/exp（转）

在oracle安装目录下有EXP.EXE与IMP.EXE这2个文件,他们分别被用来执行数据库的导入导出.所以Oracle数据导入导出imp/exp就相当与oracle数据还原与备份. 一.Oracle数据导出exp 1.Exp参数详解(红色字体代表比较重要的参数): (1) USERID 运行导出命令的帐号的用户名/口令 (2) BUFFEER 用来取数据行的缓冲区的大小 (3) FILE 导出转储文件的名字 (4) COMPRESS 导出是否应该压缩有碎片的段成一个范围,这将会影

oracle收集统计信息

什么是统计信息统计信息主要是描述数据库中表,索引的大小,规模,数据分布状况等的一类信息.比如,表的行数,块数,平均每行的大小,索引的leaf blocks,索引字段的行数,不同值的大小等,都属于统计信息.CBO正是根据这些统计信息数据,计算出不同访问路径下,不同join 方式下,各种计划的成本,最后选择出成本最小的计划. 统计信息是存放在数据字段表中的,如dba_tab_statistics 如何搜集统计信息统计信息搜集也是有多种方法,推荐大家使用DBMS_STATS 表来进行统计信息搜集及

Oracle 还原历史统计信息

统计信息是个非常有用的东东,没有它,SQL优化器就好比巧妇难为无米之炊!良好高效的SQL执行计划依赖于真实的统计信息.然而在有些情况下,比如对比生产环境与测试环境执行计划,需要使用生产环境的统计信息.而有时候呢则需要还原Oracle历史统计信息.本文基于后者即如何还原历史统计信息来展开,同时描述了11g缺省情况下对于统计信息的调度. 有关统计信息的导入导出可以参考: dbms_stats 导入导出 schema 级别统计信息 d

[20160904]表统计信息lock.txt

[20160904]表统计信息lock.txt 晚上看链接:https://blogs.oracle.com/Database4CN/entry/%E8%AF%8A%E6%96%AD%E7%BB%9F%E8%AE%A1%E4%BF%A1%E6%81%AF%E4%B8%8D%E6%94%B6%E9%9B%86%E5%8E%9F%E5%9B%A0 提到如果导入使用ROWS=n,会导致导入的表lock,测试看看: 官方有如下解释: If ROWS=n, then statistics for all

Oracle 12c数据库优化器统计信息收集的最佳实践

Oracle 12c数据库优化器统计信息收集的最佳实践转载自沃趣科技(ID:woqutech) 作者刘金龙(译) 原文链接 http://www.oracle.com/technetwork/database/bi-datawarehousing/twp-bp-for-stats-gather-12c-1967354.pdf 导语 Oracle优化器会为SQL语句产生所有可能的访问路径(执行计划),然后从中选择一条COST值最低的执行路径,这个cost值是