Greenplum 类型一致性使用规范 - 索引条件、JOIN的类型一致性限制

标签

PostgreSQL , Greenplum , 类型一致 , 索引 , inner转换 , join


背景

在查询时,有很多用户会犯浑,发现建立了索引,但是查询偏偏不走索引。

怎么不走索引啊?

这里做容易混淆的是类型的差异,例如字段类型为字符串,但是输入的是INT类型,这样就可能不走索引。(除非创建了自动的CAST,自动进行类型转换)

查询的输入类型与索引的类型一致是非常有必要的。

例子

1、建表(使用变长、定长字符串类型),写入测试数据

create table tbl3(id int, c1 text, c2 varchar(64), c3 char(64));  

postgres=# insert into tbl3 select id, md5, md5, md5 from (select id, md5(random()::text) as md5 from generate_series(1,1000000) t(id)) t;
INSERT 0 1000000

2、创建索引

postgres=# create index idx_tbl3_1 on tbl3 (c1);
CREATE INDEX
postgres=# create index idx_tbl3_2 on tbl3 (c2);
CREATE INDEX
postgres=# create index idx_tbl3_3 on tbl3 (c3);
CREATE INDEX

3、结构如下

postgres=# \d+ tbl3
                         Table "public.tbl3"
 Column |         Type          | Modifiers | Storage  | Description
--------+-----------------------+-----------+----------+-------------
 id     | integer               |           | plain    |
 c1     | text                  |           | extended |
 c2     | character varying(64) |           | extended |
 c3     | character(64)         |           | extended |
Indexes:
    "idx_tbl3_1" btree (c1)
    "idx_tbl3_2" btree (c2)
    "idx_tbl3_3" btree (c3)
Has OIDs: no
Distributed by: (id)

4、走索引,强制转换为bpchar。

postgres=# explain analyze select * from tbl3 where c3='abc';
                                     QUERY PLAN
------------------------------------------------------------------------------------
 Gather Motion 3:1  (slice1; segments: 3)  (cost=0.00..201.23 rows=1 width=135)
   Rows out:  0 rows at destination with 3.081 ms to end, start offset by 0.294 ms.
   ->  Index Scan using idx_tbl3_3 on tbl3  (cost=0.00..201.23 rows=1 width=135)
         Index Cond: c3 = 'abc'::bpchar
         Rows out:  0 rows (seg0) with 0.138 ms to end, start offset by 3.211 ms.
 Slice statistics:
   (slice0)    Executor memory: 147K bytes.
   (slice1)    Executor memory: 145K bytes avg x 3 workers, 145K bytes max (seg0).
 Statement statistics:
   Memory used: 128000K bytes
 Total runtime: 3.484 ms
(11 rows)

5、不走索引,因为输入的为text变长,与char定长不一致。

postgres=# explain analyze select * from tbl3 where c3='abc'::text;
                                     QUERY PLAN
-------------------------------------------------------------------------------------
 Gather Motion 3:1  (slice1; segments: 3)  (cost=0.00..20032.60 rows=1001 width=135)
   Rows out:  0 rows at destination with 305 ms to end, start offset by 0.335 ms.
   ->  Seq Scan on tbl3  (cost=0.00..20032.60 rows=334 width=135)
         Filter: c3::text = 'abc'::text
         Rows out:  0 rows (seg0) with 302 ms to end, start offset by 4.023 ms.
 Slice statistics:
   (slice0)    Executor memory: 147K bytes.
   (slice1)    Executor memory: 195K bytes avg x 3 workers, 195K bytes max (seg0).
 Statement statistics:
   Memory used: 128000K bytes
 Total runtime: 305.985 ms
(11 rows)

6、走索引,输入的类型为变长类型,字段也是变长类型,与TEXT兼容。

postgres=# explain analyze select * from tbl3 where c2='abc'::text;
                                     QUERY PLAN
------------------------------------------------------------------------------------
 Gather Motion 3:1  (slice1; segments: 3)  (cost=0.00..200.86 rows=1 width=135)
   Rows out:  0 rows at destination with 2.744 ms to end, start offset by 0.275 ms.
   ->  Index Scan using idx_tbl3_2 on tbl3  (cost=0.00..200.86 rows=1 width=135)
         Index Cond: c2::text = 'abc'::text
         Rows out:  0 rows (seg0) with 0.031 ms to end, start offset by 2.868 ms.
 Slice statistics:
   (slice0)    Executor memory: 147K bytes.
   (slice1)    Executor memory: 145K bytes avg x 3 workers, 145K bytes max (seg0).
 Statement statistics:
   Memory used: 128000K bytes
 Total runtime: 3.120 ms
(11 rows)

7、不走索引,输入的类型为定长类型,但是字段为变长类型。

postgres=# explain analyze select * from tbl3 where c2='abc'::bpchar;
                                    QUERY PLAN
-----------------------------------------------------------------------------------
 Gather Motion 3:1  (slice1; segments: 3)  (cost=0.00..17531.66 rows=1 width=135)
   Rows out:  0 rows at destination with 60 ms to end, start offset by 0.276 ms.
   ->  Seq Scan on tbl3  (cost=0.00..17531.66 rows=1 width=135)
         Filter: c2::bpchar = 'abc'::bpchar
         Rows out:  0 rows (seg0) with 57 ms to end, start offset by 2.864 ms.
 Slice statistics:
   (slice0)    Executor memory: 147K bytes.
   (slice1)    Executor memory: 131K bytes avg x 3 workers, 131K bytes max (seg0).
 Statement statistics:
   Memory used: 128000K bytes
 Total runtime: 60.320 ms
(11 rows)

8、内表不走索引,因为JOIN字段类型不匹配。

postgres=# explain select count(*) from tbl3 t1 join tbl3 t2 on (t1.c1=t2.c3) and t1.c1='abc';
                                                QUERY PLAN
-----------------------------------------------------------------------------------------------------------
 Aggregate  (cost=25235.81..25235.82 rows=1 width=8)
   ->  Gather Motion 3:1  (slice2; segments: 3)  (cost=25235.74..25235.79 rows=1 width=8)
         ->  Aggregate  (cost=25235.74..25235.75 rows=1 width=8)
               ->  Nested Loop  (cost=0.00..25235.66 rows=11 width=0)
                     ->  Broadcast Motion 3:3  (slice1; segments: 3)  (cost=0.00..200.88 rows=1 width=33)
                           ->  Index Scan using idx_tbl3_1 on tbl3 t1  (cost=0.00..200.86 rows=1 width=33)
                                 Index Cond: c1 = 'abc'::text AND 'abc'::text = c1
                     ->  Seq Scan on tbl3 t2  (cost=0.00..25034.46 rows=11 width=65)
                           Filter: 'abc'::text = c3::text AND c3::text = 'abc'::text
(9 rows)

9、内表走索引,因为JOIN字段类型都是变长,匹配。

postgres=# explain select count(*) from tbl3 t1 join tbl3 t2 on (t1.c1=t2.c2) and t1.c1='abc';
                                                   QUERY PLAN
-----------------------------------------------------------------------------------------------------------------
 Aggregate  (cost=401.86..401.87 rows=1 width=8)
   ->  Gather Motion 3:1  (slice2; segments: 3)  (cost=401.80..401.85 rows=1 width=8)
         ->  Aggregate  (cost=401.80..401.81 rows=1 width=8)
               ->  Nested Loop  (cost=200.88..401.79 rows=2 width=0)
                     ->  Index Scan using idx_tbl3_1 on tbl3 t1  (cost=0.00..200.86 rows=1 width=33)
                           Index Cond: c1 = 'abc'::text AND 'abc'::text = c1
                     ->  Materialize  (cost=200.88..200.89 rows=1 width=33)
                           ->  Broadcast Motion 3:3  (slice1; segments: 3)  (cost=0.00..200.88 rows=1 width=33)
                                 ->  Index Scan using idx_tbl3_2 on tbl3 t2  (cost=0.00..200.86 rows=1 width=33)
                                       Index Cond: 'abc'::text = c2::text AND c2::text = 'abc'::text
(10 rows)

分析

实际上就是要求索引的表达式与条件的表达式一致。

例子,使用表达式,强制转换为变长类型。

postgres=# create index idx_tbl3_4 on tbl3 ((c3::text));
CREATE INDEX

查询时,只要表达式的类型与条件类型匹配即可,走索引,同时会使用Implicit转换。

postgres=# explain analyze select * from tbl3 where (c3::text)='abc';
                                                   QUERY PLAN
----------------------------------------------------------------------------------------------------------------
 Gather Motion 3:1  (slice1; segments: 3)  (cost=108.60..52793.25 rows=1001 width=135)
   Rows out:  0 rows at destination with 3.796 ms to end, start offset by 0.309 ms.
   ->  Bitmap Heap Scan on tbl3  (cost=108.60..52793.25 rows=334 width=135)
         Recheck Cond: c3::text = 'abc'::text
         Rows out:  0 rows (seg0) with 0.163 ms to end, start offset by 3.095 ms.
         ->  Bitmap Index Scan on idx_tbl3_4  (cost=0.00..108.35 rows=334 width=0)
               Index Cond: c3::text = 'abc'::text
               Bitmaps out:  Avg 1.0 x 3 workers.  Max 1 (seg0) with 0.135 ms to end, start offset by 3.119 ms.
 Slice statistics:
   (slice0)    Executor memory: 155K bytes.
   (slice1)    Executor memory: 321K bytes avg x 3 workers, 321K bytes max (seg0).
 Statement statistics:
   Memory used: 128000K bytes
 Settings:  enable_seqscan=off
 Total runtime: 4.219 ms
(15 rows)
postgres=# explain analyze select * from tbl3 where (c3::text) = '123'::int;
                                                   QUERY PLAN
----------------------------------------------------------------------------------------------------------------
 Gather Motion 3:1  (slice1; segments: 3)  (cost=108.60..52793.25 rows=1001 width=135)
   Rows out:  0 rows at destination with 2.926 ms to end, start offset by 0.298 ms.
   ->  Bitmap Heap Scan on tbl3  (cost=108.60..52793.25 rows=334 width=135)
         Recheck Cond: c3::text = '123'::text
         Rows out:  0 rows (seg0) with 0.110 ms to end, start offset by 3.057 ms.
         ->  Bitmap Index Scan on idx_tbl3_4  (cost=0.00..108.35 rows=334 width=0)
               Index Cond: c3::text = '123'::text
               Bitmaps out:  Avg 1.0 x 3 workers.  Max 1 (seg0) with 0.064 ms to end, start offset by 3.095 ms.
 Slice statistics:
   (slice0)    Executor memory: 155K bytes.
   (slice1)    Executor memory: 289K bytes avg x 3 workers, 289K bytes max (seg0).
 Statement statistics:
   Memory used: 128000K bytes
 Settings:  enable_seqscan=off
 Total runtime: 3.327 ms
(15 rows)
postgres=# \dC
                                         List of casts
         Source type         |         Target type         |      Function      |   Implicit?
-----------------------------+-----------------------------+--------------------+---------------
 abstime                     | date                        | date               | in assignment
 abstime                     | integer                     | (binary coercible) | no
 abstime                     | timestamp without time zone | timestamp          | yes
 abstime                     | timestamp with time zone    | timestamptz        | yes
 abstime                     | time without time zone      | time               | in assignment
 bigint                      | bit                         | bit                | no
 bigint                      | character                   | text               | in assignment
 bigint                      | character varying           | text               | in assignment
 bigint                      | double precision            | float8             | yes
 bigint                      | integer                     | int4               | in assignment
 bigint                      | numeric                     | numeric            | yes
 bigint                      | oid                         | oid                | yes
 bigint                      | real                        | float4             | yes
 ...............

小结

要让索引起作用,最起码应该关注一下几点。

1、注意变长字符串和定长字符串,属于两种数据类型,不要混淆。

2、JOIN时操作符两边的字段类型一致。

3、查询时,WHERE条件与索引表达式一致。

4、查询时,WHERE条件与索引字段的类型一致。

5、查询时,WHERE条件的collate与索引的collate一致。(本地化相关)

6、操作符本身支持对应的索引访问方法。例如大多数的=操作符,支持b-tree的索引访问方法。

7、PG是CBO优化,因此索引访问的成本更低时,才会选择索引扫描。(设置了特定的优化器开关,或者使用了HINT除外)。

参考

《PostgreSQL 自定义自动类型转换(CAST)》

时间: 2024-10-03 05:17:12

Greenplum 类型一致性使用规范 - 索引条件、JOIN的类型一致性限制的相关文章

浅析MySQL中的Index Condition Pushdown (ICP 索引条件下推)和Multi-Range Read(MRR 索引多范围查找)查询优化

原文:浅析MySQL中的Index Condition Pushdown (ICP 索引条件下推)和Multi-Range Read(MRR 索引多范围查找)查询优化   本文出处:http://www.cnblogs.com/wy123/p/7374078.html(保留出处并非什么原创作品权利,本人拙作还远远达不到,仅仅是为了链接到原文,因为后续对可能存在的一些错误进行修正或补充,无他)     ICP优化原理 Index Condition Pushdown (ICP),也称为索引条件下推

探究MySQL优化器对索引和JOIN顺序的选择_Mysql

本文通过一个案例来看看MySQL优化器如何选择索引和JOIN顺序.表结构和数据准备参考本文最后部分"测试环境".这里主要介绍MySQL优化器的主要执行流程,而不是介绍一个优化器的各个组件(这是另一个话题).    我们知道,MySQL优化器只有两个自由度:顺序选择:单表访问方式:这里将详细剖析下面的SQL,看看MySQL优化器如何做出每一步的选择. explain select * from employee as A,department as B where A.LastName

js中各种类型的变量在if条件中是true还是false_javascript技巧

js的弱类型让人感觉很多东西都比较迷惑,例如变量在if条件中到底是true还是false,如果在强类型语言中将一个非boolean类型的变量放在if条件中时,是需要进行类型转换的,但是js不需要,下面来进行测试,测试常见的变量类型在if条件中的表现 !function test1(){ <span style="color:#ff0000;">var a,b=-1,c= 1,d= 0,e=null,f=undefined,g='',h="";</s

sql server-sqlserver 查询条件为nvarchar类型缺报“转换成数据类型 int 时失败”

问题描述 sqlserver 查询条件为nvarchar类型缺报"转换成数据类型 int 时失败" 解决方案 OrderId这一列类型是不是int类型 解决方案二: 直接在外面select * from 表 where OrderId = ' 11db...'试试看,报错了没 解决方案三: 是不是视图内容定义本身有问题

MySQL索引条件下推的简单测试

自MySQL 5.6开始,在索引方面有了一些改进,比如索引条件下推(Index condition pushdown,ICP),严格来说属于优化器层面的改进. 如果简单来理解,就是优化器会尽可能的把index condition的处理从Server层下推到存储引擎层.举一个例子,有一个表中含有组合索引idx_cols包含(c1,c2,-,cn)n个列,如果在c1上存在范围扫描的where条件,那么剩余的c2,-,cn这n-1个上索引都无法用来提取和过滤数据,而ICP就是把这个事情优化一下. 我们

mysql (ICP) 索引条件下推对比ORACLE进行说明

mysql (ICP) 索引条件下推对比ORACLE进行说明 第一次看到这个名词,与ORACLE FPD - filter push-down想到了一块,但是后来才发现他们根本同一个东西, 简单的收ICP就是当索引包含所有的访问字段的时候,可以在根据前导列过滤掉条件的时候,同时过滤掉另外的 条件,比如说 CREATE TABLE TESTICP(A INT,B INT,C NAME); ALTER TABLE TESTTICP ADD KEY(A,B); SELECT * FROM TESTIC

String类型怎么装换为java.sql.Date类型

问题描述 String类型怎么装换为java.sql.Date类型 在java中如何将String类型转换成java.sql.Date类型 解决方案 问题出在你的格式化字符串上.参考官网文档Class SimpleDateFormat格式化字符串是有保留字符的,每个字符代表不同的意义,区分大小写.大写的M表示的是月份,不是说你用在的时间的位置它就表示时间了.所以把时间格式化字符串改成yyyy-MM-dd HH:mm:ss,就能正确输出了. 解决方案二: 在java中如何将String类型转换成j

从nosql中取出数据为json类型的字符串,是否需要转换下类型 ,然后再前台展示,

问题描述 从nosql中取出数据为json类型的字符串,是否需要转换下类型 ,然后再前台展示, {"22224":{"201511046623792932":"{phone_no=15928785426, event_id=151030112638529659, voice_id=2, id_no=22224}","201511046623793251":"{phone_no=15928785426, event_i

visual studio-IIS7上出现错误无法将类型为 的 COM 对象强制转换为接口类型 。

问题描述 IIS7上出现错误无法将类型为 的 COM 对象强制转换为接口类型 . 求助:使用visual studio 2015编写设计网站,使用VS的调试工具进行调试可以正常运行.但是部署到IIS7上出现错误无法将类型为"System.__ComObject"的 COM 对象强制转换为接口类型"OpcRcw.Da.IOPCServer".此操作失败的原因是对 IID 为"{39C13A4D-011E-11D0-9675-0020AFD8ADB3}&quo