PostgreSQL开发小讲堂 - 不等于、数组不包含 要不要用索引?

标签

PostgreSQL , 不等于 , 索引 , 外连接


背景

在数据库中不等于能不能走索引呢?理论上是不行的,但是有方法可以让不等于也走索引(虽然走索引也不一定好)。

比如使用外连接实现(需要表有PK才行,没有PK可以使用行号代替),或者使用not exists,或者使用partial index(不支持变量)。

对于返回结果集很大的场景,建议使用游标分页返回,此时可能用全表扫描更适合。

例子1, 有PK

create table tbl_pk(id int primary key, c1 int);
create index idx_tbl_pk on tbl_pk(c1);
insert into tbl_pk select generate_series(1,1000000), random()*10000;

原始方法,不能走索引
select * from tbl_pk where c1 <> 1;

postgres=# explain (analyze,verbose,timing,costs,buffers) select * from tbl_pk where c1 <> 1;

QUERY PLAN

Seq Scan on public.tbl_pk (cost=0.00..16925.00 rows=999902 width=8) (actual time=0.020..182.603 rows=999919 loops=1)
Output: id, c1
Filter: (tbl_pk.c1 <> 1)
Rows Removed by Filter: 81
Buffers: shared hit=4425
Planning time: 0.486 ms
Execution time: 249.335 ms
(7 rows)

让他走索引的写法

postgres=# explain (analyze,verbose,timing,costs,buffers) select t1.* from tbl_pk t1 left join tbl_pk t2 on (t1.id=t2.id and t2.c1=1) where t2.* is null;
                                                            QUERY PLAN
-----------------------------------------------------------------------------------------------------------------------------------
 Hash Left Join  (cost=352.55..18528.53 rows=5000 width=8) (actual time=0.229..395.158 rows=999919 loops=1)
   Output: t1.id, t1.c1
   Hash Cond: (t1.id = t2.id)
   Filter: (t2.* IS NULL)
   Rows Removed by Filter: 81
   Buffers: shared hit=4509
   ->  Seq Scan on public.tbl_pk t1  (cost=0.00..14425.00 rows=1000000 width=8) (actual time=0.009..128.644 rows=1000000 loops=1)
         Output: t1.id, t1.c1
         Buffers: shared hit=4425
   ->  Hash  (cost=351.32..351.32 rows=98 width=36) (actual time=0.211..0.211 rows=81 loops=1)
         Output: t2.id, t2.*
         Buckets: 1024  Batches: 1  Memory Usage: 6kB
         Buffers: shared hit=84
         ->  Bitmap Heap Scan on public.tbl_pk t2  (cost=5.18..351.32 rows=98 width=36) (actual time=0.044..0.185 rows=81 loops=1)
               Output: t2.id, t2.*
               Recheck Cond: (t2.c1 = 1)
               Heap Blocks: exact=81
               Buffers: shared hit=84
               ->  Bitmap Index Scan on idx_tbl_pk  (cost=0.00..5.16 rows=98 width=0) (actual time=0.025..0.025 rows=81 loops=1)
                     Index Cond: (t2.c1 = 1)
                     Buffers: shared hit=3
 Planning time: 0.289 ms
 Execution time: 461.386 ms
(23 rows)  

postgres=# set enable_seqscan=off;
SET  

postgres=# explain (analyze,verbose,timing,costs,buffers) select t1.* from tbl_pk t1 left join tbl_pk t2 on (t1.id=t2.id and t2.c1=1) where t2.* is null;
                                                                      QUERY PLAN
------------------------------------------------------------------------------------------------------------------------------------------------------
 Merge Left Join  (cost=354.99..33264.46 rows=5000 width=8) (actual time=0.224..482.241 rows=999919 loops=1)
   Output: t1.id, t1.c1
   Merge Cond: (t1.id = t2.id)
   Filter: (t2.* IS NULL)
   Rows Removed by Filter: 81
   Buffers: shared hit=7244
   ->  Index Scan using tbl_pk_pkey on public.tbl_pk t1  (cost=0.42..30408.42 rows=1000000 width=8) (actual time=0.008..215.816 rows=1000000 loops=1)
         Output: t1.id, t1.c1
         Buffers: shared hit=7160
   ->  Sort  (cost=354.57..354.81 rows=98 width=36) (actual time=0.211..0.233 rows=81 loops=1)
         Output: t2.id, t2.*
         Sort Key: t2.id
         Sort Method: quicksort  Memory: 31kB
         Buffers: shared hit=84
         ->  Bitmap Heap Scan on public.tbl_pk t2  (cost=5.18..351.32 rows=98 width=36) (actual time=0.046..0.183 rows=81 loops=1)
               Output: t2.id, t2.*
               Recheck Cond: (t2.c1 = 1)
               Heap Blocks: exact=81
               Buffers: shared hit=84
               ->  Bitmap Index Scan on idx_tbl_pk  (cost=0.00..5.16 rows=98 width=0) (actual time=0.026..0.026 rows=81 loops=1)
                     Index Cond: (t2.c1 = 1)
                     Buffers: shared hit=3
 Planning time: 0.275 ms
 Execution time: 548.991 ms
(24 rows)  

postgres=# set enable_sort=off;
SET
postgres=# explain (analyze,verbose,timing,costs,buffers) select t1.* from tbl_pk t1 left join tbl_pk t2 on (t1.id=t2.id and t2.c1=1) where t2.* is null;
                                                                      QUERY PLAN
------------------------------------------------------------------------------------------------------------------------------------------------------
 Hash Left Join  (cost=352.97..34511.95 rows=5000 width=8) (actual time=0.235..494.068 rows=999919 loops=1)
   Output: t1.id, t1.c1
   Hash Cond: (t1.id = t2.id)
   Filter: (t2.* IS NULL)
   Rows Removed by Filter: 81
   Buffers: shared hit=7244
   ->  Index Scan using tbl_pk_pkey on public.tbl_pk t1  (cost=0.42..30408.42 rows=1000000 width=8) (actual time=0.008..223.681 rows=1000000 loops=1)
         Output: t1.id, t1.c1
         Buffers: shared hit=7160
   ->  Hash  (cost=351.32..351.32 rows=98 width=36) (actual time=0.218..0.218 rows=81 loops=1)
         Output: t2.id, t2.*
         Buckets: 1024  Batches: 1  Memory Usage: 6kB
         Buffers: shared hit=84
         ->  Bitmap Heap Scan on public.tbl_pk t2  (cost=5.18..351.32 rows=98 width=36) (actual time=0.051..0.201 rows=81 loops=1)
               Output: t2.id, t2.*
               Recheck Cond: (t2.c1 = 1)
               Heap Blocks: exact=81
               Buffers: shared hit=84
               ->  Bitmap Index Scan on idx_tbl_pk  (cost=0.00..5.16 rows=98 width=0) (actual time=0.031..0.031 rows=81 loops=1)
                     Index Cond: (t2.c1 = 1)
                     Buffers: shared hit=3
 Planning time: 0.274 ms
 Execution time: 560.676 ms
(23 rows)  

postgres=# set enable_hashjoin=off;
SET
postgres=# explain (analyze,verbose,timing,costs,buffers) select t1.* from tbl_pk t1 left join tbl_pk t2 on (t1.id=t2.id and t2.c1=1) where t2.* is null;
                                                                      QUERY PLAN
------------------------------------------------------------------------------------------------------------------------------------------------------
 Merge Left Join  (cost=0.85..65818.07 rows=5000 width=8) (actual time=1.080..664.991 rows=999919 loops=1)
   Output: t1.id, t1.c1
   Merge Cond: (t1.id = t2.id)
   Filter: (t2.* IS NULL)
   Rows Removed by Filter: 81
   Buffers: shared hit=14320
   ->  Index Scan using tbl_pk_pkey on public.tbl_pk t1  (cost=0.42..30408.42 rows=1000000 width=8) (actual time=0.007..208.319 rows=1000000 loops=1)
         Output: t1.id, t1.c1
         Buffers: shared hit=7160
   ->  Index Scan using tbl_pk_pkey on public.tbl_pk t2  (cost=0.42..32908.43 rows=98 width=36) (actual time=1.067..193.630 rows=81 loops=1)
         Output: t2.id, t2.*
         Filter: (t2.c1 = 1)
         Rows Removed by Filter: 999919
         Buffers: shared hit=7160
 Planning time: 0.261 ms
 Execution time: 732.070 ms
(16 rows)  

例子2, 没有PK

没有PK时,可以使用行号进行关联。

postgres=# create index tbl_pk_idx2 on tbl_pk (ctid);
CREATE INDEX  

postgres=# explain (analyze,verbose,timing,costs,buffers) select t1.* from tbl_pk t1 left join tbl_pk t2 on (t1.ctid=t2.ctid and t2.c1=1) where t2.* is null;
                                                                      QUERY PLAN
-------------------------------------------------------------------------------------------------------------------------------------------------------
 Merge Left Join  (cost=0.85..92362.07 rows=5000 width=8) (actual time=1.232..720.467 rows=999919 loops=1)
   Output: t1.id, t1.c1
   Merge Cond: (t1.ctid = t2.ctid)
   Filter: (t2.* IS NULL)
   Rows Removed by Filter: 81
   Buffers: shared hit=11585 read=2735
   ->  Index Scan using tbl_pk_idx2 on public.tbl_pk t1  (cost=0.42..43680.43 rows=1000000 width=14) (actual time=0.041..268.576 rows=1000000 loops=1)
         Output: t1.id, t1.c1, t1.ctid
         Buffers: shared hit=7157 read=3
   ->  Index Scan using tbl_pk_idx2 on public.tbl_pk t2  (cost=0.42..46180.43 rows=98 width=38) (actual time=1.186..213.849 rows=81 loops=1)
         Output: t2.ctid, t2.*
         Filter: (t2.c1 = 1)
         Rows Removed by Filter: 999919
         Buffers: shared hit=4428 read=2732
 Planning time: 0.336 ms
 Execution time: 787.713 ms
(16 rows)

例子3, 数组查询中的不包含

postgres=# create table tbl_arr(id int[], c1 text);
CREATE TABLE
postgres=# insert into tbl_arr select array_agg(100000*random()) from generate_series(1,100);
INSERT 0 1
postgres=# insert into tbl_arr select array_agg(100000*random()) from generate_series(1,100);
INSERT 0 1
postgres=# insert into tbl_arr select array_agg(100000*random()) from generate_series(1,100);
INSERT 0 1
postgres=# insert into tbl_arr select array_agg(100000*random()) from generate_series(1,100);
INSERT 0 1
postgres=# insert into tbl_arr select array_agg(100000*random()) from generate_series(1,100);
INSERT 0 1
postgres=# insert into tbl_arr select array_agg(100000*random()) from generate_series(1,100);
INSERT 0 1
postgres=# insert into tbl_arr select array_agg(100000*random()) from generate_series(1,100);
INSERT 0 1
postgres=# insert into tbl_arr select array_agg(100000*random()) from generate_series(1,100);
INSERT 0 1
postgres=# insert into tbl_arr select array_agg(100000*random()) from generate_series(1,100);
INSERT 0 1
postgres=# insert into tbl_arr select array_agg(100000*random()) from generate_series(1,100);
INSERT 0 1
postgres=# create index idx_tbl_arr_1 on tbl_arr using gin (id);
CREATE INDEX
postgres=# create index idx_tbl_arr_2 on tbl_arr (ctid);
CREATE INDEX  

postgres=# explain (analyze,verbose,timing,costs,buffers) select * from tbl_arr where not (id @> array[1,2]);
                                                QUERY PLAN
-----------------------------------------------------------------------------------------------------------
 Seq Scan on public.tbl_arr  (cost=0.00..1.12 rows=10 width=64) (actual time=0.017..0.029 rows=10 loops=1)
   Output: id, c1
   Filter: (NOT (tbl_arr.id @> '{1,2}'::integer[]))
   Buffers: shared hit=1
 Planning time: 0.086 ms
 Execution time: 0.051 ms
(6 rows)  

postgres=# explain (analyze,verbose,timing,costs,buffers) select t1.* from tbl_arr t1 left join tbl_arr t2 on (t1.ctid=t2.ctid and t2.id @> array[1,2]) where t2.* is null;
                                                                QUERY PLAN
-------------------------------------------------------------------------------------------------------------------------------------------
 Merge Left Join  (cost=0.27..24.63 rows=1 width=64) (actual time=0.133..0.137 rows=10 loops=1)
   Output: t1.id, t1.c1
   Merge Cond: (t1.ctid = t2.ctid)
   Filter: (t2.* IS NULL)
   Buffers: shared hit=3 read=1
   ->  Index Scan using idx_tbl_arr_2 on public.tbl_arr t1  (cost=0.14..12.29 rows=10 width=70) (actual time=0.108..0.110 rows=10 loops=1)
         Output: t1.id, t1.c1, t1.ctid
         Buffers: shared hit=1 read=1
   ->  Index Scan using idx_tbl_arr_2 on public.tbl_arr t2  (cost=0.14..12.31 rows=1 width=94) (actual time=0.022..0.022 rows=0 loops=1)
         Output: t2.ctid, t2.*
         Filter: (t2.id @> '{1,2}'::integer[])
         Rows Removed by Filter: 10
         Buffers: shared hit=2
 Planning time: 0.193 ms
 Execution time: 0.173 ms
(15 rows)  

postgres=# explain (analyze,verbose,timing,costs,buffers) select t1.* from tbl_arr t1 where not exists (select 1 from tbl_arr t2 where t1.ctid=t2.ctid and t2.id @> array[1,2]);
                                                                QUERY PLAN
-------------------------------------------------------------------------------------------------------------------------------------------
 Merge Anti Join  (cost=0.27..24.63 rows=9 width=64) (actual time=0.023..0.026 rows=10 loops=1)
   Output: t1.id, t1.c1
   Merge Cond: (t1.ctid = t2.ctid)
   Buffers: shared hit=4
   ->  Index Scan using idx_tbl_arr_2 on public.tbl_arr t1  (cost=0.14..12.29 rows=10 width=70) (actual time=0.004..0.005 rows=10 loops=1)
         Output: t1.id, t1.c1, t1.ctid
         Buffers: shared hit=2
   ->  Index Scan using idx_tbl_arr_2 on public.tbl_arr t2  (cost=0.14..12.31 rows=1 width=6) (actual time=0.017..0.017 rows=0 loops=1)
         Output: t2.ctid
         Filter: (t2.id @> '{1,2}'::integer[])
         Rows Removed by Filter: 10
         Buffers: shared hit=2
 Planning time: 0.103 ms
 Execution time: 0.047 ms
(14 rows)
时间: 2024-09-23 16:02:56

PostgreSQL开发小讲堂 - 不等于、数组不包含 要不要用索引?的相关文章

IOS开发之路--C语言数组和字符串_IOS

概览 数组在C语言中有着特殊的地位,它有很多特性,例如它的存储是连续的,数组的名称就是数组的地址等.而在C语言中是没有String类型的,那么如果要表示一个字符串,就必须使用字符串数组.今天主要就介绍如下三个方面: 一维数组 多维数组 字符串 一维数组 一维数组操作比较简单,但是需要注意,数组长度必须是固定的,长度不能使用变量进行初始化:如果声明的同时进行赋值则数组长度可以省略,编译器会自动计算数组长度:同时数组不能先声明再一次性赋值(当然可以对每个元素一一赋值). #include <stdi

asp.net-c#网站开发小问题______

问题描述 c#网站开发小问题______ asp.net网站开发,后台用什么方法用数组或集合接收同名参数?get和post不同或混合有影响吗? 解决方案 C#开发小问题 解决方案二: 没看懂你说的问题是什么 解决方案三: 职称文章发表,联系 QQ: 316573695 正规期刊,符合毕业·评定中高级职称的要求 内设 科技论坛.信息技术.工程科技.农林研究.管理科学.经济科苑 栏目 来稿要求: 作者必须提供论文的题目.作者.单位.单位邮编.摘要.关键词.作者简介.参考文献 省(部)级以上立项的课题

给指针malloc分配空间后就等于数组吗?

首先回答这个的问题:严格的说不等于数组,但是可以认为它是个数组一样的使用而不产生任何问题.不过既然这样,那它应该算是个数组吧.所以,一般我们都用"动态数组"这种名字来称呼这种东西. 要讲清楚这个东西,涉及到malloc函数,指针类型和"[ ]"下标运算. ======分割线[0]======malloc是C的标准库函数之一,用来分配动态内存. 一般来说,由C/C++编译的程序会在运行的时候在内存中占用一些空间,它们分为以下几个部分:1.二进制代码区 不必过多解释了,

android开发如何声明动态一维数组

问题描述 android开发如何声明动态一维数组 我是一个Android新手,我现在想读取一个文件,文件大小未知,我想用 byte[] buffer = new byte[3072]; temp_stream = new FileInputStream(img_name); temp_stream.read(buffer); buffer接收,如何声明一个动态byte[] buffer. 解决方案 Java获得文件大小的方法(通过FileInputStream) 根据指定文件创建FileInpu

DB2与PostgreSQL开发的相同及不同之处

本文主要向大家介绍的是了DB2数据库开发与PostgreSQL开发在实际操作中的一些相同之处,与不同之 处,最近用了用DB2,写了点区别.不一定完全正确.不过可以看看两者区别. 1,数据定义语言(DDL)以及外围工具:db2与postgresql的区别几乎为0,具体区别是: a.db2 varchar 在没有设置特殊表空间的话,上限为32K,postgresql的varchar没有这个限制; b.db2的二进制大对象类型blob,在postgresql里用bytea类型代替; c.db2的文本大

5人iOS开发小团队的生存发展之道

近年来,iOS用户日益增多,许多中小企业力图从App Store中谋求发展之道.在市场竞争日趋激烈的今天,如何能令一个5人开发小团队站稳脚跟,在保证企业生存的前提下,发展成为一支独具特色的创新队伍呢?近日,在"知乎"网上,专业人士们对此展开了讨论. 第一,尽可能找大型机构合作.有过创业经历的人都深知,小型企业在起步阶段往往面临融资难,技术不成熟,缺乏运作经验等问题.如果与资金实力雄厚的大型机构合作完成第一个项目,或许能够帮助团队掘到第一桶金.同时,也有利于建立与大型机构的长期合作关系.

iOS开发系列--C语言之数组和字符串

概览 数组在C语言中有着特殊的地位,它有很多特性,例如它的存储是连续的,数组的名称就是数组的地址等.而在C语言中是没有String类型的,那么如果要表示一个字符串,就必须使用字符数组.今天主要就介绍如下三个方面: 一维数组 多维数组 字符串 一维数组 一维数组操作比较简单,但是需要注意,数组长度必须是固定的,长度不能使用变量进行初始化:如果声明的同时进行赋值则数组长度可以省略,编译器会自动计算数组长度:同时数组不能先声明再一次性赋值(当然可以对每个元素一一赋值). #include <stdio

代码-自学PHP mysql web开发小次鸟,寻求大神帮助 第23章会话控制

问题描述 自学PHP mysql web开发小次鸟,寻求大神帮助 第23章会话控制 代码运行一直提示我第23行有错误<?phpsession_start();//开始会话 if (isset($_POST['userid'])&&isset($_POST['password'])) {//使用者尝试登录$userid=$_POST['userid'];$password=$_POST['password']; $db_conn=new mysqli('localhost''root'

xml-android开发小程序,入门知识

问题描述 android开发小程序,入门知识 这是一个简单的安卓浏览器,但是安装到手机后无法搜索.代码如下: <?xml version="1.0" encoding="utf-8"?> android:orientation="vertical" android:layout_width="fill_parent" android:layout_height="fill_parent" >