PostgreSQL 为什么不要滥用unlogged table & hash index

unlogged table和hash index同样都不会写XLOG，所以如果你用流复制来搞HA，一定概要搞清楚一个问题，切换到备库的话unlogged table数据会被清掉，而hash index也没有，走hash index会失败。

unlogged table 的风险以及修复手段可以见 ：
http://blog.163.com/digoal@126/blog/static/163877040201582621345351/

hash index则风险略小，但是也必须重建，但是个人还是建议大家不要使用hash index，改用btree，因为性能确实相差无几。

批量将数据库集群的hash index修改为btree index的方法：
例子：
先并行创建一个btree索引，然后并行删除对应的hash 索引。
$ vi test.sh
#!/bin/bash

for db in `psql -n -q -t -h 127.0.0.1 -p 1921 -U postgres postgres -c "copy (select datname from pg_database where datname <>'template0') to stdout;"`
do 

psql -n -q -t -h 127.0.0.1 -p 1921 -U postgres $db -c "with t1(sql,nsp,idx) as (select regexp_replace(indexdef,'USING hash','USING btree'),schemaname,indexname from pg_indexes where indexdef ~ 'USING hash'), t2(sql_create,sql_drop) as (select regexp_replace(sql,'CREATE INDEX','CREATE INDEX CONCURRENTLY'), 'DROP INDEX CONCURRENTLY '||quote_ident(nsp)||'.'||quote_ident(idx) from t1) select regexp_replace(sql_create,'CONCURRENTLY (.*) ON','CONCURRENTLY \1_0926 ON')  ||'; '|| sql_drop ||'; ' from t2;"|psql -a -e -h 127.0.0.1 -p 1921 -U postgres $db -f -

done

$ . ./test.sh

这个with查询的结果如下举例：
postgres=# with t1(sql,nsp,idx) as (select regexp_replace(indexdef,'USING hash','USING btree'),schemaname,indexname from pg_indexes where indexdef ~ 'USING hash'),
t2(sql_create,sql_drop) as (select regexp_replace(sql,'CREATE INDEX','CREATE INDEX CONCURRENTLY'), 'DROP INDEX CONCURRENTLY '||quote_ident(nsp)||'.'||quote_ident(idx) from t1)
select regexp_replace(sql_create,'CONCURRENTLY (.*) ON','CONCURRENTLY \1_0926 ON')  ||'; '|| sql_drop ||'; ' from t2;
                                            ?column?
-------------------------------------------------------------------------------------------------
 CREATE INDEX CONCURRENTLY hi1_0926 ON t USING btree (id); DROP INDEX CONCURRENTLY public.hi1;
 CREATE INDEX CONCURRENTLY hi2_0926 ON s1.tbl USING btree (id); DROP INDEX CONCURRENTLY s1.hi2;
(2 rows)

或者在每个数据库调用这个inline code：
do language plpgsql $$
declare
  v_sql text;
  v_schema name;
  v_idx name;
  sql1 text;
begin
for v_sql,v_schema,v_idx in select regexp_replace(indexdef,'USING hash','USING btree'),schemaname,indexname from pg_indexes where indexdef ~ 'USING hash'
loop
  sql1='DROP INDEX '||quote_ident(v_schema)||'.'||quote_ident(v_idx);
  execute sql1;
  execute v_sql;
end loop;
end;
$$; 

postgres=# \d t
       Table "public.t"
 Column |  Type   | Modifiers
--------+---------+-----------
 id     | integer | not null
Indexes:
    "t_pkey" PRIMARY KEY, btree (id)
    "i1" btree (id)
    "i2" btree (id)

postgres=# \d s1.tbl
        Table "s1.tbl"
 Column |  Type   | Modifiers
--------+---------+-----------
 id     | integer |
Indexes:
    "i1" btree (id)

关于hash和btree性能：
查询性能：
postgres=# create table tbl(id int, info text);
CREATE TABLE
postgres=# insert into tbl select generate_series(1,1000000);
INSERT 0 1000000
postgres=# create index idx_tbl1 on tbl using hash (id);

$ vi test.sql
\setrandom id 1 1000000
select * from tbl where id=:id;

postgres@digoal-> pgbench -M prepared -n -r -P 1 -f ./test.sql -c 8 -j 8 -T 30
progress: 1.0 s, 24219.3 tps, lat 0.258 ms stddev 0.436
progress: 2.0 s, 29387.1 tps, lat 0.270 ms stddev 0.401
progress: 3.0 s, 29281.0 tps, lat 0.271 ms stddev 0.442
progress: 4.0 s, 29231.7 tps, lat 0.272 ms stddev 0.844
......
transaction type: Custom query
scaling factor: 1
query mode: prepared
number of clients: 8
number of threads: 8
duration: 30 s
number of transactions actually processed: 876806
latency average: 0.270 ms
latency stddev: 0.592 ms
tps = 29202.503991 (including connections establishing)
tps = 29379.956438 (excluding connections establishing)
statement latencies in milliseconds:
        0.003062        \setrandom id 1 1000000
        0.266481        select * from tbl where id=:id;

postgres=# drop index idx_tbl1 ;
DROP INDEX
postgres=# create index idx_tbl1 on tbl using btree (id);
CREATE INDEX

postgres@digoal-> pgbench -M prepared -n -r -P 1 -f ./test.sql -c 8 -j 8 -T 30
progress: 1.0 s, 28414.2 tps, lat 0.240 ms stddev 0.306
progress: 2.0 s, 31192.2 tps, lat 0.255 ms stddev 0.605
progress: 3.0 s, 31022.8 tps, lat 0.256 ms stddev 0.451
progress: 4.0 s, 29587.1 tps, lat 0.268 ms stddev 0.671
......
transaction type: Custom query
scaling factor: 1
query mode: prepared
number of clients: 8
number of threads: 8
duration: 30 s
number of transactions actually processed: 903467
latency average: 0.263 ms
latency stddev: 0.678 ms
tps = 30088.054150 (including connections establishing)
tps = 30229.295069 (excluding connections establishing)
statement latencies in milliseconds:
        0.002900        \setrandom id 1 1000000
        0.259402        select * from tbl where id=:id;

更新性能
$ vi test.sql
\setrandom id 1 1000000
update tbl set id=1+:id where id=:id;

postgres@digoal-> pgbench -M prepared -n -r -P 1 -f ./test.sql -c 8 -j 8 -T 30
progress: 1.0 s, 12500.2 tps, lat 0.570 ms stddev 0.864
progress: 2.0 s, 17456.9 tps, lat 0.456 ms stddev 0.641
progress: 3.0 s, 18242.3 tps, lat 0.435 ms stddev 0.234
progress: 4.0 s, 17693.0 tps, lat 0.450 ms stddev 0.909
progress: 5.0 s, 17753.3 tps, lat 0.448 ms stddev 0.758
......
transaction type: Custom query
scaling factor: 1
query mode: prepared
number of clients: 8
number of threads: 8
duration: 30 s
number of transactions actually processed: 521331
latency average: 0.456 ms
latency stddev: 0.702 ms
tps = 17372.386945 (including connections establishing)
tps = 17430.542432 (excluding connections establishing)
statement latencies in milliseconds:
        0.003072        \setrandom id 1 1000000
        0.452489        update tbl set id=1+:id where id=:id;

postgres=# drop index idx_tbl1 ;
DROP INDEX
postgres=# create index idx_tbl1 on tbl using hash (id);
CREATE INDEX

postgres@digoal-> pgbench -M prepared -n -r -P 1 -f ./test.sql -c 8 -j 8 -T 30
progress: 1.0 s, 16321.8 tps, lat 0.411 ms stddev 0.521
progress: 2.0 s, 17372.0 tps, lat 0.458 ms stddev 0.409
progress: 3.0 s, 16731.5 tps, lat 0.475 ms stddev 1.094
progress: 4.0 s, 16972.9 tps, lat 0.469 ms stddev 0.880
progress: 5.0 s, 17392.7 tps, lat 0.457 ms stddev 0.607
......
transaction type: Custom query
scaling factor: 1
query mode: prepared
number of clients: 8
number of threads: 8
duration: 30 s
number of transactions actually processed: 527778
latency average: 0.450 ms
latency stddev: 0.678 ms
tps = 17587.300360 (including connections establishing)
tps = 17671.181609 (excluding connections establishing)
statement latencies in milliseconds:
        0.002955        \setrandom id 1 1000000
        0.446435        update tbl set id=1+:id where id=:id;

SIZE
postgres=# create index idx_tbl1 on tbl using hash (id);
CREATE INDEX
postgres=# create index idx_tbl2 on tbl using btree (id);
CREATE INDEX
postgres=# \di+ idx_tbl*
                         List of relations
 Schema |   Name   | Type  |  Owner   | Table | Size  | Description
--------+----------+-------+----------+-------+-------+-------------
 public | idx_tbl1 | index | postgres | tbl   | 32 MB |
 public | idx_tbl2 | index | postgres | tbl   | 21 MB |
(2 rows)

小结:
查询和更新性能相差无几。
大小，btree是hash的2/3。

时间： 2024-10-23 11:01:41

PostgreSQL 为什么不要滥用unlogged table & hash index的相关文章

PostgreSQL 10.0 preview 性能增强 - hash index metapage cache、高并发增强

标签 PostgreSQL , 10.0 , hash index 背景 hash index是PostgreSQL中一个非常老的索引访问方法,也是非常经典的索引. hash index中存储的是索引字段的hash value,而不是原始值,btree索引中存储的是原始值. 因此,当字段非常大时,btree索引可能无法使用. 例如 postgres=# create table test_hash_btree(c1 text); CREATE TABLE postgres=# insert in

PostgreSQL 10.0 preview 功能增强 - hash index 支持wal(灾难恢复)

标签 PostgreSQL , 10.0 , hash index , wal , 灾难恢复背景 PostgreSQL 10.0 将支持hash index WAL. 因此建hash index再也不怕数据库crash或者备库hash index不可用了. $SUBJECT will make hash indexes reliable and usable on standby. AFAIU, currently hash indexes are not recommended to be

PostgreSQL 从文件系统恢复遗失的UNLOGGED table's datafile.

PostgreSQL的unlogged table是不记录xlog的,所以在备库上没有unlogged table的数据记录. 另一方面,当数据库异常crash时,数据库重启后需要从WAL恢复,由于unlogged table不记录WAL,为了保证数据的一致性,postgresql会自动清除unlogged table的数据文件. 那么问题来了,万一你不小心误创建了unlogged table,你可能一开始会没有感知,但是一旦发生以下情况,你会发现数据不见了. 1. 数据库crash掉之后,重启

HTAP数据库 PostgreSQL 场景与性能测试之 42 - (OLTP+OLAP) unlogged table 不含索引多表批量写入

标签 PostgreSQL , HTAP , OLTP , OLAP , 场景与性能测试背景 PostgreSQL是一个历史悠久的数据库,历史可以追溯到1973年,最早由2014计算机图灵奖得主,关系数据库的鼻祖Michael_Stonebraker 操刀设计,PostgreSQL具备与Oracle类似的功能.性能.架构以及稳定性. PostgreSQL社区的贡献者众多,来自全球各个行业,历经数年,PostgreSQL 每年发布一个大版本,以持久的生命力和稳定性著称. 2017年10月,Pos

MySQL hash index VS Btree index

MySQL AHI(adaptive hash index):没有牺牲任何的事物特点和可靠性; 根据搜索的匹配模式,MySQL会利用 B-Tree index key 前半部分(利用btree index 所能找到的部分)长度任意建立hash index.hash index根据需求只对访问频率较高的page中的index建立hashindex 如果一个表的数据全部在内存里面,hash index可以加快查询速度:innodb本身有监控index 查询频率的机制:通过hash index提高查询

Sql Server中的表访问方式Table Scan, Index Scan, Index Seek

Sql Server中的表访问方式Table Scan, Index Scan, Index Seek 0.参考文献 oracle表访问方式 Index Seek和Index Scan的区别以及适用情况 1.oracle中的表访问方式在oracle中有表访问方式的说法,访问表中的数据主要通过三种方式进行访问: 全表扫描(full table scan),直接访问数据页,查找满足条件的数据通过rowid扫描(table access by rowid),如果知道数据的rowid,那么直接通

数据库内核月报－ 2015 / 09-MySQL · 引擎特性 · InnoDB Adaptive hash index介绍

前言我们知道InnoDB的索引组织结构为Btree.通常情况下,我们需要根据查询条件,从根节点开始寻路到叶子节点,找到满足条件的记录.为了减少寻路开销,InnoDB本身做了几点优化. 首先,对于连续记录扫描,InnoDB在满足比较严格的条件时采用row cache的方式连续读取8条记录(并将记录格式转换成MySQL Format),存储在线程私有的row_prebuilt_t::fetch_cache中:这样一次寻路就可以获取多条记录,在server层处理完一条记录后,可以直接从cache中取

MySQL · 引擎特性 · InnoDB Adaptive hash index介绍

MySQL索引之哈希索引和自适应哈希索引（Adaptive Hash Index）

MySQL索引之哈希索引和自适应哈希索引(Adaptive Hash Index) 官网:https://dev.mysql.com/doc/refman/5.6/en/innodb-adaptive-hash.html 索引(Index)是帮助MySQL高效获取数据的数据结构.提取句子主干,就可以得到索引的本质:索引是数据结构. 从MySQL逻辑架构来看,MySQL有三层架构,第一层连接,第二层查询解析.分析.优化.视图.缓存,第三层,存储引擎. 索引通过分开查询片,节省了扫描查找时间,大大