震精 - PostgreSQL decimal64 decimal128 高效率数值类型扩展

背景

PostgreSQL内置的数值类型包括

整型、浮点、整型序列、"无限"精度数值

Name	Storage Size	Description	Range
smallint	2 bytes	small-range integer	-32768 to +32767
integer	4 bytes	typical choice for integer	-2147483648 to +2147483647
bigint	8 bytes	large-range integer	-9223372036854775808 to +9223372036854775807
decimal	variable	user-specified precision, exact	up to 131072 digits before the decimal point; up to 16383 digits after the decimal point
numeric	variable	user-specified precision, exact	up to 131072 digits before the decimal point; up to 16383 digits after the decimal point
real	4 bytes	variable-precision, inexact	6 decimal digits precision
double precision	8 bytes	variable-precision, inexact	15 decimal digits precision
smallserial	2 bytes	small autoincrementing integer	1 to 32767
serial	4 bytes	autoincrementing integer	1 to 2147483647
bigserial	8 bytes	large autoincrementing integer	1 to 9223372036854775807

其中除了 "无限"精度数值类型。他类型都是定长存储，使用时不需要调用palloc，效率较高。

如果你要使用超过双精能表示的有效范围的数值，目前只能选择decimal\numeric类型，而这个类型前面说了，由于是变长设计，需要调用palloc，效率一般。

那么在数据分析领域，或者需要处理非常多的数据记录时，numeric类型的开销是较大的。

PostgreSQL社区有一些扩展，可以解决这个问题，

1. 比如2nd的fixeddecimal插件，使用INT8来表示NUMERIC，精度可调。

《PostgreSQL fixeddecimal - 用CPU "硬解码" 提升1倍数值运算能力助力金融大数据量计算》

2. 比如社区的pgdecimal插件，支持decimal32和decimal64两种类型。

https://pgxn.org/dist/pgdecimal/1.0.0/

3. 比如vitesse的pgdecimal插件，也就是本文将提到的插件，支持decimal64与decimal128类型，精度基本上足够使用。

推荐使用vitesse提供的pgdecimal插件，因为它效率够高，精度够大。

pgdecimal插件介绍

有两个常见的decimal库，decNumber以及Intel提供的Intel ADX库。

pgdecimal插件选择了decNumber库，因为GCC也在用它（法律风险更小？）

https://github.com/gcc-mirror/gcc/tree/master/libdecnumber

decimal库的性能对比

http://speleotrove.com/decimal/dpintro.html

decNumber与Inter ADX性能接近，但是Inter ADX提供了decimal64/128, int32/64, float/double类型的相互转换，这个很给力。（也许将来vitesse会支持intel adx库吧）

pgdecimal 依赖的decNumber，因此我们必须先安装decNumber

decNumber安装

1. 下载 decNumber package

http://speleotrove.com/decimal/

wget http://speleotrove.com/decimal/decNumber-icu-368.zip
unzip decNumber-icu-368.zip

或者从本站链接下载

2. 安装decNumber到postgresql软件目录中(假设postgresql安装在/home/digoal/pgsql9.6)

首先要在postgresql软件的include目录中，创建一个空目录，

mkdir -p /home/digoal/pgsql9.6/include/decnumber

在decNumber src目录中创建Makefile，install -D 修改为对应要安装的目录。

cd decNumber  

vi Makefile  

OBJS = decSingle.o decDouble.o decQuad.o decNumber.o decContext.o  

CFLAGS = -Wall -g -O2 -fPIC  

libdecnumber.a: $(OBJS)
        ar -rcs libdecnumber.a $(OBJS)  

clean:
        rm -f libdecnumber.a $(OBJS)  

install:
        install -D *.h /home/digoal/pgsql9.6/include/decnumber
        install -D libdecnumber.a /home/digoal/pgsql9.6/lib

3. 编译安装decNumber

cd decNumber  

make
make install

4. decNumber的C库reference如下, pgdecimal插件中用到的decnumber库，需要了解细节的话请参考:

The decNumber C library

pgdecimal安装

git clone https://github.com/vitesse-ftian/pgdecimal

或者从本站链接下载

cd pgdecimal

有一个小BUG，.control的版本号没有与sql文件的版本号对齐

mv decimal--2.0.sql decimal--1.0.sql

另外，需要修改一下Makefile，指定版本，以及decnumber的include和lib目录

vi Makefile  

PG_CPPFLAGS = -I/home/digoal/pgsql9.6/include/decnumber
SHLIB_LINK = -L/home/digoal/pgsql9.6/lib -ldecnumber  

DATA = decimal--1.0.sql

安装

export PATH=/home/digoal/pgsql9.6/bin:$PATH
USE_PGXS=1 make clean
USE_PGXS=1 make
USE_PGXS=1 make install  

/bin/mkdir -p '/home/digoal/pgsql9.6/lib'
/bin/mkdir -p '/home/digoal/pgsql9.6/share/extension'
/bin/mkdir -p '/home/digoal/pgsql9.6/share/extension'
/usr/bin/install -c -m 755  decimal.so '/home/digoal/pgsql9.6/lib/decimal.so'
/usr/bin/install -c -m 644 .//decimal.control '/home/digoal/pgsql9.6/share/extension/'
/usr/bin/install -c -m 644 .//decimal--1.0.sql  '/home/digoal/pgsql9.6/share/extension/'

使用

psql  

postgres=# create extension decimal;
CREATE EXTENSION

pgdecimal性能对比

使用int8, float8, decimal64, decimal128, numeric(15,3) 几种类型，分别比较这几种类型的性能。

create table tt(ii bigint, d double precision, d64 decimal64, d128 decimal128, n numeric(15, 3));  

postgres=# \timing
Timing is on.  

生成测试数据  

postgres=# insert into tt select i, i + 0.123, i + 0.123::decimal64, i + 0.123::decimal128, i + 0.123 from generate_series(1, 1000000) i;
INSERT 0 1000000
Time: 2125.723 ms  

postgres=# select * from tt limit 2;
 ii |   d   |  d64  | d128  |   n
----+-------+-------+-------+-------
  1 | 1.123 | 1.123 | 1.123 | 1.123
  2 | 2.123 | 2.123 | 2.123 | 2.123
(2 rows)

普通查询性能对比

postgres=# select count(*) from tt where (d + d*d + d*d*d + d*d*d*d) > 10000000;
 count
--------
 999945
(1 row)  

Time: 411.418 ms
postgres=# select count(*) from tt where (n + n*n + n*n*n + n*n*n*n) > 10000000;
 count
--------
 999945
(1 row)  

Time: 1949.367 ms
postgres=# select count(*) from tt where (d64 + d64*d64 + d64*d64*d64 + d64*d64*d64*d64) > 10000000;
 count
--------
 999945
(1 row)  

Time: 1165.304 ms
postgres=# select count(*) from tt where (d128 + d128*d128 + d128*d128*d128 + d128*d128*d128*d128) > 10000000;
 count
--------
 999945
(1 row)  

Time: 1517.179 ms

排序性能对比

postgres=# select * from tt order by d limit 2 offset 999000;
   ii   |     d      |    d64     |    d128    |     n
--------+------------+------------+------------+------------
 999001 | 999001.123 | 999001.123 | 999001.123 | 999001.123
 999002 | 999002.123 | 999002.123 | 999002.123 | 999002.123
(2 rows)  

Time: 804.645 ms
postgres=# select * from tt order by n limit 2 offset 999000;
   ii   |     d      |    d64     |    d128    |     n
--------+------------+------------+------------+------------
 999001 | 999001.123 | 999001.123 | 999001.123 | 999001.123
 999002 | 999002.123 | 999002.123 | 999002.123 | 999002.123
(2 rows)  

Time: 2828.066 ms
postgres=# select * from tt order by d64 limit 2 offset 999000;
   ii   |     d      |    d64     |    d128    |     n
--------+------------+------------+------------+------------
 999001 | 999001.123 | 999001.123 | 999001.123 | 999001.123
 999002 | 999002.123 | 999002.123 | 999002.123 | 999002.123
(2 rows)  

Time: 1826.044 ms
postgres=# select * from tt order by d128 limit 2 offset 999000;
   ii   |     d      |    d64     |    d128    |     n
--------+------------+------------+------------+------------
 999001 | 999001.123 | 999001.123 | 999001.123 | 999001.123
 999002 | 999002.123 | 999002.123 | 999002.123 | 999002.123
(2 rows)  

Time: 2118.647 ms

哈希JOIN性能对比

postgres=# explain select count(*) from tt t1 join tt t2 on t1.d64 * t1.d64 + t1.d64 = t2.d64 + t2.d64 * t2.d64;
                                    QUERY PLAN
----------------------------------------------------------------------------------
 Aggregate  (cost=6875071228.00..6875071228.01 rows=1 width=8)
   ->  Hash Join  (cost=36707.00..5625071228.00 rows=500000000000 width=0)
         Hash Cond: (((t1.d64 * t1.d64) + t1.d64) = (t2.d64 + (t2.d64 * t2.d64)))
         ->  Seq Scan on tt t1  (cost=0.00..20300.00 rows=1000000 width=8)
         ->  Hash  (cost=20300.00..20300.00 rows=1000000 width=8)
               ->  Seq Scan on tt t2  (cost=0.00..20300.00 rows=1000000 width=8)
(6 rows)  

Time: 0.508 ms
postgres=# select count(*) from tt t1 join tt t2 on t1.d64 * t1.d64 + t1.d64 = t2.d64 + t2.d64 * t2.d64;
  count
---------
 1000000
(1 row)  

Time: 1681.451 ms
postgres=# select count(*) from tt t1 join tt t2 on t1.n * t1.n + t1.n = t2.n + t2.n * t2.n;
  count
---------
 1000000
(1 row)  

Time: 2395.894 ms

嵌套循环性能对比

postgres=# explain select count(*) from tt t1, tt t2 where t1.ii < 10000 and t2.ii < 10000 and t1.d * t1.d + t1.d > t2.d + t2.d * t2.d;
                                  QUERY PLAN
-------------------------------------------------------------------------------
 Aggregate  (cost=2699703.15..2699703.16 rows=1 width=8)
   ->  Nested Loop  (cost=0.00..2614087.74 rows=34246165 width=0)
         Join Filter: (((t1.d * t1.d) + t1.d) > (t2.d + (t2.d * t2.d)))
         ->  Seq Scan on tt t1  (cost=0.00..22800.00 rows=10136 width=8)
               Filter: (ii < 10000)
         ->  Materialize  (cost=0.00..22850.68 rows=10136 width=8)
               ->  Seq Scan on tt t2  (cost=0.00..22800.00 rows=10136 width=8)
                     Filter: (ii < 10000)
(8 rows)  

Time: 0.561 ms
postgres=# select count(*) from tt t1, tt t2 where t1.ii < 10000 and t2.ii < 10000 and t1.d * t1.d + t1.d > t2.d + t2.d * t2.d;
  count
----------
 49985001
(1 row)  

Time: 19706.890 ms
postgres=# select count(*) from tt t1, tt t2 where t1.ii < 10000 and t2.ii < 10000 and t1.n * t1.n + t1.n > t2.n + t2.n * t2.n;
  count
----------
 49985001
(1 row)  

Time: 70787.289 ms
postgres=# select count(*) from tt t1, tt t2 where t1.ii < 10000 and t2.ii < 10000 and t1.d64 * t1.d64 + t1.d64 > t2.d64 + t2.d64 * t2.d64;
  count
----------
 49985001
(1 row)  

Time: 49861.689 ms
postgres=# select count(*) from tt t1, tt t2 where t1.ii < 10000 and t2.ii < 10000 and t1.d128 * t1.d128 + t1.d128 > t2.d128 + t2.d128 * t2.d128;
  count
----------
 49985001
(1 row)  

Time: 65779.153 ms

小结

PostgreSQL内置的numeric类型属于"无限"精度数值类型，其他类型都是定长存储，使用时不需要调用palloc，效率较高。

如果你要使用超过双精能表示的有效范围的数值，目前只能选择decimal\numeric类型，而这个类型前面说了，由于是变长设计，需要调用palloc，效率一般。

那么在数据分析领域，或者需要处理非常多的数据记录时，numeric类型的开销是较大的。

从前面的测试数据，可以观察到性能最好的是float8，其次是decimal64, decimal64不需要使用palloc，性能比numeric好1.5倍左右，而decimal128也比numeric性能好不少。

期待将来PostgreSQL内置decimal64, decimal128。

参考

《PostgreSQL fixeddecimal - 用CPU "硬解码" 提升1倍数值运算能力助力金融大数据量计算》

https://www.postgresql.org/message-id/flat/CAFWGqnsuyOKdOwsNLVtDU1LLjS%3D66xmxxxS8Chnng_zSB5_uCg%40mail.gmail.com#CAFWGqnsuyOKdOwsNLVtDU1LLjS=66xmxxxS8Chnng_zSB5_uCg@mail.gmail.com

https://github.com/vitesse-ftian/pgdecimal

https://pgxn.org/dist/pgdecimal/1.0.0/

https://github.com/2ndQuadrant/fixeddecimal

时间： 2024-09-01 00:05:31

震精 - PostgreSQL decimal64 decimal128 高效率数值类型扩展的相关文章

震精 - PostgreSQL 递归查询 - 树型数据按路径分组输出

标签 PostgreSQL , 递归查询 , with recursive , 路径分组 , 路径递归背景 PostgreSQL的递归查询可以解决很多树形结构,路径查询等问题. 结合PostgreSQL plpgsql编程,可以实现更加复杂的问题,比如接下来这个CASE. 用户要求输出每个末端路径涉及的所有的记录. 例子创建测试表,插入路径数据. create table test(id serial, path text, info text); insert into test (pat

震精 - PostgreSQL 10.0 preview 性能增强 - WARM提升一倍性能

标签 PostgreSQL , 10.0 , WARM , 写放大 , 索引写放大背景目前,PostgreSQL的MVCC是多版本来实现的,当更新数据时,产生新的版本.(社区正在着手增加基于回滚段的存储引擎) 由于索引存储的是KEY+CTID(行号),当tuple的新版本与旧版本不在同一个数据块(BLOCK)的时候,索引也要随之变化,当新版本在同一个块里面时,则发生HOT UPDATE,索引的值不需要更新,但是因为产生了一条新的记录,所以也需要插入一条索引item,垃圾回收时,将其回收,因此

震精 - PostgreSQL 单机3.9 万亿/天(计数器、序列、自增)

标签 PostgreSQL , 计数器 , 序列 , 自增值背景数据库中,自增序列是常见的需求,例如计数,主键,唯一值,或者自动生成的流水号等等. 因此序列这个功能就应运而生,序列的功能在很多商业数据库中都支持需求,PostgreSQL当然也支持,而且更好用. 在PostgreSQL中可以创建多个序列,设置序列的起始值,步长,缓存大小,是否轮回等. postgres=# \h create sequence Command: CREATE SEQUENCE Description: defi

.Net判断一个对象是否为数值类型实例_实用技巧

本文实例讲述了.Net判断一个对象是否为数值类型的方法,分享给大家供大家参考.具体实现方法如下: .Net判断一个对象是否为数值类型乍一听是个很简单的事,但突然搞起来还真有点无从下手的感觉. 首先当然是通过GetType()方法反射获取其类型信息,然后对其进行分析,但是类型信息Type中并没有简单地给出这么一个属性进行判断. 老外给出的方法是: 复制代码代码如下: public static bool IsNumeric(this Type dataType) { if (da

《Python数据分析》一2.4 NumPy的数值类型

2.4 NumPy的数值类型 Python自身虽然支持整型.浮点型和复数型,但对于科学计算来说,还远远不够.现实中,我们仍然需要更多的数据类型,来满足在精度和存储大小方面的各种不同的要求.为此,NumPy提供了更加丰富的数据类型.注意,NumPy跟数学运算有关的数据类型的名称都以数字结尾.而这个数字指示了该类型的变量所占用的二进制位数.表2-1(改编自<NumPy用户指南>)概述了NumPy的各种数值类型. 每一种数据类型都有相应的转换函数(请参考本书代码包中的numericaltypes.p

jquery获取到复选框选中的值，值为数值类型，怎样通过ajax 传到后台,后台怎样接送

问题描述 jquery获取到复选框选中的值,值为数值类型,怎样通过ajax 传到后台,后台怎样接送求救,困扰了一天了, 急急急急急急急急急急急急,这个问题描述也太坑了吧. 解决方案 <!DOCTYPE html> <html> <head> <script src="/jquery/jquery-1.11.1.min.js"> </script> <script> function getCheck(){ var

关于SQL Server中将数值类型转换为字符串的问题

今天在把一些数据导入到SQL Server的时候遇到有个列被导入成float类型,而我实际需要的是varchar类型,所以要进行类型转换,转换时遇到了一点问题,所以写这篇博客记录一下. SQL Server中的数值类型分为两种,一种是精确的数值类型,具体的数据类型有:bit.tinyint.smallint.int.bigint.smallmoney.money和decimal,这些数据类型能够精确的表明某以数值:另一种是近似的数值类型,具体就是float和real.浮点数据为近似值,因此,并非

Go语言基础知识总结（语法、变量、数值类型、表达式、控制结构等）_Golang

一.语法结构 golang源码采用UTF-8编码.空格包括:空白,tab,换行,回车. - 标识符由字母和数字组成(外加'_'),字母和数字都是Unicode编码. - 注释: 复制代码代码如下: /* This is a comment; no nesting */ // So is this. 二.字面值(literals)类似C语言中的字面值,但数值不需要符号以及大小标志: 复制代码代码如下: 23 0x0FF 1.234e7类似C中的字符串,但字符串是Unicode/UTF-8编码的

cstring-MFC对话框串口编程中转化组合框数值类型

问题描述 MFC对话框串口编程中转化组合框数值类型实现在对话框类设置串口参数并连接串口,现在在校验位对应的组合框里添加了数据N/E/O,分别表示无校验,奇校验,偶校验 int index=m_ctrlParity.GetCurSel(); m_ctrlParity.GetLBText(index,m_strParity); 这里得到的数据类型m_strParity是CString型,但在打开串口时,用到的数据是char型,怎么转化得到呢?网上都是CString转Char _* 解决方案 CSt

震精 - PostgreSQL decimal64 decimal128 高效率数值 类型扩展

标签

背景