PostgreSQL Oracle 兼容性之 - ASCIISTR

背景

在Oracle中有一个函数asciistr，可以将非ASCII字符转换成UTF-16编码的字符，因此转换后的字符串可以存储在只支持ASCII编码的数据库中。

ASCIISTR takes as its argument a string, or an expression that resolves to a string, in any character set and returns an ASCII version of the string in the database character set.

Non-ASCII characters are converted to the form \xxxx, where xxxx represents a UTF-16 code unit.

例子

SELECT ASCIISTR('ABÄCDE') FROM DUAL;

ASCIISTR('
----------
AB\00C4CDE

\00C4即转换后的UTF-16编码，这样整个字符串就可以存储在ASCII编码的数据库中了。

PostgreSQL ASCIISTR

了解了ASCIISTR的目的之后，我们就可以在PostgreSQL中实现对非ASCII编码的字符转换了。

PostgreSQL数据库没有UTF-16字符集，但是它支持UTF-8，UTF-8是变长字符集，支持全球所有的语言字符编码。

PostgreSQL提供了几个转换函数如下

postgres=# \df convert*
                              List of functions
   Schema   |     Name     | Result data type | Argument data types |  Type
------------+--------------+------------------+---------------------+--------
 pg_catalog | convert      | bytea            | bytea, name, name   | normal
 pg_catalog | convert_from | text             | bytea, name         | normal
 pg_catalog | convert_to   | bytea            | text, name          | normal
(3 rows)

例子，支持来回转换

postgres=# select convert_to('abc你好中国cde','UTF8');
               convert_to
----------------------------------------
 \x616263e4bda0e5a5bde4b8ade59bbd636465
(1 row)

postgres=# select convert_from(convert_to('abc你好中国cde','UTF8'),'UTF8');
  convert_from
----------------
 abc你好中国cde
(1 row)

postgres=# select convert_from('\x616263e4bda0e5a5bde4b8ade59bbd636465', 'UTF8');
  convert_from
----------------
 abc你好中国cde
(1 row)

方法2，使用textsend函数，这个函数没有编码输入，所以需要设置client_encoding。

postgres=# set client_encoding='UTF8';
SET
postgres=# select textsend('abc你好中国cde');
                textsend
----------------------------------------
 \x616263e4bda0e5a5bde4b8ade59bbd636465
(1 row)

postgres=# set client_encoding='GBK';
SET
postgres=# select textsend('abc你好中国cde');
ERROR:  character with byte sequence 0xad 0xe5 in encoding "GBK" has no equivalent in encoding "UTF8"
LOCATION:  report_untranslatable_char, wchar.c:2051
postgres=# select textsend('abc你好cde');
          textsend
----------------------------
 \x616263e4bda0e5a5bd636465
(1 row)

PostgreSQL支持的字符编码转换函数在这里

postgres=# select * from pg_conversion ;
            conname             | connamespace | conowner | conforencoding | contoencoding |            conproc             | condefault
--------------------------------+--------------+----------+----------------+---------------+--------------------------------+------------
 ascii_to_mic                   |           11 |       10 |              0 |             7 | ascii_to_mic                   | t
 mic_to_ascii                   |           11 |       10 |              7 |             0 | mic_to_ascii                   | t
......
 utf8_to_shift_jis_2004         |           11 |       10 |              6 |            41 | utf8_to_shift_jis_2004         | t
 euc_jis_2004_to_shift_jis_2004 |           11 |       10 |              5 |            41 | euc_jis_2004_to_shift_jis_2004 | t
 shift_jis_2004_to_euc_jis_2004 |           11 |       10 |             41 |             5 | shift_jis_2004_to_euc_jis_2004 | t
(132 rows)

以上例子把整个字符串都进行了转换，包括一些已经是ASCII的字符也被转换了，转换后长度变长了，如果你要尽量缩小长度怎么办呢？

如果你要对一个字符串中非ASCII字符转换为bytea，而ASCII字符保持不变，则需要自定义PostgreSQL函数，如下demo

create or replace function asciistr(str text, encoding text) returns text[] as $$
declare
  mid text;
  res text[] := (array[])::text[];
begin
  foreach mid in array regexp_split_to_array(str, '')
  loop
    if ascii(mid)<256 then
      res := array_append(res, mid);
    else
      res := array_append(res, (convert_to(mid,encoding))::text);
    end if;
  end loop;
  return res;
end;
$$
language plpgsql strict;

测试

postgres=# select asciistr('abc中国你好ced', 'UTF8');
                           asciistr
---------------------------------------------------------------
 {a,b,c,"\\xe4b8ad","\\xe59bbd","\\xe4bda0","\\xe5a5bd",c,e,d}
(1 row)

你还可以输出字符串，也可以直接输出字符串

postgres=# select array_to_string(asciistr('abc中国你好ced', 'UTF8'), '');
            array_to_string
----------------------------------------
 abc\xe4b8ad\xe59bbd\xe4bda0\xe5a5bdced
(1 row)

使用这种方法，在反转换时，也要通过函数来处理。

create or replace function reverse_asciistr(str text[], encoding text) returns text as $$
declare
  mid text;
  res text := '';
begin
  foreach mid in array str
  loop
    if mid ~ '^\\x' then
      res := concat(res, convert_from(mid::bytea, encoding));
    else
      res := concat(res, mid);
    end if;
  end loop;
  return res;
end;
$$
language plpgsql strict;

反转转换举例

postgres=# select reverse_asciistr(asciistr('abc_12\ab你好\ade中国_1jjr3', 'UTF8'), 'UTF8');
      reverse_asciistr
-----------------------------
 abc_12\ab你好\ade中国_1jjr3
(1 row)

参考

https://www.postgresql.org/docs/9.6/static/multibyte.html

Count

时间： 2024-11-05 12:23:48

PostgreSQL Oracle 兼容性之 - ASCIISTR的相关文章

PostgreSQL Oracle兼容性之 - plpgsql 自治事务(autonomous_transaction)补丁

PostgreSQL Oracle兼容性之 - plpgsql 自治事务(autonomous_transaction)补丁作者 digoal 日期 2016-11-04 标签 PostgreSQL , autonomous_transaction , 自治事务 , Oracle兼容性 , plpgsql 背景 PostgreSQL的plpgsql服务端编程语言与Oracle数据库的pl/sql编程语言非常类似,但是对于自治事务一直没有语法层面的支持. 以往如果要支持自治事务,可以使用exce

PostgreSQL Oracle 兼容性之 - rowid (CREATE TABLE WITH OIDS)

标签 PostgreSQL , Oracle , 兼容性 , 行号 , rowid , oid , ctid 背景 Oracle的数据中,通过ROWID可以定位到一条记录,当记录没有发生行迁移时,ROWID是不变的,因此即使不使用PK,也能很好的定位到一条记录. PostgreSQL中,也有行号,CTID,由BLOCK_ID和ITEM_ID组成,即哪个数据块的哪条记录. 但是PostgreSQL的引擎为多版本引擎,因此一条记录在被更新后CTID会发生变化(代表了新的版本). 不管是Oracle还

PostgreSQL Oracle兼容性之 - parser SQL保留|关键字(keywrods)大全

标签 PostgreSQL , keywords , 关键字 , Oracle 兼容性背景 PostgreSQL数据库中有哪些关键字,这些关键字的使用限制如何? https://www.postgresql.org/docs/10/static/sql-keywords-appendix.html 文档中的说明并不是特别清晰,(并且KEYWORDS与版本强相关),所以使用pg_get_keywords这个系统函数得到的,更加准确可读. https://www.postgresql.org/do

PostgreSQL Oracle 兼容性之 - timestamp 与 numeric 的运算

标签 PostgreSQL , Oracle , 兼容性 , timestamp , numeric 背景 Oracle里面支持时间戳与数字的加减,数字默认单位为天. PostgreSQL 支持时间戳与interval类型进行加减.日期支持与整型做加减. 为了兼容Oracle(时间戳与数字加减),我们可以复写操作符来实现时间戳与数字的加减. 复写操作符 1.自定义几个函数,用于时间戳与数字的加减. postgres=# create or replace function timestamp_a

PostgreSQL Oracle 兼容性之 TABLE、PIPELINED函数

标签 PostgreSQL , 返回表 , 返回复合类型 , 返回游标背景 Oracle 通过table, pipelined函数,用于格式化返回类型为table的函数的结果. Table function concepts There a couple of steps to take when you are working with table functions. Like when you are working with normal tables you have to desc

PostgreSQL Oracle 兼容性之 - 系统列(ctid, oid, cmin, cmax, xmin, xmax)

标签 PostgreSQL , Oracle , 兼容性 , ctid , cmin , cmax , xmin , xmax , oid 背景 PostgreSQL中有一些系统列(即行的头部信息的列),例如物理行号,COMMAND ID,事务号,以及OID. 当我们建表时,不能使用冲突的列名,否则会报错: postgres=# create table a(ctid int); 错误: 42701: 字段名 "ctid" 与系统字段名冲突 LOCATION: CheckAttribu

PostgreSQL Oracle 兼容性之 - PL/SQL record, table类型定义

背景 Oracle PL/SQL是非常强大的一门SQL编程语言,许多Oracle用户也使用它来处理一些要求延迟低且数据一致性或可靠性要求很高的业务逻辑. PostgreSQL也有一门非常高级的内置SQL编程语言,plpgsql.与Oracle PL/SQL语法极其类似,但是还是有一些不一样的地方.(PS:除了plpgsql,PostgreSQL还支持C,java,python,perl等流行的语言作为数据库的函数编程语言) 本文是针对有Oracle用户遇到的一些函数语法与PostgreSQL不兼

PostgreSQL Oracle 兼容性之 USERENV

标签 PostgreSQL , Oracle , USERENV , 会话环境变量背景 USERENV 是Oracle 用来获取当前会话变量的函数.官方是这么介绍的: https://docs.oracle.com/cd/E11882_01/server.112/e41084/functions184.htm#SQLRF06117 Describes the current session. The predefined parameters of namespace USERENV are

PostgreSQL Oracle兼容性 - 计算字符长度与字节长度(char(?) 与varchar(?)空格如何计算长度)

标签 PostgreSQL , Oracle , 字符长度 , 字节长度 , 空格 , varchar , char , 定长 , 变长 , 末尾追加空格背景由于多字节字符的存在,所以在数据库应用中,通常会出现两种计算字符串长度的需求: 1.计算字符串个数 2.计算字节数在不同的数据库中,使用的函数不一样. 如何计算字符和字节个数 https://stackoverflow.com/questions/17062065/how-to-select-data-items-of-a-certa