PostgreSQL 从文件系统恢复遗失的UNLOGGED table's datafile.

PostgreSQL的unlogged table是不记录xlog的，所以在备库上没有unlogged table的数据记录。
另一方面，当数据库异常crash时，数据库重启后需要从WAL恢复，由于unlogged table不记录WAL，为了保证数据的一致性，postgresql会自动清除unlogged table的数据文件。
那么问题来了，万一你不小心误创建了unlogged table，你可能一开始会没有感知，但是一旦发生以下情况，你会发现数据不见了。
1. 数据库crash掉之后，重启。
2. 主备切换，备库变成主库，主库变成备库。
以上两种情况，都是数据库在启动并进入recovery状态后，原来主库下的unlogged table的datafile都会被清空。
代码见：
src/backend/access/transam/xlog.c
/*
 * This must be called ONCE during postmaster or standalone-backend startup
 */
void
StartupXLOG(void)
{
......

        /* REDO */
        if (InRecovery)
        {
    ......
                /*
                 * We're in recovery, so unlogged relations may be trashed and must be
                 * reset.  This should be done BEFORE allowing Hot Standby
                 * connections, so that read-only backends don't try to read whatever
                 * garbage is left over from before.
                 */
                ResetUnloggedRelations(UNLOGGED_RELATION_CLEANUP);
......
        /*
         * Reset unlogged relations to the contents of their INIT fork. This is
         * done AFTER recovery is complete so as to include any unlogged relations
         * created during recovery, but BEFORE recovery is marked as having
         * completed successfully. Otherwise we'd not retry if any of the post
         * end-of-recovery steps fail.
         */
        if (InRecovery)
                ResetUnloggedRelations(UNLOGGED_RELATION_INIT);
......

backend/storage/file/reinit.c
/*
 * Reset unlogged relations from before the last restart.
 *
 * If op includes UNLOGGED_RELATION_CLEANUP, we remove all forks of any
 * relation with an "init" fork, except for the "init" fork itself.
 *
 * If op includes UNLOGGED_RELATION_INIT, we copy the "init" fork to the main
 * fork.
 */
void
ResetUnloggedRelations(int op)
{
        char            temp_path[MAXPGPATH];
        DIR                *spc_dir;
        struct dirent *spc_de;
        MemoryContext tmpctx,
                                oldctx;

        /* Log it. */
        elog(DEBUG1, "resetting unlogged relations: cleanup %d init %d",
                 (op & UNLOGGED_RELATION_CLEANUP) != 0,
                 (op & UNLOGGED_RELATION_INIT) != 0);

        /*
         * Just to be sure we don't leak any memory, let's create a temporary
         * memory context for this operation.
         */
        tmpctx = AllocSetContextCreate(CurrentMemoryContext,
                                                                   "ResetUnloggedRelations",
                                                                   ALLOCSET_DEFAULT_MINSIZE,
                                                                   ALLOCSET_DEFAULT_INITSIZE,
                                                                   ALLOCSET_DEFAULT_MAXSIZE);
        oldctx = MemoryContextSwitchTo(tmpctx);

        /*
         * First process unlogged files in pg_default ($PGDATA/base)
         */
        ResetUnloggedRelationsInTablespaceDir("base", op);

        /*
         * Cycle through directories for all non-default tablespaces.
         */
        spc_dir = AllocateDir("pg_tblspc");

        while ((spc_de = ReadDir(spc_dir, "pg_tblspc")) != NULL)
        {
                if (strcmp(spc_de->d_name, ".") == 0 ||
                        strcmp(spc_de->d_name, "..") == 0)
                        continue;

                snprintf(temp_path, sizeof(temp_path), "pg_tblspc/%s/%s",
                                 spc_de->d_name, TABLESPACE_VERSION_DIRECTORY);
                ResetUnloggedRelationsInTablespaceDir(temp_path, op);
        }
        FreeDir(spc_dir);

        /*
         * Restore memory context.
         */
        MemoryContextSwitchTo(oldctx);
        MemoryContextDelete(tmpctx);
}
src/include/common/relpath.h
/*
 * Stuff for fork names.
 *
 * The physical storage of a relation consists of one or more forks.
 * The main fork is always created, but in addition to that there can be
 * additional forks for storing various metadata. ForkNumber is used when
 * we need to refer to a specific fork in a relation.
 */
typedef enum ForkNumber
{
        InvalidForkNumber = -1,
        MAIN_FORKNUM = 0,
        FSM_FORKNUM,
        VISIBILITYMAP_FORKNUM,
        INIT_FORKNUM

        /*
         * NOTE: if you add a new fork, change MAX_FORKNUM and possibly
         * FORKNAMECHARS below, and update the forkNames array in
         * src/common/relpath.c
         */
} ForkNumber;

那么问题来了，如果真的这样了，有办法恢复吗？
1. 首先，如果你在原来的主库上有基础备份，你可以从基础备份恢复。为什么需要主库的备份呢，因为备库上没有unlogged table的数据文件内容，所以在备库备份是备不到unlogged table的datafile的。
但是这种恢复方法也务必要小心，你需要在启动数据库前，先把_init的文件都删掉，这样启动数据库时数据文件就不会被清除。
2. 从主库的逻辑备份中恢复。
3. 从审计日志中回放SQL恢复。
4. 如果你没有主库的基础备份，那么可有从文件系统中去恢复删掉的数据文件。例如ext4文件系统的恢复方式如下：
http://blog.163.com/digoal@126/blog/static/16387704020142124032866/
例子：
创建一个unlogged table，并记录它的filenode
postgres=# create unlogged table utbl1(id int);
CREATE TABLE
postgres=# insert into utbl1 select generate_series(1,1000);
INSERT 0 1000
postgres=# select pg_relation_filepath('utbl1'::regclass);
 pg_relation_filepath
----------------------
 base/151898/33822
(1 row)
查看到PG对unlogged table做了_init的后缀标记
http://www.postgresql.org/docs/9.4/static/storage-init.html
postgres@digoal-> cd $PGDATA
postgres@digoal-> ll base/151898/33822*
-rw------- 1 postgres postgres 40K Sep 26 11:39 base/151898/33822
-rw------- 1 postgres postgres 24K Sep 26 11:39 base/151898/33822_fsm
-rw------- 1 postgres postgres   0 Sep 26 11:38 base/151898/33822_init
现在，我们进入恢复模式
postgres@digoal-> mv recovery.done recovery.conf
postgres@digoal-> pg_ctl start
数据库启动后，其实已经将unlogged table的数据文件清理掉了，见前面的源码。
postgres=# select count(*) from utbl1 ;
ERROR:  cannot access temporary or unlogged relations during recovery
现在再起来，unlogged table就没有数据了。
postgres@digoal-> mv recovery.conf recovery.done
postgres@digoal-> pg_ctl start
postgres=# select count(*) from utbl1 ;
 count
-------
     0
(1 row)

恢复过程，先停库，然后umount数据文件所在的文件系统。
pg_ctl stop -m fast
root@digoal-> umount /data01
假设我已经安装了extundelete
http://sourceforge.net/projects/extundelete/files/extundelete/
http://blog.163.com/digoal@126/blog/static/16387704020142124032866/
root@digoal-> cd /opt/extundelete/
root@digoal-> ll
total 4.0K
drwxr-xr-x 2 root root 4.0K Sep 26 13:43 bin
root@digoal-> cd bin/
root@digoal-> ll
total 1.2M
-rwxr-xr-x 1 root root 1.2M Sep 26 13:43 extundelete
查看数据文件所在的文件系统的块设备的inode信息。
root@digoal-> ./extundelete --ls --inode 2 /dev/mapper/vgdata01-lv01
./extundelete: unrecognized option '--ls'
NOTICE: Extended attributes are not restored.
Loading filesystem metadata ... 80 groups loaded.
Group: 0
Contents of inode 2:
0000 | ed 41 00 00 00 10 00 00 6d 08 fa 55 bc 08 fa 55 | .A......m..U...U
0010 | bc 08 fa 55 00 00 00 00 00 00 05 00 08 00 00 00 | ...U............
0020 | 00 00 00 00 02 00 00 00 a1 22 00 00 00 00 00 00 | ........."......
0030 | 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 | ................
0040 | 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 | ................
0050 | 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 | ................
0060 | 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 | ................
0070 | 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 | ................
0080 | 1c 00 00 00 88 90 cc 84 88 90 cc 84 00 00 00 00 | ................
0090 | 6d 08 fa 55 00 00 00 00 00 00 00 00 00 00 00 00 | m..U............
00a0 | 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 | ................
00b0 | 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 | ................
00c0 | 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 | ................
00d0 | 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 | ................
00e0 | 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 | ................
00f0 | 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 | ................

Inode is Allocated
File mode: 16877
Low 16 bits of Owner Uid: 0
Size in bytes: 4096
Access time: 1442449517
Creation time: 1442449596
Modification time: 1442449596
Deletion Time: 0
Low 16 bits of Group Id: 0
Links count: 5
Blocks count: 8
File flags: 0
File version (for NFS): 0
File ACL: 0
Directory ACL: 0
Fragment address: 0
Direct blocks: 8865, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0
Indirect block: 0
Double indirect block: 0
Triple indirect block: 0

File name                                       | Inode number | Deleted status
.                                                 2
..                                                2
lost+found                                        11
pg_root_1921                                      131073
pg_root_1922                                      393217
我们的数据文件在pg_root_1921目录下，对应的inode=131073，根据这个inode继续查询下一级目录的inode
root@digoal-> ./extundelete --ls --inode 2 /dev/mapper/vgdata01-lv01 --inode 131073
找到了base的inode
base                                              131077
继续找，找到了unlogged table所在的database的inode
root@digoal-> ./extundelete --ls --inode 2 /dev/mapper/vgdata01-lv01 --inode 131077
151898
                                    131078
继续找，找对应的_init文件，找到了它的前缀，根据前缀过滤
root@digoal-> ./extundelete --ls --inode 2 /dev/mapper/vgdata01-lv01 --inode 131078|grep _init
33822_init                                        131152
找到了被删除的文件
root@digoal-> ./extundelete --ls --inode 2 /dev/mapper/vgdata01-lv01 --inode 131078|grep 33822
./extundelete: unrecognized option '--ls'
33822                                             131116         Deleted
33822_fsm                                         131147         Deleted
33822_init                                        131152
使用inode恢复，恢复删除的datafile
root@digoal-> ./extundelete --ls --inode 2 /dev/mapper/vgdata01-lv01 --restore-inode 131116
恢复后放在这里
root@digoal->  ll RECOVERED_FILES/
total 40K
-rw-r--r-- 1 root root 40K Sep 26 13:50 file.131116
将文件拷贝到原来的位置
root@digoal-> mount /data01
root@digoal-> cp file.131116 /data01/pg_root_1921/base/151898/33822
root@digoal-> chown -R postgres:postgres /data01/
root@digoal-> su - postgres
启动数据库前，切记，删除_init后缀的文件，否则数据文件可能又会被清理掉。
cd $PGDATA
postgres@digoal-> rm -f base/151898/33822_init
启动数据库，数据已经回来了。
postgres@digoal-> pg_ctl start
postgres=# select count(*) from utbl1 ;
 count
-------
  1000
(1 row)

大家千万要切记，不要滥用unlogged table，除非你可以为你的行为负责。
hash index也是这样，因为不写XLOG，所以如果你使用流复制来实施HA的话，在切换到备库后，如果走hash index scan，你会发现数据凭空"消失"，实际上是因为hash index的变更没有复制过去的原因。所以就不要使用hash index了吧。

[参考]
1. http://sourceforge.net/projects/extundelete/files/extundelete/
2. http://blog.163.com/digoal@126/blog/static/16387704020142124032866/

时间： 2024-09-20 00:44:58

PostgreSQL 从文件系统恢复遗失的UNLOGGED table's datafile.的相关文章

HTAP数据库 PostgreSQL 场景与性能测试之 42 - (OLTP+OLAP) unlogged table 不含索引多表批量写入

标签 PostgreSQL , HTAP , OLTP , OLAP , 场景与性能测试背景 PostgreSQL是一个历史悠久的数据库,历史可以追溯到1973年,最早由2014计算机图灵奖得主,关系数据库的鼻祖Michael_Stonebraker 操刀设计,PostgreSQL具备与Oracle类似的功能.性能.架构以及稳定性. PostgreSQL社区的贡献者众多,来自全球各个行业,历经数年,PostgreSQL 每年发布一个大版本,以持久的生命力和稳定性著称. 2017年10月,Pos

PostgreSQL 为什么不要滥用unlogged table & hash index

unlogged table和hash index同样都不会写XLOG,所以如果你用流复制来搞HA,一定概要搞清楚一个问题,切换到备库的话unlogged table数据会被清掉,而hash index也没有,走hash index会失败. unlogged table 的风险以及修复手段可以见 : http://blog.163.com/digoal@126/blog/static/163877040201582621345351/ hash index则风险略小,但是也必须重建,但是个人还是

ext3文件系统恢复被删文件

大家都知道,在linux系统中是没有回收站的概念的,一旦rm命令删除某个文件之后,就找不回来.不过其实这时还是有救的,之前大概清楚个概念,知道有救,但如何救就没怎么详细去了解了.那么这次我们来实际操作下. 其实为什么说还有救呢?说这个之前需要对ext3文件系统有个大概的了解. 在ext3文件系统中我们创建一个文件时,它首先会在ext3文件系统的inode表申请个inode号,然后再将文件的信息以及数据写入.那么inode号又是什么呢? inode号里面包括两部分,一个是metadata,即是元数

Postgresql转存恢复数据经验谈

我们知道恢复数据是dba工作的一部分,做这部分工作一般都是输入恢复命令后就在那里无聊的等着任务的完,特别是大容量的数据恢复起来真的等到晕,那么我们如何提高数据恢复的性能呢,下面说说那些参数有影响我们知道,导入数据的过程其实就是io的负载,特别考验硬盘的写能力,还有就是数据写入磁盘的方法,另外一个就是cpu的处理能力,再一个就是内存的大量使用,影响的参数大概有这些 shared_buffers work_mem maintenance_work_mem checkpoint_segments a

PostgreSQL Oracle 兼容性之 - PL/SQL record, table类型定义

背景 Oracle PL/SQL是非常强大的一门SQL编程语言,许多Oracle用户也使用它来处理一些要求延迟低且数据一致性或可靠性要求很高的业务逻辑. PostgreSQL也有一门非常高级的内置SQL编程语言,plpgsql.与Oracle PL/SQL语法极其类似,但是还是有一些不一样的地方.(PS:除了plpgsql,PostgreSQL还支持C,java,python,perl等流行的语言作为数据库的函数编程语言) 本文是针对有Oracle用户遇到的一些函数语法与PostgreSQL不兼

Postgresql数据库的恢复(Console)

1. 将Postgresql的bin目录加入你环境变量的path路径 2. psql -U postgres -c "CREATE DATABASE postgis_sample OWNER postgres TEMPLATE template_postgis ENCODING 'UTF8';" postgis_sample是你的数据库名字 3. psql -d postgis_sample -U postgres -f D:\PostDbBak\SharpMap\

PostgreSQL on ECS多云盘的部署、快照备份和恢复

标签 PostgreSQL , ECS , 云盘 , 快照 , 一致性备份 , 时间点恢复 , zfs , lvm , raid , 并行计算背景随着阿里云云盘的发展,云盘的性能已经越来越好了.IOPS可以做到十万以上,读写吞吐也超过1GB/s了.相信随着网络的发展,SSD云盘IOPS突破40万,读写吞吐突破4GB/s也不远了. 不过这里的IOPS和吞吐是指并发的IO能力,单次IO的延迟与本地还是不能比(将来RDMA网络也许能解决这个问题). PostgreSQL 如何解决SSD云盘IO延迟

Linux网络文件系统的数据备份、恢复及同步机制

本文将详细介绍针对该网络文件系统的数据备份.恢复及同步机制在内核的具体实现,给广大系统管理员和研发人员提供技术参考.网络文件系统(NFS)协议是由 Sun MicroSystem 公司在 20 世纪 80 年代为了提供对共享文件的远程访问而设计和实现的,它采用了经典的客户机/服务器模式提供服务.为了达到如同 NFS 协议通过使用 Sun 公司开发的远在本机上使用本地文件系统一样便捷的效果,NFS 通过使用远程过程调用协议(RPC Protocol)来实现运行在一台计算机上的程序来调用在另一台远程

一天学会PostgreSQL应用开发与管理 - 8 PostgreSQL 管理

本章大纲一.权限体系 1 逻辑结构 2 权限体系 3 schema使用 , 特别注意 4 用户 5 public 6 如何查看和解读一个对象的当前权限状态二.索引介绍 1 索引有什么用? 2 索引的类型 3 索引合并扫描 4 表膨胀检查 5 检查膨胀 6 索引维护三.系统配置 1 存储.文件系统规划 2 网络规划 3 CPU评估 4 内核配置 5 资源限制 6 防火墙配置四.数据库初始化 1 initdb 介绍 2 postgresql.conf参数配置 3 pg_hba.conf数据库