PostgreSQL 最佳实践 - 任意时间点恢复源码分析

背景

我们知道PostgreSQL是支持任意时间点恢复的,那么背后的原理是什么?

本文将对PG的时间点恢复进行详细的讲解,帮助用户理解。

本文涉及源码参考PostgreSQL 9.2.2版本.

时间点恢复涉及的参数

我们知道PostgreSQL 支持PITR, 基于时间点的恢复. 通过配置recovery.conf可以指定3种恢复目标, 如下 :

recovery_target_name (string)
This parameter specifies the named restore point, created with pg_create_restore_point() to which recovery will proceed.
At most one of recovery_target_name, recovery_target_time or recovery_target_xid can be specified. The default is to recover to the end of the WAL log.  

recovery_target_time (timestamp)
This parameter specifies the time stamp up to which recovery will proceed.
At most one of recovery_target_time, recovery_target_name or recovery_target_xid can be specified. The default is to recover to the end of the WAL log.
The precise stopping point is also influenced by recovery_target_inclusive.  

recovery_target_xid (string)
This parameter specifies the transaction ID up to which recovery will proceed. Keep in mind that while transaction IDs are assigned sequentially at transaction start, transactions can complete in a different numeric order. The transactions that will be recovered are those that committed before (and optionally including) the specified one.
At most one of recovery_target_xid, recovery_target_name or recovery_target_time can be specified. The default is to recover to the end of the WAL log.
The precise stopping point is also influenced by recovery_target_inclusive.

其中recovery_target_time和recovery_target_xid可以指定recovery_target_inclusive参数, 如下 :

recovery_target_inclusive (boolean)
Specifies whether we stop just after the specified recovery target (true), or just before the recovery target (false).
Applies to both recovery_target_time and recovery_target_xid, whichever one is specified for this recovery.
This indicates whether transactions having exactly the target commit time or ID, respectively, will be included in the recovery.
Default is true.

默认为true取自src/backend/access/transam/xlog.c :

static bool recoveryTargetInclusive = true;

为什么recovery_target_name不能指定recovery_target_inclusive参数?

而recovery_target_time和recovery_target_xid可以指定recovery_target_inclusive参数呢?

恢复截至点源码分析

首先要解释一下, 什么情况下恢复可以截止.

只在三种情况恢复可以截止 :

 COMMIT/ABORT/XLOG_RESTORE_POINT,

然后这些信息从哪里来呢? 它们都取自XLOG的头数据XLogRecord中的sl_rmid和xl_info :

src/include/access/xlog.h

/*
 * The overall layout of an XLOG record is:
 *              Fixed-size header (XLogRecord struct)
 *              rmgr-specific data
 *              BkpBlock
 *              backup block data
 *              BkpBlock
 *              backup block data
 *              ...
 *
 * where there can be zero to four backup blocks (as signaled by xl_info flag
 * bits).  XLogRecord structs always start on MAXALIGN boundaries in the WAL
 * files, and we round up SizeOfXLogRecord so that the rmgr data is also
 * guaranteed to begin on a MAXALIGN boundary.  However, no padding is added
 * to align BkpBlock structs or backup block data.
 *
 * NOTE: xl_len counts only the rmgr data, not the XLogRecord header,
 * and also not any backup blocks.      xl_tot_len counts everything.  Neither
 * length field is rounded up to an alignment boundary.
 */
typedef struct XLogRecord
{
        pg_crc32        xl_crc;                 / CRC for this record /
        XLogRecPtr      xl_prev;                / ptr to previous record in log /
        TransactionId xl_xid;           / xact id /
        uint32          xl_tot_len;             / total len of entire record /
        uint32          xl_len;                 / total len of rmgr data /
        uint8           xl_info;                / flag bits, see below /
        RmgrId          xl_rmid;                / resource manager for this record /  

        / Depending on MAXALIGN, there are either 2 or 6 wasted bytes here /  

        / ACTUAL LOG DATA FOLLOWS AT END OF STRUCT /  

} XLogRecord;

只有在这三个状态下, 恢复允许进入截止判断.

COMMIT/ABORT/XLOG_RESTORE_POINT;

这个逻辑来自recoveryStopsHere函数 :

恢复截止的处理函数recoveryStopsHere中包含了这三个状态的判断, 如下 :

src/backend/access/transam/xlog.c

        / We only consider stopping at COMMIT, ABORT or RESTORE POINT records /
        if (record->xl_rmid != RM_XACT_ID && record->xl_rmid != RM_XLOG_ID)
                return false;
        record_info = record->xl_info & ~XLR_INFO_MASK;
        if (record->xl_rmid == RM_XACT_ID && record_info == XLOG_XACT_COMMIT_COMPACT)
        {
                xl_xact_commit_compact *recordXactCommitData;  

                recordXactCommitData = (xl_xact_commit_compact *) XLogRecGetData(record);
                recordXtime = recordXactCommitData->xact_time;
        }
        else if (record->xl_rmid == RM_XACT_ID && record_info == XLOG_XACT_COMMIT)
        {
                xl_xact_commit *recordXactCommitData;  

                recordXactCommitData = (xl_xact_commit *) XLogRecGetData(record);
                recordXtime = recordXactCommitData->xact_time;
        }
        else if (record->xl_rmid == RM_XACT_ID && record_info == XLOG_XACT_ABORT)
        {
                xl_xact_abort *recordXactAbortData;  

                recordXactAbortData = (xl_xact_abort *) XLogRecGetData(record);
                recordXtime = recordXactAbortData->xact_time;
        }
        else if (record->xl_rmid == RM_XLOG_ID && record_info == XLOG_RESTORE_POINT)
        {
                xl_restore_point *recordRestorePointData;  

                recordRestorePointData = (xl_restore_point *) XLogRecGetData(record);
                recordXtime = recordRestorePointData->rp_time;
                strncpy(recordRPName, recordRestorePointData->rp_name, MAXFNAMELEN);
        }
        else
                return false;

COMMIT和ABORT很好理解, 就是事务结束时状态, RESOTRE POINT的信息则来自XLogRestorePoint函数,

src/backend/access/transam/xlog.c

/*
 * Write a RESTORE POINT record
 */
XLogRecPtr
XLogRestorePoint(const char *rpName)
{
        XLogRecPtr      RecPtr;
        XLogRecData rdata;
        xl_restore_point xlrec;  

        xlrec.rp_time = GetCurrentTimestamp();
        strncpy(xlrec.rp_name, rpName, MAXFNAMELEN);  

        rdata.buffer = InvalidBuffer;
        rdata.data = (char *) &xlrec;
        rdata.len = sizeof(xl_restore_point);
        rdata.next = NULL;  

        RecPtr = XLogInsert(RM_XLOG_ID, XLOG_RESTORE_POINT, &rdata);  

        ereport(LOG,
                        (errmsg("restore point \"%s\" created at %X/%X",
                                        rpName, RecPtr.xlogid, RecPtr.xrecoff)));  

        return RecPtr;
}

什么是自定义还原点

在使用PostgreSQL内建的pg_create_restore_point函数创建还原点时用到XLogRestorePoint :

src/backend/access/transam/xlogfuncs.c

/*
 * pg_create_restore_point: a named point for restore
 */
Datum
pg_create_restore_point(PG_FUNCTION_ARGS)
{
        text       *restore_name = PG_GETARG_TEXT_P(0);
        char       *restore_name_str;
        XLogRecPtr      restorepoint;
        char            location[MAXFNAMELEN];  

        if (!superuser())
                ereport(ERROR,
                                (errcode(ERRCODE_INSUFFICIENT_PRIVILEGE),
                                 (errmsg("must be superuser to create a restore point"))));  

        if (RecoveryInProgress())
                ereport(ERROR,
                                (errcode(ERRCODE_OBJECT_NOT_IN_PREREQUISITE_STATE),
                                 (errmsg("recovery is in progress"),
                                  errhint("WAL control functions cannot be executed during recovery."))));  

        if (!XLogIsNeeded())
                ereport(ERROR,
                                (errcode(ERRCODE_OBJECT_NOT_IN_PREREQUISITE_STATE),
                         errmsg("WAL level not sufficient for creating a restore point"),
                                 errhint("wal_level must be set to \"archive\" or \"hot_standby\" at server start.")));  

        restore_name_str = text_to_cstring(restore_name);  

        if (strlen(restore_name_str) >= MAXFNAMELEN)
                ereport(ERROR,
                                (errcode(ERRCODE_INVALID_PARAMETER_VALUE),
                                 errmsg("value too long for restore point (maximum %d characters)", MAXFNAMELEN - 1)));  

        restorepoint = XLogRestorePoint(restore_name_str);  

        /*
         * As a convenience, return the WAL location of the restore point record
         */
        snprintf(location, sizeof(location), "%X/%X",
                         restorepoint.xlogid, restorepoint.xrecoff);
        PG_RETURN_TEXT_P(cstring_to_text(location));
}

经过以上介绍以后, 我们知道recoveryStopsHere开头部分的逻辑决定了PITR恢复可以选择截止在:

1. 事务结束时(COMMIT/ABORT);

2. 或者是用户使用pg_create_restore_point创建的还原点;

recoveryStopsHere接下来的部分针对recovery.conf中的配置, 判断是否截止恢复.

截至点的用法解说

在文章开头我们还提到了3个还原目标(target) :

(recovery_target_xid, recovery_target_time, recovery_target_name)

1. 未设置任何截至目标, 只返回false, 所以不会停止

        / Do we have a PITR target at all? /
        if (recoveryTarget == RECOVERY_TARGET_UNSET)
        {
                /*
                 * Save timestamp of latest transaction commit/abort if this is a
                 * transaction record
                 */
                if (record->xl_rmid == RM_XACT_ID)
                        SetLatestXTime(recordXtime);
                return false;
        }

RECOVERY_TARGET_UNSET 取自

src/include/access/xlog.h

/*
 * Recovery target type.
 * Only set during a Point in Time recovery, not when standby_mode = on
 */
typedef enum
{
        RECOVERY_TARGET_UNSET,
        RECOVERY_TARGET_XID,
        RECOVERY_TARGET_TIME,
        RECOVERY_TARGET_NAME
} RecoveryTargetType;

2. recovery_target_xid 与 XLogRecord->xl_xid进行比较.
xid作为恢复目标时, recoveryTargetInclusive只影响日志输出(recoveryStopAfter).

原因是xid是按事务启动顺序分配的, 而不是按事务结束顺序分配. 并且这种target下面截止只可能在COMMIT/ABORT.

所以只要达到这个xid并且状态是commit/abort时, 就返回true.

*includeThis = recoveryTargetInclusive;只影响了日志输出. 而不是包含和不包含的意思.

        if (recoveryTarget == RECOVERY_TARGET_XID)
        {
                /*
                 * There can be only one transaction end record with this exact
                 * transactionid
                 *
                 * when testing for an xid, we MUST test for equality only, since
                 * transactions are numbered in the order they start, not the order
                 * they complete. A higher numbered xid will complete before you about
                 * 50% of the time...
                 */
                stopsHere = (record->xl_xid == recoveryTargetXid);
                if (stopsHere)
                        *includeThis = recoveryTargetInclusive;
        }

日志输出时, 判断recoveryStopAfter :

        if (stopsHere)
        {
                recoveryStopXid = record->xl_xid;
                recoveryStopTime = recordXtime;
                recoveryStopAfter = *includeThis;  

                if (record_info == XLOG_XACT_COMMIT_COMPACT || record_info == XLOG_XACT_COMMIT)
                {
                        if (recoveryStopAfter)
                                ereport(LOG,
                                                (errmsg("recovery stopping after commit of transaction %u, time %s",
                                                                recoveryStopXid,
                                                                timestamptz_to_str(recoveryStopTime))));
                        else
                                ereport(LOG,
                                                (errmsg("recovery stopping before commit of transaction %u, time %s",
                                                                recoveryStopXid,
                                                                timestamptz_to_str(recoveryStopTime))));
                }
                else if (record_info == XLOG_XACT_ABORT)
                {
                        if (recoveryStopAfter)
                                ereport(LOG,
                                                (errmsg("recovery stopping after abort of transaction %u, time %s",
                                                                recoveryStopXid,
                                                                timestamptz_to_str(recoveryStopTime))));
                        else
                                ereport(LOG,
                                                (errmsg("recovery stopping before abort of transaction %u, time %s",
                                                                recoveryStopXid,
                                                                timestamptz_to_str(recoveryStopTime))));
                }

3. recovery_target_name 与 XLogRecData->data进行比较.

如果数据库中有多个重复命名的还原点, 遇到第一个则停止.

同时因为还原点的信息写在单独的xlog数据块中, 不是一条transaction record块, 所以也没有包含或不包含的概念, 直接截止.

不需要判断recovery_target_inclusive .

        else if (recoveryTarget == RECOVERY_TARGET_NAME)
        {
                /*
                 * There can be many restore points that share the same name, so we
                 * stop at the first one
                 */
                stopsHere = (strcmp(recordRPName, recoveryTargetName) == 0);  

                /*
                 * Ignore recoveryTargetInclusive because this is not a transaction
                 * record
                 */
                *includeThis = false;
        }

4. recovery_target_time 与 xl_xact_commit_compact->xact_time进行比较.

因为在同一个时间点, 可能有多个事务COMMIT/ABORT. 所以recovery_target_inclusive 在这里起到的作用是 :

截止于这个时间点的第一个提交的事务后(包含这个时间点第一个遇到的提交/回滚的事务);

或者截止于这个时间点提交的最后一个事务后(包括这个时间点提交/回滚的所有事务).

        else
        {
                /*
                 * There can be many transactions that share the same commit time, so
                 * we stop after the last one, if we are inclusive, or stop at the
                 * first one if we are exclusive
                 */
                if (recoveryTargetInclusive)
                        stopsHere = (recordXtime > recoveryTargetTime);
                else
                        stopsHere = (recordXtime >= recoveryTargetTime);
                if (stopsHere)
                        *includeThis = false;
        }

其中事务结束时间来自这个数据结构 :

src/include/access/xact.h

typedef struct xl_xact_commit_compact
{
        TimestampTz xact_time;          / time of commit /
        int                     nsubxacts;              / number of subtransaction XIDs /
        / ARRAY OF COMMITTED SUBTRANSACTION XIDs FOLLOWS /
        TransactionId subxacts[1];      / VARIABLE LENGTH ARRAY /
} xl_xact_commit_compact;

从以上逻辑看到, recoveryTargetInclusive只有当恢复目标是xid或者time时可以指定.

目标是target name时不需要指定.

参考

1. src/include/catalog/pg_control.h

/ XLOG info values for XLOG rmgr /
#define XLOG_CHECKPOINT_SHUTDOWN                0x00
#define XLOG_CHECKPOINT_ONLINE                  0x10
#define XLOG_NOOP                                               0x20
#define XLOG_NEXTOID                                    0x30
#define XLOG_SWITCH                                             0x40
#define XLOG_BACKUP_END                                 0x50
#define XLOG_PARAMETER_CHANGE                   0x60
#define XLOG_RESTORE_POINT                              0x70
#define XLOG_FPW_CHANGE                         0x80

2. src/include/access/xlog.h

/*
 * XLOG uses only low 4 bits of xl_info.  High 4 bits may be used by rmgr.
 */
#define XLR_INFO_MASK                   0x0F

3. src/include/access/rmgr.h

/*
 * Built-in resource managers
 *
 * Note: RM_MAX_ID could be as much as 255 without breaking the XLOG file
 * format, but we keep it small to minimize the size of RmgrTable[].
 */
#define RM_XLOG_ID                              0
#define RM_XACT_ID                              1
#define RM_SMGR_ID                              2
#define RM_CLOG_ID                              3
#define RM_DBASE_ID                             4
#define RM_TBLSPC_ID                    5
#define RM_MULTIXACT_ID                 6
#define RM_RELMAP_ID                    7
#define RM_STANDBY_ID                   8
#define RM_HEAP2_ID                             9
#define RM_HEAP_ID                              10
#define RM_BTREE_ID                             11
#define RM_HASH_ID                              12
#define RM_GIN_ID                               13
#define RM_GIST_ID                              14
#define RM_SEQ_ID                               15
#define RM_SPGIST_ID                    16
时间: 2024-10-01 02:11:42

PostgreSQL 最佳实践 - 任意时间点恢复源码分析的相关文章

PostgreSQL reload配置的动作反馈与源码分析

PostgreSQL reload配置的动作反馈与源码分析 作者 digoal 日期 2016-09-01 标签 PostgreSQL , reload , 配置 背景 PostgreSQL数据库的配置文件中,有一些配置项是支持reload的,但是如果配置写错了,reload时怎么知道呢? 源码分析 reload其实是通过给postmaster进程发SIGHUP信号来实现的. 通过pg_ctl或者kill或者pg_reload_conf()函数都可以发信号. postmaster收到这个信号之后

PostgreSQL 最佳实践 - pg_rman 以standby为源的备份浅析

背景 为了降低备份对数据库的性能影响,我们在有standby的情况下,可以选择从standby备份PostgreSQL. pg_rman是一个备份工具,同样支持从standby备份数据库,但是有一些使用的注意事项. 例如需要连接主库,需要能读取归档目录,需要告诉主库和备库的连接地址,需要备库的$PGDATA等等. 为什么既要连接主库,还要能力连接备库. pg_rman需要连接到主库执行pg_start_backup,pg_stop_backup. 同时需要连接到standby数据库查看pg_la

时间、空间、对象 海量极速多维检索 - 阿里云RDS PostgreSQL最佳实践

标签 PostgreSQL , 时间 , 空间 , 对象属性 , 多维度检索 , 海量 , 空间索引 , 数据分区 , 块级索引BRIN , 多级索引 , GIN倒排索引 , JSON索引 , 多列索引 , 多索引扫描合并 , bitmapAnd , bitmapOr , 物理扫描 , ctid扫描 , intersect , partial index , partition index 背景 人类或者其他对象的活动产生了海量的时间.空间数据,如果有科技能实现回到过去,过去的世界状态会是什么样

海量实时计算+OLTP+OLAP DB设计 - 阿里云(RDS、HybridDB) for PostgreSQL最佳实践 - 泛电网系统应用

标签 PostgreSQL , 国家电网 , 电表 , 余额 , 流式计算 , 状态监测 , 上下文相关 背景 电网系统是一个关系民生,又非常典型的传统系统,虽然传统,量可不小.在互联网化(物联网化)的今天,有很多值得借鉴和思考的点供给其他相关系统参考. 每个省份大概有亿级户电表,最大的地市可能有千万户级别. 以往我们电费是怎么交的呢?我们小区是两个月交一次,也就是说先消费,再付款的方式.这么说起来电网真的是很仁义啊,现在哪有这么多先消费再付款的呀.移动话费.家庭宽带.天然气等等,都是充值后使用

PostgreSQL 助力企业打开时空之门 - 阿里云(RDS、HybridDB) for PostgreSQL最佳实践

标签 PostgreSQL , Greenplum , 时间 , 空间 , 对象 , 多维透视 , 多维分析 背景 时空数据无处不在,未来空间数据的占比会越来越高,在TP与AP场景的需求也会越来越旺盛. 选址.网格运营 空间数据自动聚集分析:时间+多边形圈人:驻留时间分析:舆情分析:... 室内定位 3D坐标:相对坐标系:+以上:运营活动效果分析报表: 科研 太空探索.测绘.气象.地震预测.溯源 无人驾驶 点云:动态路径规划: 空间调度(菜鸟.饿了么.滴滴.高德.快递...) 实时位置更新:多边

(新零售)商户网格化运营 - 阿里云RDS PostgreSQL最佳实践

标签 PostgreSQL , PostGIS , 地理位置 , KNN , 近邻检索 , 网格检索 , polygon中心点 , 半径搜索 背景 伟大的马老师说: "纯电商时代很快会结束,未来的十年.二十年,没有电子商务这一说,只有新零售这一说,也就是说线上线下和物流必须结合在一起,才能诞生真正的新零售" 线上是指云平台,线下是指销售门店或生产商,新物流消灭库存,减少囤货量. 电子商务平台消失是指,现有的电商平台分散,每个人都有自己的电商平台,不再入驻天猫.京东.亚马逊大型电子商务平

音视图(泛内容)网站透视分析 DB设计 - 阿里云(RDS、HybridDB) for PostgreSQL最佳实践

标签 PostgreSQL , 用户透视 , 设备透视 , 圈人 , 标签 , 视频网站 , 优酷 , 土豆 , 喜马拉雅 背景 日常生活中,人们使用最多的除了社交类网站.购物网站,估计就是音频.视频.图文信息类内容网站了. 视频网站,已经渗透到各种终端,除了喜闻乐见的手机,还包括移动终端.电脑.盒子.电视.投影仪等.有设备属性.会员属性.渠道属性等. 内容运营是非常重要的环节,而透视则是运营的重要武器. 业务需求 1.生成设备.会员画像 ID.各个维度的标签.其中包括一些多值列标签(例如最近7

贷款、天使投资(风控助手)业务数据库设计 - 阿里云RDS PostgreSQL, HybridDB for PostgreSQL最佳实践

标签 PostgreSQL , HybridDB for PostgreSQL , 小微贷款 , 金融风控 , 企业图谱 , 图式搜索 , 舆情分析 , 自动贷款 , 贷款审查 , 审查神器 背景 贷款是银行的主营业务之一,但是并不是只有银行能提供贷款,实际上资金雄厚的公司都有能力提供贷款(比如保险行业.资源垄断型企业等). 除了放贷,我们常说的天使投资.A轮B轮啥的,也是类似的场景,凭什么投你,背后如何决策也需要决策系统的支撑. 与贷款相反的是吸金类业务,比如我们现在发现越来越多的理财产品.股

Greenplum 空间(GIS)数据检索 B-Tree & GiST 索引实践 - 阿里云HybridDB for PostgreSQL最佳实践

标签 PostgreSQL , GIS , PostGIS , Greenplum , 空间检索 , GiST , B-Tree , geohash 背景 气象数据.地震数据.室内定位.室外定位.手机.车联网.还有我们最喜欢的"左划不喜欢.右划喜欢",越来越多的位置属性的数据.将来会越来越多. 基于GIS的数据分析.OLTP业务也越来越受到决策者的青睐,例如商场的选址决策,O2O的广告营销等.有很多基于多边形.时间.用户对象属性过滤的需求. 阿里云HybridDB for Postgr