PostgreSQL 8.4 Warm-Standby Design and Implementation

Although PostgreSQL 9.0 is already quite mature, many enterprises are still running PostgreSQL 8.4.

So how do you design and implement a standby database and a backup strategy on 8.4? Below is a simple case study:


[Architecture diagram: primary database node, standby database node, and backup node]

The diagram above shows three hosts; in practice, the backup host and the standby database host can be the same machine.

The configuration is as follows:

Primary node: 10.0.0.10

Remote NFS archive directory (read/write by the postgres user): /data/recoverydir/pg_arch

Local archive directory (read/write by the postgres user): /data1/archivedir/pg_arch

Backup directory (read/write by the postgres user): /data1/pg_backup

Log directory (read/write by the postgres user): /data1/pg_run_log ; ln -s /data1/pg_run_log /var/log/pg_run_log
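These directories do not exist by default; a minimal preparation sketch (run as root, using the paths listed above — /data/recoverydir is only the mount point, its contents come from the NFS export):

mkdir -p /data1/archivedir/pg_arch /data1/pg_backup /data1/pg_run_log /data/recoverydir
chown -R postgres:postgres /data1/archivedir /data1/pg_backup /data1/pg_run_log
ln -s /data1/pg_run_log /var/log/pg_run_log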

  df

/dev/sdb1 670G 48G 589G 8% /data1

10.0.0.20:/data1/archivedir 670G 66G 571G 11% /data/recoverydir

  /etc/fstab

10.0.0.20:/data1/archivedir /data/recoverydir nfs  rw,rsize=8192,wsize=8192,noatime       1 3

  /etc/exports

/data1/archivedir 10.0.0.20/32(rw,sync,wdelay,no_root_squash,anonuid=0,anongid=0)

exportfs -av

Pin the NFS listening ports, e.g. by editing /etc/services:

# Local services

mountd 845/tcp #rpc.mountd

mountd 842/udp #rpc.mountd

rquotad 790/tcp #rpc.rquotad

rquotad 787/udp #rpc.rquotad
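After pinning the ports, the NFS daemons have to be restarted so they pick up the new values; a hedged check that they registered on the expected ports (assuming RHEL-style init scripts and a running portmapper):

service nfs restart
rpcinfo -p | grep -E 'mountd|rquotad'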

Standby node: 10.0.0.20

Remote NFS archive directory (read/write by the postgres user): /data/recoverydir/pg_arch

Local archive directory (read/write by the postgres user): /data1/archivedir/pg_arch

Backup directory (read/write by the postgres user): /data1/pg_backup

Log directory (read/write by the postgres user): /data1/pg_run_log ; ln -s /data1/pg_run_log /var/log/pg_run_log

  df

/dev/sdb1 670G 48G 589G 8% /data1

10.0.0.10:/data1/archivedir 670G 66G 571G 11% /data/recoverydir

  /etc/fstab

10.0.0.10:/data1/archivedir /data/recoverydir nfs  rw,rsize=8192,wsize=8192,noatime       1 3

  /etc/exports

/data1/archivedir 10.0.0.10/32(rw,sync,wdelay,no_root_squash,anonuid=0,anongid=0)

Pin the NFS listening ports, e.g. by editing /etc/services:

# Local services

mountd 845/tcp #rpc.mountd

mountd 842/udp #rpc.mountd

rquotad 790/tcp #rpc.rquotad

rquotad 787/udp #rpc.rquotad

# On the standby node, configure rsync

# Rsyncd On Standby Database Server & Backup Server

cat /etc/rsyncd.postgres.conf 

# Rsyncd On Standby Database Server & Backup Server

port = 873

hosts deny = 0.0.0.0/0

read only = false

write only = false

gid = 0

uid = 0

use chroot = no

max connections = 10

pid file = /var/run/rsync.pid

lock file = /var/run/rsync.lock

log file = /var/log/rsync.log

[pgdata]

path = /data1/pg_data

comment = Building Database Dir.

hosts allow = 10.0.0.20,10.0.0.10

[pgbackup]

path = /data1/pg_backup

comment = Database Backup Dir.

hosts allow = 10.0.0.20,10.0.0.10

# On the standby node, start the rsync daemon in the background; if the configuration is changed later, run the following command again to reload it

rsync -v --daemon --config=/etc/rsyncd.postgres.conf
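# To confirm the daemon is up and the modules are visible from the primary (module listing is enabled by default in rsyncd), something like:

rsync 10.0.0.20::            # should list the pgdata and pgbackup modules
rsync 10.0.0.20::pgbackup/   # should list the (still empty) backup directory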

# On the primary node, enable archiving (postgresql.conf)

fsync = on

full_page_writes = on

archive_mode = on

archive_command = 'cp -f %p /data1/archivedir/pg_arch/%f 2>>/var/log/pg_run_log/archive_cp_5432.log' 

archive_timeout = 300

# If archive_mode was previously off, the database must be restarted for the change to take effect; otherwise a configuration reload is enough
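# A sketch of applying the change and confirming that WAL segments start landing in the local archive (the data directory is assumed to be /data1/pg_data, as used below):

pg_ctl reload -D /data1/pg_data   # or restart, if archive_mode was previously off
psql -h 127.0.0.1 postgres postgres -c "show archive_mode;"
ls -lt /data1/archivedir/pg_arch | head   # with archive_timeout = 300, a segment should appear within ~5 minutes even on an idle system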

# On the standby node, build the standby. Step 1: remove any existing data directory (so it can be rebuilt from scratch)

rm -rf /data1/pg_data

# On the primary node, sync the data files to the standby

psql -h 127.0.0.1 postgres postgres -c "select pg_start_backup(now()::text);"

rsync -acvz --exclude=pg_xlog /data1/pg_data/* 10.0.0.20::pgdata

psql -h 127.0.0.1 postgres postgres -c "select pg_stop_backup();"
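# As a hedged sanity check that the backup window closed cleanly: pg_stop_backup() writes a backup history file that is shipped to the archive, so its presence can be checked with:

ls -lt /data1/archivedir/pg_arch/*.backup | head -1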

# On the standby node, build the standby. Step 2: configure the standby

# recovery.conf (pg_standby options: -d enables debug output, -s 2 sleeps 2 seconds between checks for a new WAL file, -t names the trigger file)

restore_command = 'pg_standby -d -s 2 -t /tmp/pgsql.trigger.5432 /data/recoverydir/pg_arch %f %p 2>>/var/log/pg_run_log/restore_standby_5432.log'

recovery_target_timeline = 'latest'

recovery_end_command = 'rm -f /tmp/pgsql.trigger.5432'

# Create the pg_xlog directory (it was excluded from the rsync) and fix ownership and permissions

mkdir /data1/pg_data/pg_xlog

rm -f /data1/pg_data/recovery.done

chown -R postgres:postgres /data1/pg_data

chmod -R 700 /data1/pg_data

# Start the standby instance; it will enter recovery mode

pg_ctl start -D /data1/pg_data
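# To confirm the standby really entered recovery and is pulling archives, two quick checks (process titles can differ slightly between platforms):

ps -ef | grep -E 'startup process|pg_standby' | grep -v grep
tail -n 20 /var/log/pg_run_log/restore_standby_5432.log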

# At this point, the standby is up and running.

Now let's look at how to take backups:

# On the primary node, back up the database once a week. crontab: 1 3 * * 2 /usr/local/postgres/backupsh/backup_database_5432.sh >>/var/log/pg_run_log/backup_database_5432.log 2>&1

vi /usr/local/postgres/backupsh/backup_database_5432.sh

chmod 500 /usr/local/postgres/backupsh/backup_database_5432.sh

#!/bin/bash

DATE=`date +%Y%m%d`

TIME=`date +%F`

echo -e "$TIME : select pg_start_backup();\n"

psql -h 127.0.0.1 postgres postgres -c "select pg_start_backup(now()::text);"

RESULT=$?

echo -e "$TIME : select pg_start_backup(); result:$RESULT\n"

if [ $RESULT -ne 0 ]; then

exit $RESULT

fi

echo -e "$TIME : Backup Database use rsync.\n"

rsync -acvz --exclude=pg_xlog /data1/pg_data/* 10.0.0.20::pgbackup/data_backup_$DATE

RESULT=$?

echo -e "$TIME : Backup Database use rsync. result:$RESULT\n"

echo -e "$TIME : select pg_stop_backup();\n"

psql -h 127.0.0.1 postgres postgres -c "select pg_stop_backup();"

RESULT=$?

echo -e "$TIME : select pg_stop_backup(); result:$RESULT\n"

exit $RESULT
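# Before handing the job to cron, it is worth one manual run as the postgres user, followed by listing the backup module from the primary to see the new data_backup_YYYYMMDD directory:

/usr/local/postgres/backupsh/backup_database_5432.sh >>/var/log/pg_run_log/backup_database_5432.log 2>&1
rsync 10.0.0.20::pgbackup/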

# On the primary node, back up the archived WAL once a day. crontab: 1 1 * * * /usr/local/postgres/backupsh/backup_archive_5432.sh >>/var/log/pg_run_log/backup_archive_5432.log 2>&1

vi /usr/local/postgres/backupsh/backup_archive_5432.sh

chmod 500 /usr/local/postgres/backupsh/backup_archive_5432.sh

#!/bin/bash

TIME=`date +%F`

echo -e "$TIME : Backup archivelog use rsync.\n"

rsync -acvz /data1/archivedir/pg_arch/* 10.0.0.20::pgbackup/arch_backup

RESULT=$?

echo -e "$TIME : Backup archivelog use rsync. result:$RESULT\n"

exit $RESULT

# At this point, the backups are in place.

Next, a retention policy that periodically purges old backups; assume backups older than 15 days are removed.

# On the standby node, purge data that has aged out of the retention window, once a day. crontab: 1 6 * * * /usr/local/postgres/backupsh/purge_backup_5432.sh >>/var/log/pg_run_log/purge_backup_5432.log 2>&1

vi /usr/local/postgres/backupsh/purge_backup_5432.sh

chmod 500 /usr/local/postgres/backupsh/purge_backup_5432.sh

#!/bin/bash

TIME=`date +%F`

echo -e "$TIME : Delete Archivelog From Primary Host Dir.\n"

find /data/recoverydir/pg_arch/* -mtime +15 -exec rm -rf {} \;

echo -e "$TIME : Delete Archivelog From Backup Host Dir.\n"

find /data1/pg_backup/arch_backup/* -mtime +15 -exec rm -rf {} \;

echo -e "$TIME : Delete Datafile From Backup Host Dir.\n"

for i in `ls -1rt /data1/pg_backup/|grep data_backup_|head --lines=-1`

do

rm -rf /data1/pg_backup/$i

done

# At this point, purging of old backups is deployed as well.

Next, PITR: a point-in-time recovery test.

# On the standby node, run a data recovery test and restore the data ($DATE below is the date suffix of the chosen data_backup_ directory)

mkdir /data1/pg_backup/data_backup_$DATE/pg_xlog

chown -R postgres:postgres /data1/pg_backup

chmod 700 /data1/pg_backup/data_backup_$DATE

rm -f /data1/pg_backup/data_backup_$DATE/recovery.done

# Edit postgresql.conf in the restored data directory; change the listening port so it does not conflict with the instance already running on this host.

port = 5433

archive_command = 'cp -f %p /data1/archivedir/pg_arch/%f 2>>/var/log/pg_run_log/archive_cp_5433.log'

# Edit the recovery configuration file, recovery.conf

restore_command = 'pg_standby -d -s 2 -t /tmp/pgsql.trigger.5433 /data/recoverydir/pg_arch %f %p 2>>/var/log/pg_run_log/restore_standby_5433.log'

# recovery_target_timeline = 'latest'

# The time below uses the same format as the database's select now() output; work out the desired recovery target from that

recovery_target_time = '2011-07-20 11:13:15.64642+02'

recovery_target_inclusive = 'true'

recovery_end_command = 'rm -f /tmp/pgsql.trigger.5433'

# Start the instance to begin recovery

pg_ctl start -D /data1/pg_backup/data_backup_$DATE

# Trigger activation (choose one of the three commands below: an empty trigger file defaults to smart; fast activates immediately without looking for the next WAL file; smart keeps fetching and applying the next WAL file until no more WAL is available, then activates)

touch /tmp/pgsql.trigger.5433

echo "fast" > /tmp/pgsql.trigger.5433

echo "smart" > /tmp/pgsql.trigger.5433

# On the standby node, as part of the recovery test, REINDEX the hash indexes
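# Hash index operations are not WAL-logged in 8.4 (see the Caveats quoted below), so hash indexes must be rebuilt after recovery. A sketch of finding them (run the query in each database of the restored instance; the index name in the REINDEX example is hypothetical):

psql -h 127.0.0.1 -p 5433 postgres postgres -c "select c.relname from pg_class c join pg_am a on a.oid = c.relam where a.amname = 'hash' and c.relkind = 'i';"
psql -h 127.0.0.1 -p 5433 postgres postgres -c "REINDEX INDEX my_hash_idx;"   # my_hash_idx is a hypothetical name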

# This concludes the PITR test.

Finally, let's cover which log files should be monitored.

# Log files to monitor

# Primary node

/var/log/pg_run_log/archive_cp_5432.log

/var/log/pg_run_log/backup_database_5432.log

/var/log/pg_run_log/backup_archive_5432.log

# Standby node

/var/log/pg_run_log/restore_standby_5432.log

/var/log/rsync.log

/var/log/pg_run_log/purge_backup_5432.log

【Note】

Because backups add some I/O load, it is recommended to prefix the backup script invocation, or the rsync commands inside the scripts, with nice -n 19 so that the impact on the system is minimal.
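For example, the rsync call in backup_database_5432.sh would become:

nice -n 19 rsync -acvz --exclude=pg_xlog /data1/pg_data/* 10.0.0.20::pgbackup/data_backup_$DATE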

Nicenesses range from -20 (most favorable scheduling) to 19 (least favorable).

【Additional notes】

Other test points:

Data file backup crontab script test

Archive log backup crontab script test

Test of the crontab script that purges data outside the retention window

Point-in-time recovery test: compare smart and fast activation and check whether they reach a consistent state

Standby activation test

Storage space estimation

Monitoring of the log files

Whether recovery (and standby rebuilds) still work after timeline history files have been deleted

【References】

http://www.postgresql.org/docs/8.4/static/pgstandby.html

pg_standby failover modes:

Smart Failover

In smart failover, the server is brought up after applying all WAL files available in the archive. This results in zero data loss, even if the standby server has fallen behind, but if there is a lot of unapplied WAL it can be a long time before the standby server becomes ready. To trigger a smart failover, create a trigger file containing the word smart, or just create it and leave it empty.

Fast Failover

In fast failover, the server is brought up immediately. Any WAL files in the archive that have not yet been applied will be ignored, and all transactions in those files are lost. To trigger a fast failover, create a trigger file and write the word fast into it. pg_standby can also be configured to execute a fast failover automatically if no new WAL file appears within a defined interval.

# recovery.conf example (recovery file format)

#recovery_target_timeline = '33'  # number or 'latest'

#recovery_target_inclusive = 'true'             # 'true' or 'false'

#recovery_target_xid = '1100842'

#recovery_target_time = '2004-07-14 22:39:00 EST' # same format as the output of select now();

#recovery_end_command = ''

#restore_command = 'cp /mnt/server/archivedir/%f %p'

【Caution】

http://www.postgresql.org/docs/8.4/static/continuous-archiving.html

24.3.6. Caveats

At this writing, there are several limitations of the continuous archiving technique. These will probably be fixed in future releases:

  • Operations on hash indexes are not presently WAL-logged, so replay will not update these indexes. The recommended workaround is to manually REINDEX each such index after completing a recovery operation.
  • If a CREATE DATABASE command is executed while a base backup is being taken, and then the template database that the CREATE DATABASE copied is modified while the base backup is still in progress, it is possible that recovery will cause those modifications to be propagated into the created database as well. This is of course undesirable. To avoid this risk, it is best not to modify any template databases while taking a base backup.
  • CREATE TABLESPACE commands are WAL-logged with the literal absolute path, and will therefore be replayed as tablespace creations with the same absolute path. This might be undesirable if the log is being replayed on a different machine. It can be dangerous even if the log is being replayed on the same machine, but into a new data directory: the replay will still overwrite the contents of the original tablespace. To avoid potential gotchas of this sort, the best practice is to take a new base backup after creating or dropping tablespaces.

It should also be noted that the default WAL format is fairly bulky since it includes many disk page snapshots. These page snapshots are designed to support crash recovery, since we might need to fix partially-written disk pages. Depending on your system hardware and software, the risk of partial writes might be small enough to ignore, in which case you can significantly reduce the total volume of archived logs by turning off page snapshots using the full_page_writes parameter. (Read the notes and warnings in Chapter 28 before you do so.) Turning off page snapshots does not prevent use of the logs for PITR operations. An area for future development is to compress archived WAL data by removing unnecessary page copies even when full_page_writes is on. In the meantime, administrators might wish to reduce the number of page snapshots included in WAL by increasing the checkpoint interval parameters as much as feasible.
