我们在设计数据库HA的时候, 为了防止主备同时读写一份数据, 或者主备同时对外提供服务. 备机在切换成主机前, 需要有一个fence操作, 需先将主节点fence掉, 然后再激活备库为主库.
例如我们这边用的一个HA脚本 :
https://raw.githubusercontent.com/digoal/sky_postgresql_cluster/master/INSTALL.txt
或者RHEL的HA套件, 都有类似的FENCE操作.
但是keepalived目前没有切换前的自定义脚本, 只有切换后的自定义脚本(notify), 并且不等待脚本执行.
如图 :
keepalived如果用于数据库的HA切换还需要完善一下.
本文将讲解一下, 如何修改keepalived代码, 来实现这方面的功能, 即备切换成主之前, 必须先等待fence主节点.
最终目的如图 :
首先要了解一下keepalived notify的机制 :
顺序是这样的 : 起vip, 起静态路由, fork进程调用notify脚本(并且不等待).
注意调用notify脚本在起VIP之后.
我们的目的是把它放到最前面, 并且需要等待notify脚本调用完成再起VIP.
keepalived有几项配置和notify有关, 即当路由器的角色转换后, 调用用户配置的脚本.
doc/keepalived.conf.SYNOPSIS
notify_master <STRING>|<QUOTED-STRING> # Script to run during MASTER transit
notify_backup <STRING>|<QUOTED-STRING> # Script to run during BACKUP transit
notify_fault <STRING>|<QUOTED-STRING> # Script to run during FAULT transit
notify <STRING>|<QUOTED-STRING> # Script to run during ANY state transit (1)
smtp_alert # Send email notif during state transit
}
(1) The "notify" script is called AFTER the corresponding notify_* script has
been called, and is given exactly 4 arguments (the whole string is interpreted
as a litteral filename so don't add parameters!):
$1 = A string indicating whether it's a "GROUP" or an "INSTANCE"
$2 = The name of said group or instance
$3 = The state it's transitioning to ("MASTER", "BACKUP" or "FAULT")
$4 = The priority value
$1 and $3 are ALWAYS sent in uppercase, and the possible strings sent are the
same ones listed above ("GROUP"/"INSTANCE", "MASTER"/"BACKUP"/"FAULT").
当路由器在进入对应的角色后, 会调用对应的脚本如notify_master, notify_backup, notify_fault.
当以上脚本调用完后, 再调用notify脚本.
相关代码举例 :
进入master角色的代码
keepalived/vrrp/vrrp.c
/* MASTER state processing */
int
vrrp_state_master_tx(vrrp_t * vrrp, const int prio)
{
int ret = 0;
if (!VRRP_VIP_ISSET(vrrp)) {
log_message(LOG_INFO, "VRRP_Instance(%s) Entering MASTER STATE"
, vrrp->iname);
vrrp_state_become_master(vrrp); // 进入master角色的操作对应的函数
ret = 1;
} else if (vrrp->garp_refresh && timer_cmp(time_now, vrrp->garp_refresh_timer) > 0) {
vrrp_send_link_update(vrrp);
vrrp->garp_refresh_timer = timer_add_long(time_now, vrrp->garp_refresh);
}
vrrp_send_adv(vrrp,
(prio == VRRP_PRIO_OWNER) ? VRRP_PRIO_OWNER :
vrrp->effective_priority);
return ret;
}
当进入master角色后, 在启用虚拟IP后, 会调用notify脚本.
/* becoming master */
void
vrrp_state_become_master(vrrp_t * vrrp)
{
/* add the ip addresses */ // 先启用虚拟IP
if (!LIST_ISEMPTY(vrrp->vip))
vrrp_handle_ipaddress(vrrp, IPADDRESS_ADD, VRRP_VIP_TYPE);
if (!LIST_ISEMPTY(vrrp->evip))
vrrp_handle_ipaddress(vrrp, IPADDRESS_ADD, VRRP_EVIP_TYPE);
vrrp->vipset = 1;
/* add virtual routes */ // 然后添加静态路由
if (!LIST_ISEMPTY(vrrp->vroutes))
vrrp_handle_iproutes(vrrp, IPROUTE_ADD);
/* remotes neighbour update */ // 然后发送免费ARP报文
vrrp_send_link_update(vrrp);
/* set refresh timer */
if (vrrp->garp_refresh) {
vrrp->garp_refresh_timer = timer_add_long(time_now, vrrp->garp_refresh);
}
/* Check if notify is needed */ // 最后是调用自定义的notify脚本
notify_instance_exec(vrrp, VRRP_STATE_MAST);
#ifdef _WITH_SNMP_ // 触发snmp
vrrp_snmp_instance_trap(vrrp);
#endif
....
这里要注意的是, notify脚本调用时fork进程通过system函数来调用的, 并且主进程直接返回0, 不等待子进程执行完毕.
对应的代码 :
lib/notify.c
/* perform a system call */
int
system_call(char *cmdline)
{
int retval;
retval = system(cmdline);
if (retval == 127) {
/* couldn't exec command */
log_message(LOG_ALERT, "Couldn't exec command: %s", cmdline);
} else if (retval == -1) {
/* other error */
log_message(LOG_ALERT, "Error exec-ing command: %s", cmdline);
}
return retval;
}
/* Execute external script/program */
int
notify_exec(char *cmd)
{
pid_t pid;
int ret;
pid = fork();
/* In case of fork is error. */
if (pid < 0) {
log_message(LOG_INFO, "Failed fork process");
return -1;
}
/* In case of this is parent process */ // 主进程直接退出, 没有wait 子进程.
if (pid)
return 0;
signal_handler_destroy();
closeall(0);
open("/dev/null", O_RDWR);
ret = dup(0);
if (ret < 0) {
log_message(LOG_INFO, "dup(0) error");
}
ret = dup(0);
if (ret < 0) {
log_message(LOG_INFO, "dup(0) error");
}
system_call(cmd);
exit(0);
}
所以, 为了达到目的, 我们需要改2个地方. 为了方便观察, 加几个日志输出.
# cd /opt/soft_bak/keepalived-1.2.13
1. 把调用notify脚本移到vip前面
# vi keepalived/vrrp/vrrp.c
修改函数, 把调用notify脚本从这个函数剥离出来
/* becoming master */
void
vrrp_state_become_master(vrrp_t * vrrp)
{
...
/* Check if notify is needed */
//notify_instance_exec(vrrp, VRRP_STATE_MAST); //注释这里
修改函数
/* MASTER state processing */
int
vrrp_state_master_tx(vrrp_t * vrrp, const int prio)
{
...
// if (!VRRP_VIP_ISSET(vrrp)) { // 改成如下, 即把notify脚本放到vrrp_state_become_master之前, 并且返回成功才会调用vrrp_state_become_master.
if ((!VRRP_VIP_ISSET(vrrp)) && notify_instance_exec(vrrp, VRRP_STATE_MAST)) {
log_message(LOG_INFO, "VRRP_Instance(%s) Entering MASTER STATE"
, vrrp->iname);
vrrp_state_become_master(vrrp);
ret = 1;
2. 需要让主进程等待notify脚本执行完
# vi lib/notify.c
添加等待需要的头文件
#include <sys/types.h>
#include <sys/wait.h>
修改函数
/* Execute external script/program */
int
notify_exec(char *cmd)
{
pid_t pid;
int ret;
pid_t wait_pid;
int wait_ret;
pid = fork();
/* In case of fork is error. */
if (pid < 0) {
log_message(LOG_INFO, "Failed fork process");
return -1;
}
/* child process */ // 子进程执行
if (pid == 0) {
signal_handler_destroy();
closeall(0);
open("/dev/null", O_RDWR);
ret = dup(0);
if (ret < 0) {
log_message(LOG_INFO, "dup(0) error");
}
ret = dup(0);
if (ret < 0) {
log_message(LOG_INFO, "dup(0) error");
}
log_message(LOG_INFO, "notify executing."); // 添加日志输出
system_call(cmd);
exit(0);
}
/* In case of this is parent process */ // 主进程等待
else {
log_message(LOG_INFO, "waiting notify execute success."); // 添加日志输出
wait_pid = waitpid(pid, &wait_ret, 0);
if (wait_pid == pid && WIFEXITED(wait_ret)) {
log_message(LOG_INFO, "notify execute successed."); // 添加日志输出
return 0;
}
return -1;
}
}
重新编译
# gmake && gmake install
测试
配置文件
[root@192_168_173_203 keepalived-1.2.13]# cd /opt/keepalived/etc/keepalived/
[root@192_168_173_203 keepalived]# cat keepalived.conf
! Configuration File for keepalived !号或#号开头为注释
global_defs { # 全局配置除了router_id, 其他都无所谓, 因为我这里的测试只是观察script weight对vrrp的影响.
notification_email {
acassen@firewall.loc
failover@firewall.loc
sysadmin@firewall.loc
}
notification_email_from Alexandre.Cassen@firewall.loc
smtp_server 192.168.200.1
smtp_connect_timeout 30
router_id DIGOAL_TEST1 # 随便配置, 因为这个不在VRRP包里面体现, 在VRRP包里体现的是虚拟路由器ID.
}
vrrp_script pos { # 脚本配置
script "/root/pos.sh"
interval 1
weight 0 # 配置默认weight 0, 后面可以在instance中覆盖
fall 1 # 从OK到KO需要1次检测失败.
rise 1 # 从KO到OK需要1次检测成功.
}
vrrp_script nag { # 脚本配置
script "/root/nag.sh"
interval 1
weight 0 # 配置默认weight 0, 后面可以在instance中覆盖
fall 1 # 从OK到KO需要1次检测失败.
rise 1 # 从KO到OK需要1次检测成功.
}
vrrp_script zero { # 脚本配置
script "/root/zero.sh"
interval 1
weight 0 # 配置默认weight 0, 后面可以在instance中覆盖
fall 1 # 从OK到KO需要1次检测失败.
rise 1 # 从KO到OK需要1次检测成功.
}
vrrp_instance vi_1 {
state MASTER # 初始状态
interface eth0
virtual_router_id 51
priority 100 # 初始优先级, 注意两个节点配置要不一样. 一高一低, 高的选举为master.
advert_int 1
authentication {
auth_type PASS
auth_pass 12345678 # 最大长度为8, 如果要修改最大长度,
# 可参考 http://blog.163.com/digoal@126/blog/static/163877040201472123810816/
}
track_script {
pos weight 3
zero weight 0
nag weight -5
}
unicast_peer { # 使用单播代替组播发送vrrp心跳.
192.168.173.203
192.168.173.204
}
virtual_ipaddress { # 虚拟地址配置, 和ifconfig兼容
172.16.173.100/24 brd 172.16.173.255 dev eth0 scope link label eth0:1
}
debug # 打开debug, 方便调试, 本例没有用到
nopreempt # 不主动竞争master, 这样的话优先级只用于决定第一次搭建HA时的主库节点, 发生failover后, 不会主动提升.
notify_master "/root/notify.sh"
}
配置脚本
[root@192_168_173_203 keepalived]# vi /root/notify.sh
#!/bin/bash
sleep 1000
exit 0
# chmod 555 /root/notify.sh
启动keepalived
# keepalived -f /opt/keepalived/etc/keepalived/keepalived.conf -D
观察日志 :
# tail -f -n 1 /var/log/messages
Aug 26 10:13:31 192_168_173_203 Keepalived[14531]: Starting Keepalived v1.2.13 (08/19,2014)
Aug 26 10:13:31 192_168_173_203 Keepalived[14532]: Starting Healthcheck child process, pid=14533
Aug 26 10:13:31 192_168_173_203 Keepalived[14532]: Starting VRRP child process, pid=14534
Aug 26 10:13:31 192_168_173_203 Keepalived_vrrp[14534]: Netlink reflector reports IP 192.168.173.203 added
Aug 26 10:13:31 192_168_173_203 Keepalived_vrrp[14534]: Netlink reflector reports IP 192.168.173.156 added
Aug 26 10:13:31 192_168_173_203 Keepalived_vrrp[14534]: Netlink reflector reports IP fe80::f6ce:46ff:fe86:4234 added
Aug 26 10:13:31 192_168_173_203 Keepalived_healthcheckers[14533]: Netlink reflector reports IP 192.168.173.203 added
Aug 26 10:13:31 192_168_173_203 Keepalived_healthcheckers[14533]: Netlink reflector reports IP 192.168.173.156 added
Aug 26 10:13:31 192_168_173_203 Keepalived_vrrp[14534]: Registering Kernel netlink reflector
Aug 26 10:13:31 192_168_173_203 Keepalived_vrrp[14534]: Registering Kernel netlink command channel
Aug 26 10:13:31 192_168_173_203 Keepalived_healthcheckers[14533]: Netlink reflector reports IP fe80::f6ce:46ff:fe86:4234 added
Aug 26 10:13:31 192_168_173_203 Keepalived_vrrp[14534]: Registering gratuitous ARP shared channel
Aug 26 10:13:31 192_168_173_203 Keepalived_healthcheckers[14533]: Registering Kernel netlink reflector
Aug 26 10:13:31 192_168_173_203 Keepalived_healthcheckers[14533]: Registering Kernel netlink command channel
Aug 26 10:13:31 192_168_173_203 Keepalived_healthcheckers[14533]: Opening file '/opt/keepalived/etc/keepalived/keepalived.conf'.
Aug 26 10:13:31 192_168_173_203 Keepalived_vrrp[14534]: Opening file '/opt/keepalived/etc/keepalived/keepalived.conf'.
Aug 26 10:13:31 192_168_173_203 Keepalived_healthcheckers[14533]: Configuration is using : 7972 Bytes
Aug 26 10:13:31 192_168_173_203 Keepalived_vrrp[14534]: Configuration is using : 69370 Bytes
Aug 26 10:13:31 192_168_173_203 Keepalived_vrrp[14534]: Using LinkWatch kernel netlink reflector...
Aug 26 10:13:31 192_168_173_203 Keepalived_vrrp[14534]: VRRP sockpool: [ifindex(2), proto(112), unicast(1), fd(10,11)]
Aug 26 10:13:31 192_168_173_203 Keepalived_healthcheckers[14533]: Using LinkWatch kernel netlink reflector...
Aug 26 10:13:31 192_168_173_203 Keepalived_vrrp[14534]: VRRP_Script(nag) succeeded
Aug 26 10:13:31 192_168_173_203 Keepalived_vrrp[14534]: VRRP_Script(zero) succeeded
Aug 26 10:13:31 192_168_173_203 Keepalived_vrrp[14534]: VRRP_Script(pos) succeeded
Aug 26 10:13:32 192_168_173_203 Keepalived_vrrp[14534]: VRRP_Instance(vi_1) Transition to MASTER STATE
Aug 26 10:13:33 192_168_173_203 Keepalived_vrrp[14534]: waiting notify execute success.
Aug 26 10:13:33 192_168_173_203 Keepalived_vrrp[14564]: notify executing.
... 1000秒后, 进入master, 并且启动VIP, 测试成功.
Aug 26 10:30:13 192_168_173_203 Keepalived_vrrp[14534]: notify execute successed.
Aug 26 10:30:13 192_168_173_203 Keepalived_vrrp[14534]: VRRP_Instance(vi_1) Entering MASTER STATE
Aug 26 10:30:13 192_168_173_203 Keepalived_vrrp[14534]: VRRP_Instance(vi_1) setting protocol VIPs.
Aug 26 10:30:13 192_168_173_203 Keepalived_vrrp[14534]: VRRP_Instance(vi_1) Sending gratuitous ARPs on eth0 for 172.16.173.100
Aug 26 10:30:13 192_168_173_203 Keepalived_healthcheckers[14533]: Netlink reflector reports IP 172.16.173.100 added
Aug 26 10:30:18 192_168_173_203 Keepalived_vrrp[14534]: VRRP_Instance(vi_1) Sending gratuitous ARPs on eth0 for 172.16.173.100
我们只要把fence操作放在这个脚本里面, 就可以实现先fence, 再启VIP的过程.
[注意]
本文仅提供思路, 仅供测试, 用于生产请自行承担风险.
[参考]
1. lib/notify.c
2. keepalived/vrrp/vrrp.c
3. man 2 wait
4. man 3 system
5. doc/keepalived.conf.SYNOPSIS
6. https://raw.githubusercontent.com/digoal/sky_postgresql_cluster/master/INSTALL.txt
7. http://blog.163.com/digoal@126/blog/static/163877040201472123810816/