今天在给一组mongoDB replicaSet修改配置的时候,遇到一个BUG。
环境是这样的:
3台服务器组成的3 full nodes replicaSet.
由于其中一台服务器配置比较差,准备调整"priority" : 0
让这台节点不会自动转会为primary.
引起BUG的是开启了keyFile(1.8.1使用keyFile来实现member之间的认证),也就是replicaSet的authorization.(看样子目前replicaSet的authorization还是不稳定啊。)
修改配置如下:
首先在primary修改配置
digoal:PRIMARY> conf=rs.conf()
{
"_id" : "digoal",
"version" : 1,
"members" : [
{
"_id" : 0,
"host" : "192.168.xxx.xxx:5281"
},
{
"_id" : 1,
"host" : "192.168.xxx.xxx:5281"
},
{
"_id" : 2,
"host" : "192.168.xxx.xxx:5281"
}
]
}
digoal:PRIMARY> conf.members[1].priority=0
0
digoal:PRIMARY> conf
{
"_id" : "digoal",
"version" : 1,
"members" : [
{
"_id" : 0,
"host" : "192.168.xxx.xxx:5281"
},
{
"_id" : 1,
"host" : "192.168.xxx.xxx:5281",
"priority" : 0
},
{
"_id" : 2,
"host" : "192.168.xxx.xxx:5281"
}
]
}
digoal:PRIMARY> rs.reconfig(conf)
digoal:PRIMARY> db.auth("digoal","digoalpwd")
1
digoal:PRIMARY> rs.conf()
{
"_id" : "digoal",
"version" : 2,
"members" : [
{
"_id" : 0,
"host" : "192.168.xxx.xxx:5281"
},
{
"_id" : 1,
"host" : "192.168.xxx.xxx:5281",
"priority" : 0
},
{
"_id" : 2,
"host" : "192.168.xxx.xxx:5281"
}
]
}
从主节点来看,配置已经生效了。
但是在任何一个SLAVE节点,都没有生效
digoal:SECONDARY> rs.conf()
{
"_id" : "digoal",
"version" : 1,
"members" : [
{
"_id" : 0,
"host" : "192.168.xxx.xxx:5281"
},
{
"_id" : 1,
"host" : "192.168.xxx.xxx:5281"
},
{
"_id" : 2,
"host" : "192.168.xxx.xxx:5281"
}
]
}
从SLAVE节点的日志来看,
Sat May 21 07:44:48 [rs Manager] replset msgReceivedNewConfig version: version: 3
Sat May 21 07:44:48 [rs Manager] replSet info saving a newer config version to local.system.replset
Sat May 21 07:44:48 [rs Manager] Server::doWork task:rs Manager exception:unauthorized db:local lock type:2 client:(NONE)
Sat May 21 07:44:49 [conn3] run command admin.$cmd { replSetHeartbeat: "digoal", v: 1, pv: 1, checkEmpty: false, from: "192.168.175.71:5281" }
Sat May 21 07:44:49 [conn3] query admin.$cmd ntoreturn:1 command: { replSetHeartbeat: "digoal", v: 1, pv: 1, checkEmpty: false, from: "192.168.175.71:5281" } reslen:132 0ms
Sat May 21 07:44:49 [conn4] run command admin.$cmd { replSetHeartbeat: "digoal", v: 3, pv: 1, checkEmpty: false, from: "192.168.175.67:5281" }
Sat May 21 07:44:49 [conn4] query admin.$cmd ntoreturn:1 command: { replSetHeartbeat: "digoal", v: 3, pv: 1, checkEmpty: false, from: "192.168.175.67:5281" } reslen:132 0ms
从这上面来看,replicaSet的节点之间已经失去认证了。
不修改配置的话不会这样,一切正常。
后来查到,mongodb告知这是一个BUG。修复办法是升级到1.8.2,1.9.0
于是就找来1.8.2的版本,但是问题更蹊跷。
在起1.8.2的时候连库都起不来
mmongo@db-digoal-> mongod -h
Sat May 21 08:07:52 Invalid access at address: 0x1
Sat May 21 08:07:52 Got signal: 11 (Segmentation fault).
Sat May 21 08:07:52 Backtrace:
0x8a66b9 0x8a6c90 0x314600eb10 0x3145879b60 0x56d3d3 0x8a5149 0x8ad2a3 0x314581d994 0x4e1009
mongod(_ZN5mongo10abruptQuitEi+0x399) [0x8a66b9]
mongod(_ZN5mongo24abruptQuitWithAddrSignalEiP7siginfoPv+0x220) [0x8a6c90]
/lib64/libpthread.so.0 [0x314600eb10]
/lib64/libc.so.6(strlen+0x10) [0x3145879b60]
mongod(_ZN5mongo13show_warningsEv+0x363) [0x56d3d3]
mongod(_Z14show_help_textN5boost15program_options19options_descriptionE+0x9) [0x8a5149]
mongod(main+0x10d3) [0x8ad2a3]
/lib64/libc.so.6(__libc_start_main+0xf4) [0x314581d994]
mongod(__gxx_personality_v0+0x3a1) [0x4e1009]
Sat May 21 08:07:52 dbexit:
Sat May 21 08:07:52 File I/O errno:29 Illegal seek
shutdown: going to close listening sockets...
Sat May 21 08:07:52 shutdown: going to flush diaglog...
Sat May 21 08:07:52 shutdown: going to close sockets...
Sat May 21 08:07:52 shutdown: waiting for fs preallocator...
Sat May 21 08:07:52 shutdown: closing all files...
Sat May 21 08:07:52 closeAllFiles() finished
Sat May 21 08:07:52 dbexit: really exiting now
于是只能放弃升级到1.8.2,暂时将keyFile的参数去掉,用1.8.1来起,配置得以同步到SLAVE节点。恢复正常。
再在服务器的iptables里面配置安全选项。