A MongoDB (3.0) replica set synchronizes member status through heartbeats. Each node periodically sends its status information, the kind of per-member data surfaced by the rs.status() method, to the other members of the replica set.
The node initiating the heartbeat request is called the source, and the member receiving it is the target. A heartbeat exchange goes through three phases:
- The source sends a heartbeat request to the target.
- The target handles the heartbeat request and sends a response to the source.
- The source receives a heartbeat response and updates the status of the target node.
Let us examine the main state synchronization logic in these three phases.
Phase 1
In the default configuration, each node of a replica set sends a heartbeat request, the replSetHeartbeat command, to every other member once every two seconds. The content of the heartbeat request is similar to the one shown below (obtained through mongosniff packet capturing). It mainly contains the replica set name, the address and member id of the sending node, and the sender's replica set config version.
    command: replSetHeartbeat
    database: admin
    metadata: { $replData: 1 }
    commandArgs: { replSetHeartbeat: "mongo-9552", pv: 1, v: 22,
                   from: "10.101.72.137:9552", fromId: 3, checkEmpty: false }
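To make the flow concrete, here is a minimal Go sketch of the source-side loop in phase 1. It is an illustrative model, not MongoDB's actual C++ implementation; the HeartbeatArgs type is invented here and simply mirrors the fields of the captured request above.

```go
package main

import (
	"fmt"
	"time"
)

// HeartbeatArgs mirrors the fields of the captured replSetHeartbeat request.
// (Hypothetical type for illustration; not a MongoDB API.)
type HeartbeatArgs struct {
	SetName       string // replSetHeartbeat: replica set name
	ProtocolVer   int    // pv: replication protocol version
	ConfigVersion int    // v: sender's replica set config version
	From          string // from: host:port of the sending node
	FromID        int    // fromId: sender's member id in the config
}

func main() {
	const heartbeatInterval = 2 * time.Second // default send interval

	args := HeartbeatArgs{
		SetName:       "mongo-9552",
		ProtocolVer:   1,
		ConfigVersion: 22,
		From:          "10.101.72.137:9552",
		FromID:        3,
	}
	targets := []string{"10.101.72.137:9553", "10.101.72.137:9554"}

	ticker := time.NewTicker(heartbeatInterval)
	defer ticker.Stop()
	// Send a few rounds of heartbeats; the real loop runs for the node's lifetime.
	for round := 0; round < 3; round++ {
		<-ticker.C
		for _, target := range targets {
			// In MongoDB this is an asynchronous network call; here we only
			// print the request that would go on the wire.
			fmt.Printf("-> %s: replSetHeartbeat %+v\n", target, args)
		}
	}
}
```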
Phase 2
When a member of the replica set receives a heartbeat request, it processes the request and returns the result to the requesting node.
- If the node is not running in replica set mode, or the replica set name in the request does not match its own, it returns an error response.
- If the source node's replica set config version (the content of rs.conf()) is lower than the target's, the target attaches its own configuration to the heartbeat response; the response also carries the target's oplog position and other status information.
- If the target node is still uninitialized, it immediately sends a heartbeat request back to the source node so that it can pick up the replica set configuration.
    commandReply: { ok: 1.0, time: 1460705698,
                    electionTime: new Date(6273289095791771649),
                    e: true, rs: true, state: 1, v: 22, hbmsg: "",
                    set: "mongo-9552",
                    opTime: new Date(6272251740930703361) }
    metadata: { $replData: { term: -1,
                             lastOpCommitted: { ts: Timestamp 1460372410000|1, t: -1 },
                             lastOpVisible: { ts: Timestamp 0|0, t: -1 },
                             configVersion: 22, primaryIndex: 2,
                             syncSourceIndex: -1 } }
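The target-side handling can be modeled with a short sketch as well. The Node and HeartbeatResponse types below are invented for illustration; only the decision logic (set-name check, config-version comparison, uninitialized target) follows the description above.

```go
package main

import (
	"errors"
	"fmt"
)

// HeartbeatArgs carries the request fields relevant to phase 2.
type HeartbeatArgs struct {
	SetName       string
	ConfigVersion int // v: the source's config version
}

// HeartbeatResponse loosely mirrors the reply captured above.
type HeartbeatResponse struct {
	OK            bool
	State         int    // member state, e.g. 1 = PRIMARY
	ConfigVersion int    // the target's config version
	OpTime        int64  // the target's oplog position
	Config        string // full config, attached only when the source is stale
}

// Node is a toy model of a replica set member's local state.
type Node struct {
	SetName       string
	Initialized   bool
	State         int
	ConfigVersion int
	OpTime        int64
	RawConfig     string
}

func (n *Node) handleHeartbeat(args HeartbeatArgs) (HeartbeatResponse, error) {
	// An uninitialized target cannot answer usefully; modeled here as an
	// error reply. The real node also heartbeats the source back to fetch
	// the replica set configuration.
	if !n.Initialized {
		return HeartbeatResponse{}, errors.New("not initialized; requesting config from source")
	}
	// Reject requests that are not for our replica set.
	if args.SetName != n.SetName {
		return HeartbeatResponse{}, errors.New("replica set name mismatch")
	}
	resp := HeartbeatResponse{
		OK:            true,
		State:         n.State,
		ConfigVersion: n.ConfigVersion,
		OpTime:        n.OpTime, // status and oplog position always go back
	}
	// A stale source gets our full configuration so it can catch up.
	if args.ConfigVersion < n.ConfigVersion {
		resp.Config = n.RawConfig
	}
	return resp, nil
}

func main() {
	target := &Node{
		SetName: "mongo-9552", Initialized: true,
		State: 1, ConfigVersion: 22, OpTime: 6272251740930703361,
		RawConfig: `{ _id: "mongo-9552", version: 22, members: [...] }`,
	}
	// A source at config version 20 is stale, so the reply carries the config.
	resp, err := target.handleHeartbeat(HeartbeatArgs{SetName: "mongo-9552", ConfigVersion: 20})
	fmt.Println(resp, err)
}
```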
Phase 3
Phase 3 is the most important part of the processing. After receiving the heartbeat response, the source node updates the status of the peer node according to the response message and, based on the resulting state, determines whether a new election is required.
- When an error response to the heartbeat request is received (a response timeout is also treated as an error), the source retries: if the number of retries so far is at most kMaxHeartbeatRetries (2 by default) and the last heartbeat was sent within kDefaultHeartbeatTimeoutPeriod (10 seconds by default), the next heartbeat request is sent immediately (modeled in the sketch after this list). Once the retries exceed kMaxHeartbeatRetries, or kDefaultHeartbeatTimeoutPeriod has elapsed since the last heartbeat, the peer node is considered down.
- If the peer node's replica set config version is higher than the node's own, the node updates its configuration, persists it to the local database, and then updates the peer's status information from the response message.
- If the node itself is the primary and finds that another node with a higher priority has been elected primary, it steps down to a secondary on its own initiative.
- If the node itself is a secondary but finds that it has a higher priority and is eligible to be primary, it asks the current primary to step down. (This logic still contains some bugs, so the primary's self-step-down takes priority, ensuring that the node with the highest priority ends up as the primary.)
- If there is no primary at the moment, the node triggers an election; a new primary is elected once a majority of the nodes approve it.
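As a rough model of the retry rule in the first point above, the sketch below tracks per-peer retry state. The constants are named after their counterparts in the MongoDB source, but the memberTracker type and its methods are invented for the example.

```go
package main

import (
	"fmt"
	"time"
)

// Constants named after their counterparts in the MongoDB 3.0 source tree.
const (
	kMaxHeartbeatRetries           = 2
	kDefaultHeartbeatTimeoutPeriod = 10 * time.Second
)

// memberTracker is a toy model of the state the source keeps per peer.
type memberTracker struct {
	retries      int
	attemptStart time.Time // when the current heartbeat attempt window began
	up           bool
}

// onHeartbeatError applies the retry rule: retry immediately while we are
// within both the retry budget and the timeout window, otherwise mark down.
func (m *memberTracker) onHeartbeatError(now time.Time) (retryNow bool) {
	m.retries++
	if m.retries <= kMaxHeartbeatRetries && now.Sub(m.attemptStart) < kDefaultHeartbeatTimeoutPeriod {
		return true
	}
	m.up = false // retries exhausted or window elapsed: peer considered down
	return false
}

// onHeartbeatSuccess resets the retry state.
func (m *memberTracker) onHeartbeatSuccess(now time.Time) {
	m.retries = 0
	m.attemptStart = now
	m.up = true
}

func main() {
	now := time.Now()
	m := &memberTracker{attemptStart: now, up: true}

	// Two failures inside the window are retried; the third marks the peer down.
	for i := 1; i <= 3; i++ {
		fmt.Printf("error %d: retry=%v up=%v\n", i, m.onHeartbeatError(now), m.up)
	}
	m.onHeartbeatSuccess(time.Now())
	fmt.Printf("after success: up=%v\n", m.up)
}
```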
Conclusion
MongoDB synchronizes state between nodes through heartbeats and triggers elections as needed, so that the replica set eventually converges to a consistent state.
However, this protocol has no formal theoretical basis for its correctness. MongoDB 3.2 introduces a new version of the replica set communication protocol, in which elections are conducted through Raft; this further shortens the time needed for failure discovery and recovery.