Wednesday, April 18, 2007

2 node vs 3+ node clusters

A comment on my post about the failure probability of clusters suggested that when one node of a six node cluster fails, the cluster should become a five node cluster.

The problem with this is deciding what to do when nodes recover from a failure. For example, suppose a six node cluster loses one node and becomes a five node cluster, then loses two more nodes and becomes a three node cluster. Half of the original cluster is now disconnected. If the three nodes that appeared to have failed became active again but were unable to see the other three nodes, you would have a split-brain situation.

As noted in the comment, the special case of a two node cluster has different failure modes. If the connection between the nodes goes down while the router can still be pinged, you can get a split-brain situation. To avoid this you will generally have a direct connection between the two nodes (either a null-modem cable or a crossover Ethernet cable); such cables are more reliable than a network that involves a switch or hub. Ideally the network interface that connects to the router will also be used to maintain cluster status: it seems unlikely that two nodes will both be able to ping the router but be unable to send data to each other.
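To make the two-node failure mode concrete, here is a toy model. It assumes a deliberately naive takeover rule (a node claims the service when it cannot hear its peer but can still reach the router); the function names are hypothetical and this is not how Heartbeat itself decides. It shows why a single heartbeat link is risky and why a second path over the router's network helps.

```python
def takes_over(peer_seen: bool, router_reachable: bool) -> bool:
    """Hypothetical naive rule: claim the service if the peer looks dead
    but this node can still reach the router."""
    return not peer_seen and router_reachable


def split_brain(heartbeat_paths: list[bool], router_a: bool, router_b: bool) -> bool:
    """Return True if both nodes would claim the service at the same time.

    heartbeat_paths lists each link between the two nodes (True = working).
    A node sees its peer if any path still works; the model assumes links
    fail symmetrically, so both nodes agree on whether the peer is visible.
    """
    peer_seen = any(heartbeat_paths)
    return takes_over(peer_seen, router_a) and takes_over(peer_seen, router_b)


if __name__ == "__main__":
    # Only a crossover cable, and it fails while both nodes still ping the router.
    print(split_brain([False], router_a=True, router_b=True))  # True
    # The crossover cable fails, but heartbeats also run over the router's network.
    print(split_brain([False, True], router_a=True, router_b=True))  # False
```

The model treats the second heartbeat path and router reachability as independent inputs; in practice they share a network, which is why losing one while keeping the other is unlikely.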

For best reliability you need multiple network interfaces between cluster nodes. One way of doing this is to bond a pair of Ethernet ports for providing the service, connected to two switches and pinging a router to determine which switch is best to use. Heartbeat authenticates its messages with a shared key, so it should be safe to run it on the same interface that is used for providing the service (of course, if you provide a service to the public Internet then you want a firewall to prevent machines on the net from trying to attack it).

Heartbeat also supports using multiple interfaces for cluster communication, so you can have one network dedicated to cluster operations and use the service network as a backup path for cluster data. The pingd service allows Heartbeat to place services on nodes that have good connectivity to the net. So you could have multiple nodes that each have one Ethernet port for providing the service and one port as a backup for Heartbeat operations; if pingd indicated that a node's service port was not functioning correctly, the services would be moved to other nodes.
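A minimal sketch of the placement idea, assuming a simplified scoring model in which each node's score is the number of ping targets it can reach multiplied by a fixed multiplier, and the service runs on the highest-scoring node. This only models the behaviour described above; it is not pingd's code or Heartbeat's configuration syntax, and the function names and the multiplier value are hypothetical.

```python
def pingd_score(reachable_targets: int, multiplier: int = 100) -> int:
    """Hypothetical score: how many ping targets this node can reach, scaled."""
    return reachable_targets * multiplier


def choose_node(reachability: dict[str, int], current: str) -> str:
    """Pick the node with the best connectivity.

    Ties are resolved in favour of the node already running the service,
    so services are not moved needlessly.
    """
    scores = {node: pingd_score(n) for node, n in reachability.items()}
    best = max(scores.values())
    if scores.get(current) == best:
        return current
    # Deterministic choice among equally good nodes.
    return sorted(node for node, s in scores.items() if s == best)[0]


if __name__ == "__main__":
    # node-a's service port has lost its uplink; node-b and node-c are fine.
    reach = {"node-a": 0, "node-b": 2, "node-c": 2}
    print(choose_node(reach, current="node-a"))  # node-b
```

Preferring the current node on a tie keeps a healthy service where it is; only a drop in its connectivity score makes it move.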

If you want to avoid sending private Heartbeat data over the service interface, then in the two-node case you need a minimum of two Ethernet ports for Heartbeat and one port for providing the service (three in total) if you use pingd. If you don't use pingd then you need two bonded ports for providing the service and two ports for Heartbeat (either bonded or independently configured in Heartbeat), giving a total of four ports.

When there are more than two nodes in the cluster, the criterion for cluster membership is that a majority of nodes are connected. This makes split-brain impossible, because two disjoint groups of nodes cannot both contain a majority, and it reduces the need to have reliable Ethernet interfaces. A cluster with three or more nodes could have a single service port and a single private port for Heartbeat, or, if you trust the service interface, you could do it all on one Ethernet port.
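The majority argument can be checked mechanically. This sketch assumes a strict-majority rule over a fixed cluster size (the helper names are hypothetical) and tries every way of splitting the nodes into two groups.

```python
def has_quorum(visible_nodes: int, cluster_size: int) -> bool:
    """A group is quorate only if it holds a strict majority of all nodes."""
    return visible_nodes > cluster_size // 2


def split_brain_possible(cluster_size: int) -> bool:
    """True if some two-way network split leaves both sides quorate."""
    return any(
        has_quorum(left, cluster_size) and has_quorum(cluster_size - left, cluster_size)
        for left in range(cluster_size + 1)
    )


if __name__ == "__main__":
    for n in range(3, 8):
        print(n, split_brain_possible(n))  # False for every n
    # The 3/3 split of a six node cluster: neither half is quorate.
    print(has_quorum(3, 6))  # False
```

Applied to two nodes, the same rule leaves neither side quorate after a split, which is another way to see why the two-node case needs the different handling described earlier.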

In summary, three nodes is better than two, but requires more hardware. Five nodes is better than three, but as I wrote in my previous post, four nodes is not much good (a four node cluster needs three nodes for a majority, so like a three node cluster it can survive only one failure). I recommend against any even number of nodes other than two for the same reason.
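The arithmetic behind the odd-number recommendation, as a short sketch that assumes the strict-majority rule above and leaves out the two-node special case:

```python
def majority(cluster_size: int) -> int:
    """Smallest number of nodes that forms a strict majority."""
    return cluster_size // 2 + 1


def failures_tolerated(cluster_size: int) -> int:
    """How many nodes can fail while the survivors still hold a majority."""
    return cluster_size - majority(cluster_size)


if __name__ == "__main__":
    for n in range(3, 8):
        print(f"{n} nodes: majority {majority(n)}, tolerates {failures_tolerated(n)} failure(s)")
```

Running it prints a tolerance of 1, 1, 2, 2, 3 for three to seven nodes: adding a fourth node (or a sixth) buys no extra failure tolerance, while adding another node that could itself fail.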

1 comment:

Matthew W. S. Bell said...

You have a six node cluster, and one node fails. The remaining 5 nodes have a quorum and form a 5-cluster. However, the failed node now has communication with no nodes, so it knows that, at best, the remaining nodes have formed a 5-cluster. Therefore, it knows it must be in contact with at least 3 of them for the cluster to be quorate, rather than at least 2. Similarly, 2 nodes that fail from a 6-cluster know that the remaining nodes have, at best, formed a 4-cluster, so each must be in contact with at least 3 nodes to be quorate.
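One reading of the comment's rule, sketched in Python: nodes that drop out assume the others formed the largest cluster possible without them, and require a majority of that cluster before considering themselves quorate. The function names are hypothetical and this is not taken from Heartbeat.

```python
def majority(cluster_size: int) -> int:
    """Smallest number of nodes that forms a strict majority."""
    return cluster_size // 2 + 1


def contacts_needed(original_size: int, nodes_dropped: int) -> int:
    """Surviving nodes the dropped-out nodes must reach to be quorate.

    The dropped-out nodes assume, as a best case, that the others formed a
    cluster of original_size - nodes_dropped members, so they require a
    majority of that cluster.
    """
    best_case_cluster = original_size - nodes_dropped
    return majority(best_case_cluster)
```

With the comment's numbers, contacts_needed(6, 1) and contacts_needed(6, 2) both give 3.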