Monday, April 09, 2007

heartbeat - what defines a cluster?

In Debian bug 418210 there is a discussion of what constitutes a cluster.

I believe that the node lines in the configuration file /etc/ha.d/ha.cf should authoritatively define what is in the cluster, and that any broadcast packets from other nodes should be ignored.
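To make the behaviour I am asking for concrete, here is a minimal sketch in Python. It is not Heartbeat's code, and the function names and the sample configuration are made up for illustration. It reads the node lines from ha.cf-style text and accepts a packet only if the sender is one of the configured nodes.

```python
def parse_nodes(ha_cf_text):
    """Collect node names from 'node' lines in ha.cf-style text."""
    nodes = set()
    for raw in ha_cf_text.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop comments and whitespace
        if not line:
            continue
        fields = line.split()
        if fields[0] == "node":
            # Tolerate more than one name after the keyword.
            nodes.update(fields[1:])
    return nodes


def accept_packet(sender, configured_nodes):
    """Proposed policy: ignore anything not sent by a configured node."""
    return sender in configured_nodes


# Hypothetical configuration for cluster 1.
SAMPLE_HA_CF = """\
# cluster 1
bcast eth0
node ha1
node ha2
"""

nodes = parse_nodes(SAMPLE_HA_CF)
for sender in ("ha1", "ha2", "ha2-unstable"):
    verdict = "accept" if accept_packet(sender, nodes) else "ignore"
    print(sender, verdict)
```

With this policy a broadcast from ha2-unstable would simply be dropped by cluster 1, whatever auth code it carries.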

Currently, if two clusters share the same VLAN and both use the same auth code, they will get confused about which node belongs to which cluster.
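The reason a shared auth code does not help is that authentication only proves a packet was signed with the key, not which cluster sent it. A rough Python sketch of that point follows. The packet layout and key values are invented, and HMAC-SHA1 merely stands in for whatever method the authkeys file selects.

```python
import hashlib
import hmac


def sign(key: bytes, payload: bytes) -> str:
    """Return a hex digest authenticating payload under key."""
    return hmac.new(key, payload, hashlib.sha1).hexdigest()


def verify(key: bytes, payload: bytes, digest: str) -> bool:
    """Check the digest in constant time."""
    return hmac.compare_digest(sign(key, payload), digest)


# Both clusters were given the same (made-up) key.
SHARED_KEY = b"same-secret"

# A packet broadcast by a node of the *other* cluster (invented format).
packet = b"src=ha2-unstable type=status"
digest = sign(SHARED_KEY, packet)

# Cluster 1 holds the same key, so the foreign packet authenticates.
accepted_same_key = verify(SHARED_KEY, packet, digest)

# Had cluster 1 used its own key, the packet would have been rejected.
accepted_diff_key = verify(b"cluster1-only-secret", packet, digest)

print(accepted_same_key, accepted_diff_key)
```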

I set up a couple of clusters for testing (one Debian/Etch and the other Debian/unstable) under Xen using the same bridge device. Naturally I could set up separate bridges, but why should I have to?

I gave each of them the same auth code (one was created by copying the block devices from the other; they have the same root password, so there shouldn't be any need to change other passwords). Then it all fell apart. Each cluster would correctly determine that it should have two nodes (matching its two node lines), but cluster 1 would end up with nodes ha1 and ha2-unstable even though it had node lines for ha1 and ha2.

I have been told that this is the way it's supposed to be and I should just use different ports or different physical media.

I wonder how many companies have multiple Heartbeat installations on different VLANs such that a single mis-connected cable will make all hell break loose on their network...

2 comments:

bob said...

This caught me out the other day. http://tech.randomness.org.uk/entry/multiple_heartbeats

Alan Robertson said...

If you have more than one cluster, don't use bcast. That's simple enough. Mcast works. Ucast works. I wouldn't mess with different port numbers. Back when we only had bcast, the port number was the only option. But that was a long, long time ago.

And, as a security expert, you already know better than to give your clusters all the same authkeys files.

Also, if you don't turn on autojoin, they won't merge.

If you do any of these three, then your clusters won't magically merge when you have them on the same network.

What result you get will depend on exactly which options you've chosen.

If you want the behavior you say you want, then I'd do all three. Simple enough.