While I was writing applications for myself, I kept thinking about how I could make my environment more bulletproof and stable. The fact that I was running everything on single systems was always a single point of failure. Until now! At least on the operating system level, I am past this obstacle.
This article is part of a series. Full series:
Make Linux cluster! – Beginning
Make Linux cluster! – Configure resources
Make Linux cluster! – Work and test resources
Make Linux cluster! – Pitfalls and observations
How it begins
My first step was a Google search, because I am certainly not the only one who has thought about this. The first result led to the official Debian page (I run Debian servers). I was happy about that, but while reading it I saw that it is outdated with respect to the current Debian and pacemaker/corosync versions. Still, it was good as a base, and I trusted my skills to figure out what had changed, and how, during the implementation.
I use the corosync cluster engine with the pacemaker cluster resource manager. My plan was to establish a stable Linux cluster and define my DNS server with a floating IP. Until now, my bind9 DNS server was running only on my Raspberry Pi 3. My first goal with the established cluster is to improve on that: the DNS server should run on my main server, but when that machine is down, it should move to the Raspberry Pi as a backup location.
In the following, I will walk through my journey of implementing this cluster.
What is a Linux cluster?
A cluster, generally speaking, is when several systems work together: they share resources and provide a single control point over multiple machines. This exists, of course, for Linux servers as well.
Such a cluster can be established between bare-metal or virtual servers, even across geographical locations; it does not matter where a system is running as long as the network can reach that node.
A cluster provides high availability (HA) for applications across multiple systems. How? Let’s say we host a web application on a single system. What happens when that system goes down? We obviously have no running service. What would happen in a cluster? If we define our application as a cluster resource, then when one system goes down, the application is started on the other system!
We can go even further. If we have multiple machines, we have multiple IP addresses, so after a move the IP address of our service would be different. But we can also define a virtual (floating) IP address that moves among the systems together with the service. This way, we can reach the web service on a common IP address no matter where it is currently running; a sketch of such a resource is shown below.
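As a preview (the actual resource configuration comes in the later parts of this series), a minimal sketch of such a floating IP defined as a cluster resource with crm could look roughly like this; the address 192.168.50.210 and the resource name are only illustrative assumptions:

# Illustrative sketch only: a floating IP managed as a cluster resource.
# The address, netmask and resource name are assumptions, not from this setup.
sudo crm configure primitive floating-ip ocf:heartbeat:IPaddr2 \
    params ip=192.168.50.210 cidr_netmask=24 \
    op monitor interval=10s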
Installation and configuration of core
I installed the corosync, crmsh (pacemaker) and fence-agents packages from the Debian repository on my main server. After installation, I stopped corosync and pacemaker with the systemctl stop corosync and systemctl stop pacemaker commands.
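Collected into one place, the commands look roughly like this (pacemaker is listed explicitly in case it is not pulled in as a dependency of crmsh):

# Install the cluster stack from the Debian repositories
sudo apt install corosync pacemaker crmsh fence-agents

# Stop the services before editing the configuration
sudo systemctl stop corosync
sudo systemctl stop pacemaker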
Then I edited the /etc/corosync/corosync.conf file with the following changes:
- In the totem section:
  - transport: knet
  - crypto_cipher: aes256
  - crypto_hash: sha1
- In the interface section I provided the bind network address, and the address and port for multicast
- I did not modify the logging section, the defaults were fine
- In the quorum section, because I will have 2 nodes in my cluster, I activated the two_node: 1 setting
- In the node list I put the information of my 2 nodes, based on the provided comments
The whole configuration can be seen below.
totem {
    version: 2
    transport: knet
    cluster_name: debian
    crypto_cipher: aes256
    crypto_hash: sha1
    interface {
        ringnumber: 0
        bindnetaddr: 192.168.50.0
        mcastaddr: 239.255.1.1
        mcastport: 5405
        ttl: 1
    }
}

logging {
    fileline: off
    to_stderr: yes
    to_logfile: yes
    logfile: /var/log/corosync/corosync.log
    to_syslog: yes
    debug: off
    logger_subsys {
        subsys: QUORUM
        debug: off
    }
}

quorum {
    provider: corosync_votequorum
    two_node: 1
}

nodelist {
    node {
        name: atihome
        nodeid: 1
        ring0_addr: 192.168.50.201
    }
    node {
        name: pihome
        nodeid: 2
        ring0_addr: 192.168.50.202
    }
}
The interesting part is the quorum section. What is quorum? The cluster will start and manage resources only when it has quorum. When does it have that? When at least half of the nodes plus one are active in the cluster. For example, if I had 8 nodes in the cluster, cluster resources would start only when at least 8/2 + 1 = 5 systems are active in the cluster.
Why does this rule exist? Imagine the 8-node cluster above, where 2 nodes are cut off from the cluster for some reason. Without this rule, those 2 nodes could establish a separate cluster and manage resources on their own (e.g. they would also activate the floating IP address), which could lead to serious issues. This is called a “split-brain scenario”.
But when there are only 2 nodes in the cluster, it becomes a bit more interesting. For quorum we would need 2/2 + 1 = 2 systems (half of 2 nodes is 1, plus 1 is 2). That means any cluster resource would start only when both nodes are up; a single node could not operate on its own, so there would not be much point in a 2-node cluster. That is why I enabled the two_node: 1 option in the quorum section: it artificially sets the required quorum to 1, so a 2-node setup can also work.
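Once the cluster is up and running (see the next section), the vote counts and whether the cluster is quorate can be checked with corosync's own tool:

# Show quorum status, expected votes and total votes
sudo corosync-quorumtool -s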
The last thing is to create an authentication key for corosync. For this I used the corosync-keygen command, which creates the key in the /etc/corosync folder (the command prints to the shell where the key is written). At this point, I distributed my configuration and key to my Raspberry Pi using the rsync utility; a sketch of these steps is shown below.
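The target address is the pihome node from the nodelist above; the exact rsync invocation (user and options) is my assumption:

# Generate the corosync authentication key (written to /etc/corosync/authkey)
sudo corosync-keygen

# Copy the configuration and the key to the second node
sudo rsync -av /etc/corosync/corosync.conf /etc/corosync/authkey \
    root@192.168.50.202:/etc/corosync/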
Start and see what happens
I started the services on both nodes with the systemctl start corosync and systemctl start pacemaker commands. Both services started properly. Then I checked the cluster status with the sudo crm status command. The output of this command looks similar to this:
Cluster Summary:
  * Stack: corosync
  * Current DC: atihome (version 2.0.5-ba59be7122) - partition with quorum
  * Last updated: Sun Dec 5 16:22:50 2021
  * Last change: Sun Dec 5 16:00:14 2021 by root via cibadmin on atihome
  * 2 nodes configured
  * 2 resource instances configured

Node List:
  * Online: [ atihome pihome ]
For the settings I used the crm command line. After running the sudo crm command, a new shell opens where crm commands can be executed. First I set some properties:
crm(live/atihome)# configure
crm(live/atihome)configure# property stonith-enabled=false
crm(live/atihome)configure# property no-quorum-policy=ignore
crm(live/atihome)configure# commit
crm(live/atihome)configure# up
STONITH is an acronym for “Shoot The Other Node In The Head”. Because this is a small 2-node home-lab environment, it is disabled here, although the recommended and officially supported way is to enable it. When a node disappears from the cluster, by default this is treated as a graceful shutdown. In practice that is not always the case: it can happen that a node is no longer seen by the cluster engine but is still alive and running its resources/services. For these situations we can use fence agents, which work together with STONITH and kill the node in such cases.
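For completeness, if STONITH were enabled, a fence device would be defined as a cluster resource. A hedged sketch with an IPMI-based agent from the fence-agents package might look roughly like this; the agent choice, the address and the credentials are pure assumptions and depend entirely on the hardware:

# Illustrative sketch only: an IPMI fencing resource for one node.
# Agent, address and credentials are assumptions, not part of this setup.
sudo crm configure primitive fence-atihome stonith:fence_ipmilan \
    params ip=192.168.50.250 username=admin password=secret \
    pcmk_host_list=atihome
sudo crm configure property stonith-enabled=true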
The other property, no-quorum-policy, is set to ignore because this is a simple 2-node cluster. It tells the cluster that when there is no quorum, it should ignore that and keep working: keep managing the resources.
In a bigger cluster these settings would not make sense; they are here only because it is a 2-node cluster. No setting is active in the configure sub-menu until it is committed, and with up we can go back to the root level. By typing the configure show command we can see the current configuration. It looks like this:
crm(live/atihome)# configure show
node 1: atihome \
    attributes maintenance=off
node 2: pihome \
    attributes maintenance=off
property cib-bootstrap-options: \
    have-watchdog=false \
    dc-version=2.0.5-ba59be7122 \
    cluster-infrastructure=corosync \
    cluster-name=debian \
    stonith-enabled=false \
    no-quorum-policy=ignore
Final words
I now have a cluster that seems fine; the next topic is to define something in it!