While I was writing applications for myself, I kept thinking about how I could make my environment more bulletproof and stable. The fact that I was running everything on single systems was always a single point of failure. Until now! At least on the operating system level, I am past this obstacle.
This article is part of a series. Full series:
Make Linux cluster! – Beginning
Make Linux cluster! – Configure resources
Make Linux cluster! – Work and test resources
Make Linux cluster! – Pitfalls and observations
How it begins
My first step was a Google search, because I am certainly not the only one who has thought about this. The first result led to the official Debian page (I run Debian servers). I was happy about that, but while reading it I saw that it is outdated with respect to the current Debian and pacemaker/corosync versions. Still, it was good as a base, and I trusted my skills to figure out what had changed, and how, during the implementation.
I use the corosync cluster engine with the pacemaker cluster resource manager. My plan was to establish a stable Linux cluster and define my DNS server with a floating IP. Until now, my bind9 DNS server was running only on my Raspberry Pi 3. My first goal with the established cluster is to improve on that: the DNS server should run on my main server, but when that machine is down, it should move to the Raspberry Pi as a backup location.
In the following, I will walk through my journey of implementing this cluster.
What is a Linux cluster?
A cluster, generally speaking, is when several systems work together: they share resources and provide a single control point over multiple machines. This exists, of course, for Linux servers as well.
Such a cluster can be established between bare-metal or virtual servers, even across geographical locations; it does not matter where a system is running as long as the network can reach that node.
A cluster provides high availability (HA) for applications across multiple systems. How? Let’s say we host a web application on a single system. What happens when that system goes down? We obviously have no running service. What would happen in a cluster? If we define our application as a cluster resource, then when one system goes down, the application is started on the other system!
We can go even further. If we have multiple machines, we have multiple IP addresses, so after a move the IP address of our service would be different. But we can also define a virtual (floating) IP address that moves among the systems together with the service. This way, we can reach the web service on a common IP address no matter where it is currently running; a sketch of such a resource is shown below.
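As a preview (the actual resource configuration comes in the later parts of this series), a minimal sketch of such a floating IP defined as a cluster resource with crm could look roughly like this; the address 192.168.50.210 and the resource name are only illustrative assumptions:

# Illustrative sketch only: a floating IP managed as a cluster resource.
# The address, netmask and resource name are assumptions, not from this setup.
sudo crm configure primitive floating-ip ocf:heartbeat:IPaddr2 \
    params ip=192.168.50.210 cidr_netmask=24 \
    op monitor interval=10s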
Installation and configuration of core
I installed the corosync, crmsh (pacemaker) and fence-agents packages from the Debian repository on my main server. After installation, I stopped corosync and pacemaker with the systemctl stop corosync and systemctl stop pacemaker commands.
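Collected into one place, the commands look roughly like this (pacemaker is listed explicitly in case it is not pulled in as a dependency of crmsh):

# Install the cluster stack from the Debian repositories
sudo apt install corosync pacemaker crmsh fence-agents

# Stop the services before editing the configuration
sudo systemctl stop corosync
sudo systemctl stop pacemaker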
Then I edited the /etc/corosync/corosync.conf file with the following changes:
- In the totem section:
  - transport: knet
  - crypto_cipher: aes256
  - crypto_hash: sha1
- In the interface section I provided the bind network address, and the address and port for multicast
- I did not modify the logging section, the defaults were fine
- In the quorum section, because I will have 2 nodes in my cluster, I activated the two_node: 1 setting
- In the node list I put the information of my 2 nodes, based on the provided comments
The whole configuration can be seen below.
totem {
    version: 2
    transport: knet
    cluster_name: debian
    crypto_cipher: aes256
    crypto_hash: sha1
    interface {
        ringnumber: 0
        bindnetaddr: 192.168.50.0
        mcastaddr: 239.255.1.1
        mcastport: 5405
        ttl: 1
    }
}

logging {
    fileline: off
    to_stderr: yes
    to_logfile: yes
    logfile: /var/log/corosync/corosync.log
    to_syslog: yes
    debug: off
    logger_subsys {
        subsys: QUORUM
        debug: off
    }
}

quorum {
    provider: corosync_votequorum
    two_node: 1
}

nodelist {
    node {
        name: atihome
        nodeid: 1
        ring0_addr: 192.168.50.201
    }
    node {
        name: pihome
        nodeid: 2
        ring0_addr: 192.168.50.202
    }
}
The interesting part is the quorum section. What is quorum? The cluster will start and manage resources only when it has quorum. When does it have that? When at least half of the nodes plus one are active in the cluster. For example, if I had 8 nodes in the cluster, cluster resources would start only when at least 8/2 + 1 = 5 systems are active in the cluster.
Why does this rule exist? Imagine the 8-node cluster above, where 2 nodes are cut off from the cluster for some reason. Without this rule, those 2 nodes could establish a separate cluster and manage resources on their own (e.g. they would also activate the floating IP address), which could lead to serious issues. This is called a “split-brain scenario”.
But when there are only 2 nodes in the cluster, it becomes a bit more interesting. For quorum we would need 2/2 + 1 = 2 systems (half of 2 nodes is 1, plus 1 is 2). That means any cluster resource would start only when both nodes are up; a single node could not operate on its own, so there would not be much point in a 2-node cluster. That is why I enabled the two_node: 1 option in the quorum section: it artificially sets the required quorum to 1, so a 2-node setup can also work.
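Once the cluster is up and running (see the next section), the vote counts and whether the cluster is quorate can be checked with corosync's own tool:

# Show quorum status, expected votes and total votes
sudo corosync-quorumtool -s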
The last thing is to create an authentication key for corosync. For this I used the corosync-keygen command, which creates the key in the /etc/corosync folder (the command prints to the shell where the key is written). At this point, I distributed my configuration and key to my Raspberry Pi using the rsync utility; a sketch of these steps is shown below.
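The target address is the pihome node from the nodelist above; the exact rsync invocation (user and options) is my assumption:

# Generate the corosync authentication key (written to /etc/corosync/authkey)
sudo corosync-keygen

# Copy the configuration and the key to the second node
sudo rsync -av /etc/corosync/corosync.conf /etc/corosync/authkey \
    root@192.168.50.202:/etc/corosync/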
Start and see what happens
I started the services on both nodes with the systemctl start corosync and systemctl start pacemaker commands. Both services started properly. Then I checked the cluster status with the sudo crm status command. The output of this command looks similar to this:
Cluster Summary:
  * Stack: corosync
  * Current DC: atihome (version 2.0.5-ba59be7122) - partition with quorum
  * Last updated: Sun Dec 5 16:22:50 2021
  * Last change: Sun Dec 5 16:00:14 2021 by root via cibadmin on atihome
  * 2 nodes configured
  * 2 resource instances configured

Node List:
  * Online: [ atihome pihome ]
For the settings I used the crm command line. After running the sudo crm command, a new shell opens where crm commands can be executed. First I set some properties:
crm(live/atihome)# configure
crm(live/atihome)configure# property stonith-enabled=false
crm(live/atihome)configure# property no-quorum-policy=ignore
crm(live/atihome)configure# commit
crm(live/atihome)configure# up
STONITH is an acronym for “Shoot The Other Node In The Head”. Because this is a small 2-node home-lab environment, it is disabled here, although the recommended and officially supported way is to enable it. When a node disappears from the cluster, by default this is treated as a graceful shutdown. In practice that is not always the case: it can happen that a node is no longer seen by the cluster engine but is still alive and running its resources/services. For these situations we can use fence agents, which work together with STONITH and kill the node in such cases.
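For completeness, if STONITH were enabled, a fence device would be defined as a cluster resource. A hedged sketch with an IPMI-based agent from the fence-agents package might look roughly like this; the agent choice, the address and the credentials are pure assumptions and depend entirely on the hardware:

# Illustrative sketch only: an IPMI fencing resource for one node.
# Agent, address and credentials are assumptions, not part of this setup.
sudo crm configure primitive fence-atihome stonith:fence_ipmilan \
    params ip=192.168.50.250 username=admin password=secret \
    pcmk_host_list=atihome
sudo crm configure property stonith-enabled=true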
The other property, no-quorum-policy, is set to ignore because this is a simple 2-node cluster. It tells the cluster that when there is no quorum, it should ignore that and keep working: keep managing the resources.
In a bigger cluster these settings would not make sense; they are here only because it is a 2-node cluster. No setting is active in the configure sub-menu until it is committed, and with up we can go back to the root level. By typing the configure show command we can see the current configuration. It looks like this:
crm(live/atihome)# configure show
node 1: atihome \
    attributes maintenance=off
node 2: pihome \
    attributes maintenance=off
property cib-bootstrap-options: \
    have-watchdog=false \
    dc-version=2.0.5-ba59be7122 \
    cluster-infrastructure=corosync \
    cluster-name=debian \
    stonith-enabled=false \
    no-quorum-policy=ignore
Final words
I now have a cluster that seems fine; the next topic is to define something in it!