While writing applications for myself, I kept thinking about how I could make my environment more bulletproof and stable. The fact that I was running everything on single systems was always a single point of failure. Until now! At least on the operating system level, I am past this obstacle.
This article is part of a series. Full series:
Make Linux cluster! – Beginning
Make Linux cluster! – Configure resources
Make Linux cluster! – Work and test resources
Make Linux cluster! – Pitfalls and observations
Manipulate resources
Cluster-defined resources must be handled from the crm shell. Simple start/stop/restart commands can be used for them.
crm(live/atihome)# resource stop bind9
crm(live/atihome)# status
Cluster Summary:
  * Stack: corosync
  * Current DC: atihome (version 2.0.5-ba59be7122) - partition with quorum
  * Last updated: Sun Dec 5 17:48:05 2021
  * Last change: Sun Dec 5 17:48:04 2021 by root via cibadmin on atihome
  * 2 nodes configured
  * 2 resource instances configured (1 DISABLED)

Node List:
  * Online: [ atihome pihome ]

Full List of Resources:
  * DnsIP (ocf::heartbeat:IPaddr2): Stopped
  * bind9 (service:named): Stopped (disabled)
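The same commands do not have to be typed in the interactive shell; crm also accepts them as one-shot commands from a normal shell (as root). A quick sketch:

# stop, check and start the resource without entering the interactive crm shell
crm resource stop bind9
crm status
crm resource start bind9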
It can be observed that the DnsIP resource also stopped, because of the colocation constraint. Constraints can be checked with the constraints command:
crm(live/atihome)# resource constrain bind9
DnsIP              (score=INFINITY, id=DnsWithIP)
  : Node pihome    (score=25, id=DnsAltLocation)
  : Node atihome   (score=100, id=DnsLocation)
* bind9
The resource can be started again with resource start bind9 in the crm shell.
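For reference, constraints like these could be written roughly as follows in crm configure syntax. This is only a sketch using the ids and scores from the output above; the actual definitions were created in the previous part of the series:

# DnsIP must run on the same node as bind9 (score INFINITY)
colocation DnsWithIP inf: DnsIP bind9
# bind9 prefers atihome (score 100) over pihome (score 25)
location DnsLocation bind9 100: atihome
location DnsAltLocation bind9 25: pihome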
Planned move
Sometimes a resource needs to be moved manually, not just during a disaster. We can use these commands:
- resource move bind9 pihome: move bind9 and DnsIP to the pihome node
- resource clear bind9: clear the migration constraints created for the resource; it will then move back to the highest-score location (atihome)
crm(live/atihome)# resource move bind9 pihome
INFO: Move constraint created for bind9 to pihome
crm(live/atihome)# status
Cluster Summary:
  * Stack: corosync
  * Current DC: atihome (version 2.0.5-ba59be7122) - partition with quorum
  * Last updated: Sun Dec 5 18:43:11 2021
  * Last change: Sun Dec 5 18:43:08 2021 by root via crm_resource on atihome
  * 2 nodes configured
  * 2 resource instances configured

Node List:
  * Online: [ atihome pihome ]

Full List of Resources:
  * DnsIP (ocf::heartbeat:IPaddr2): Started pihome
  * bind9 (service:named): Started pihome
crm(live/atihome)# resource clear bind9
INFO: Removed migration constraints for bind9
crm(live/atihome)# status
Cluster Summary:
  * Stack: corosync
  * Current DC: atihome (version 2.0.5-ba59be7122) - partition with quorum
  * Last updated: Sun Dec 5 18:43:21 2021
  * Last change: Sun Dec 5 18:43:20 2021 by root via crm_resource on atihome
  * 2 nodes configured
  * 2 resource instances configured

Node List:
  * Online: [ atihome pihome ]

Full List of Resources:
  * DnsIP (ocf::heartbeat:IPaddr2): Started atihome
  * bind9 (service:named): Started atihome
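As the INFO messages suggest, move only creates a temporary location constraint in the configuration (Pacemaker typically names it cli-prefer-<resource>) and clear removes it again. While a move is in effect, it can be inspected with something like this sketch:

# list the full configuration and filter for the cli-generated constraint
crm configure show | grep cli-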
Service fail
It can happen that a service fails on a node. Depending on the migration-threshold and the restart option, several things can happen (a configuration sketch follows this list):
- If the restart option is never, nothing is done locally except moving the resource
- If restart is allowed at least on-failure and the threshold is not reached, the resource is restarted locally
- If the threshold is reached, the resource is moved to another node
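A minimal sketch of where these knobs could be set in crm configure syntax. The 30-second monitor interval matches the bind9_monitor_30000 action in the failure output below; on-fail=restart and migration-threshold=3 are only assumed example values, not the actual configuration of this cluster:

# monitor every 30 seconds, restart in place on failure,
# move away after 3 failures (example values)
primitive bind9 service:named \
    op monitor interval=30s on-fail=restart \
    meta migration-threshold=3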
I simulated a failure by stopping bind9 outside of crm. The cluster detected the error at the next monitor interval and restarted the service in place. It also recorded the failure in the status output:
crm(live/atihome)# status
Cluster Summary:
  * Stack: corosync
  * Current DC: atihome (version 2.0.5-ba59be7122) - partition with quorum
  * Last updated: Sun Dec 5 18:43:21 2021
  * Last change: Sun Dec 5 18:43:20 2021 by root via crm_resource on atihome
  * 2 nodes configured
  * 2 resource instances configured

Node List:
  * Online: [ atihome pihome ]

Full List of Resources:
  * DnsIP (ocf::heartbeat:IPaddr2): Started atihome
  * bind9 (service:named): Started atihome

Failed Resource Actions:
  * bind9_monitor_30000 on atihome 'not running' (7): call=45, status='complete', exitreason='', last-rc-change='2021-12-05 17:05:46 +01:00', queued=0ms, exec=0ms
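For reference, the failure was triggered and can be cleaned up roughly like this. A sketch, assuming bind9 runs as the systemd unit named on this host; crm resource cleanup is the standard way to clear the failed-action record and the fail count:

# simulate a failure: stop the service behind the cluster's back
systemctl stop named

# after the cluster has restarted it, inspect and clear the failure history
crm resource failcount bind9 show atihome
crm resource cleanup bind9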
System fail
I stopped corosync and pacemaker manually on the main server as a disaster test. The resources were moved to the other node:
crm(live/pihome)# status
Cluster Summary:
  * Stack: corosync
  * Current DC: pihome (version 2.0.5-ba59be7122) - partition with quorum
  * Last updated: Sun Dec 5 18:48:28 2021
  * Last change: Sun Dec 5 18:43:20 2021 by root via crm_resource on atihome
  * 2 nodes configured
  * 2 resource instances configured

Node List:
  * Online: [ pihome ]
  * OFFLINE: [ atihome ]

Full List of Resources:
  * DnsIP (ocf::heartbeat:IPaddr2): Started pihome
  * bind9 (service:named): Started pihome
After starting corosync and pacemaker again, the resources moved back to atihome:
crm(live/pihome)# status
Cluster Summary:
  * Stack: corosync
  * Current DC: pihome (version 2.0.5-ba59be7122) - partition with quorum
  * Last updated: Sun Dec 5 18:50:09 2021
  * Last change: Sun Dec 5 18:43:20 2021 by root via crm_resource on atihome
  * 2 nodes configured
  * 2 resource instances configured

Node List:
  * Online: [ atihome pihome ]

Full List of Resources:
  * DnsIP (ocf::heartbeat:IPaddr2): Started atihome
  * bind9 (service:named): Started atihome
There is another function called fencing, which can act when a node drops out of the cluster. For example, in this case the system was still alive while not in the cluster; it could be powered off or rebooted by another node to make sure the "dead remains dead" and does not cause issues.
But fencing is not configured yet, so the cluster engine simply believes that the systems it cannot see are completely stopped. That can be a problem on production systems and could cause consistency issues, but I have not configured a fencing mechanism in my home lab.
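In a lab like this, fencing is usually disabled explicitly so Pacemaker does not complain about the missing STONITH devices. A sketch of the usual property; I am assuming this is how it was handled here, the series does not show it:

# tell Pacemaker not to require a STONITH/fencing device
crm configure property stonith-enabled=false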
Put a node into maintenance
It can happen that I am doing maintenance on a node and do not want crm to interfere there. For this, we can put a node or a single resource into maintenance mode. In maintenance mode, crm does not manage the affected resources.
Resources are handled via the resource sub-command, which has a maintenance parameter: resource maintenance <resource> on/off.
crm(live/atihome)# resource maintenance bind9 on
crm(live/atihome)# status
Cluster Summary:
  * Stack: corosync
  * Current DC: pihome (version 2.0.5-ba59be7122) - partition with quorum
  * Last updated: Sun Dec 5 19:05:22 2021
  * Last change: Sun Dec 5 19:05:15 2021 by root via cibadmin on atihome
  * 2 nodes configured
  * 2 resource instances configured

Node List:
  * Online: [ atihome pihome ]

Full List of Resources:
  * DnsIP (ocf::heartbeat:IPaddr2): Started atihome
  * bind9 (service:named): Started atihome (unmanaged)
crm(live/atihome)# resource maintenance bind9 off
crm(live/atihome)# status
Cluster Summary:
  * Stack: corosync
  * Current DC: pihome (version 2.0.5-ba59be7122) - partition with quorum
  * Last updated: Sun Dec 5 19:05:40 2021
  * Last change: Sun Dec 5 19:05:39 2021 by root via cibadmin on atihome
  * 2 nodes configured
  * 2 resource instances configured

Node List:
  * Online: [ atihome pihome ]

Full List of Resources:
  * DnsIP (ocf::heartbeat:IPaddr2): Started atihome
  * bind9 (service:named): Started atihome
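Under the hood this toggles a maintenance meta attribute on the resource itself; after switching it off the attribute stays in the configuration as maintenance=false, which explains the question in the next example. It can be checked with a sketch like:

# the resource definition now carries a maintenance meta attribute
crm configure show bind9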
Working with nodes has a slightly different syntax: node maintenance <name> puts the node into maintenance mode and node ready <name> takes it out again:
crm(live/atihome)# node maintenance atihome
'maintenance' attribute already exists in bind9. Remove it (y/n)? y
crm(live/atihome)# status
Cluster Summary:
  * Stack: corosync
  * Current DC: pihome (version 2.0.5-ba59be7122) - partition with quorum
  * Last updated: Sun Dec 5 19:03:03 2021
  * Last change: Sun Dec 5 19:03:00 2021 by root via cibadmin on atihome
  * 2 nodes configured
  * 2 resource instances configured

Node List:
  * Node atihome: maintenance
  * Online: [ pihome ]

Full List of Resources:
  * DnsIP (ocf::heartbeat:IPaddr2): Started atihome (unmanaged)
  * bind9 (service:named): Started atihome (unmanaged)
crm(live/atihome)# node ready atihome
crm(live/atihome)# status
Cluster Summary:
  * Stack: corosync
  * Current DC: pihome (version 2.0.5-ba59be7122) - partition with quorum
  * Last updated: Sun Dec 5 19:03:31 2021
  * Last change: Sun Dec 5 19:03:29 2021 by root via crm_attribute on atihome
  * 2 nodes configured
  * 2 resource instances configured

Node List:
  * Online: [ atihome pihome ]

Full List of Resources:
  * DnsIP (ocf::heartbeat:IPaddr2): Started atihome
  * bind9 (service:named): Started atihome
As can be seen, every resource that was running on atihome inherited the maintenance mode.
Final words
Handling resources is simple, at least so far. The command line interface also feels handy after a few hours of testing and practice.