In this post we are going to look at a Linux cluster feature that prevents cluster resources from automatically migrating back to a node that recently failed and has just recovered. In other words, it prevents resources from moving after node recovery, which is also referred to as preventing automatic resource fall-back.

Every resource migration needs some time to start on the new node, and that costs downtime. The downtime may be small for a simple service but much longer for complex ones, and it can be expensive for services that clients use continuously, such as a database where clients keep persistent connections to the cluster server for reads and writes. So nobody wants to migrate a healthy cluster resource between nodes unnecessarily.

We want a healthy resource to stay on whichever node it is running on and only migrate when it runs into a problem there. For this, Pacemaker has the concept of resource stickiness, which controls how strongly a service prefers to keep running where it is.
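
Before we change anything, it is worth checking whether a cluster-wide stickiness default is already configured. Running the command below with no arguments simply lists the current resource defaults (on a cluster where none have been set, pcs reports that no defaults are set).

pcs resource defaults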

Let's see how this concept works in a cluster. To demonstrate it, we have a cluster with a few resources combined into two resource groups.

Setup

For this setup, we are using a CentOS 6.10 machine. The same setup works on CentOS 7 as well.

[root@srv1 ~]# cat /etc/redhat-release 
CentOS release 6.10 (Final)

[root@srv1 ~]# uname -r
2.6.32-754.10.1.el6.x86_64

[root@srv1 ~]# pacemakerd --version
Pacemaker 1.1.18-3.el6
Written by Andrew Beekhof

[root@srv1 ~]# corosync -v
Corosync Cluster Engine, version '1.4.7'
Copyright (c) 2006-2009 Red Hat, Inc.

[root@srv1 ~]# cman_tool -V
cman_tool 3.0.12.1 (built Mar 24 2017 12:40:50)
Copyright (C) Red Hat, Inc.  2004-2010  All rights reserved.

[root@srv1 ~]# pcs --version
0.9.155

Above are the versions of the software installed on my machine and used to configure the cluster.

Let's see how this works on this cluster.

Below are the cluster resources on the two nodes, named srv1 and srv2.

[root@srv1 ~]# pcs resource
 Resource Group: MySQL
     MySQL_vip	(ocf::heartbeat:IPaddr2):	Started srv1
     MySQL_srv	(ocf::heartbeat:mysql):	Started srv1
 Resource Group: NFS
     nfs_vip	(ocf::heartbeat:IPaddr2):	Started srv2

Below is the detailed cluster status from pcs status.

[root@srv1 ~]# pcs status
Cluster name: srv_cluster
Stack: cman
Current DC: srv1 (version 1.1.18-3.el6-bfe4e80420) - partition with quorum
Last updated: Sun Feb 24 10:23:48 2019
Last change: Sun Feb 24 09:53:59 2019 by root via cibadmin on srv1

2 nodes configured
3 resources configured

Online: [ srv1 srv2 ]

Full list of resources:

 Resource Group: MySQL
     MySQL_vip	(ocf::heartbeat:IPaddr2):	Started srv1
     MySQL_srv	(ocf::heartbeat:mysql):	Started srv1
 Resource Group: NFS
     nfs_vip	(ocf::heartbeat:IPaddr2):	Started srv2

To configure stickiness as a cluster-wide default for the already configured resources, we need to run the command below.

pcs resource defaults resource-stickiness=100
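
Once the default is set, the same command without arguments can be used to verify it. The output should look something like this:

[root@srv1 ~]# pcs resource defaults
resource-stickiness: 100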

We can also manage stickiness at the level of individual resources, like below.

pcs resource update MySQL_vip meta resource-stickiness=100
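
If stickiness is set at the resource level, we can confirm the meta attribute took effect with pcs resource show. The output below is trimmed to the relevant line; your resource will also list its instance attributes and operations.

[root@srv1 ~]# pcs resource show MySQL_vip
 Resource: MySQL_vip (class=ocf provider=heartbeat type=IPaddr2)
  Meta Attrs: resource-stickiness=100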

So we have configured stickiness on every resource by setting the default resource-stickiness meta attribute to 100. Think of this value as the cost of moving a healthy resource from one node to another; a positive score is what prevents fall-back after a node recovers.

Let's try to understand this with a more practical scenario. We have two resources configured in the MySQL group, each with a stickiness of 100, and the group is running on srv1. If srv1 reboots for some reason, the group moves to srv2. Without stickiness, once srv1 boots up again, the group would move back to srv1; with the stickiness value set, it does not move back, because the resources are healthy and stickiness keeps them on the node where they are currently located.
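
If you want to see the scores the cluster is actually working with, Pacemaker's crm_simulate tool can print the allocation scores from the live cluster (-L reads the live CIB, -s shows scores). The lines below only illustrate the format; the exact resource IDs and numbers will differ on your cluster, but with stickiness in place the node currently running a resource should carry the higher score.

[root@srv1 ~]# crm_simulate -sL | grep 'allocation score'
native_color: MySQL_vip allocation score on srv1: 100
native_color: MySQL_vip allocation score on srv2: 0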

Let's reboot srv1 and see the result.

[root@srv2 ~]# pcs status
Cluster name: srv_cluster
Stack: cman
Current DC: srv2 (version 1.1.18-3.el6-bfe4e80420) - partition with quorum
Last updated: Sun Feb 24 14:44:04 2019
Last change: Sun Feb 24 14:28:22 2019 by root via crm_attribute on srv1

2 nodes configured
3 resources configured

Online: [ srv2 ]
OFFLINE: [ srv1 ]

Full list of resources:

 Resource Group: MySQL
     MySQL_vip	(ocf::heartbeat:IPaddr2):	Started srv2
     MySQL_srv	(ocf::heartbeat:mysql):	Started srv2
 Resource Group: NFS
     srv_vip	(ocf::heartbeat:IPaddr2):	Started srv2

So once we rebooted srv1, all resources moved to srv2 and stuck there.

But once srv1 boots up again, no resources move back to the recovered machine; everything remains on srv2.

[root@srv2 ~]# pcs status
Cluster name: srv_cluster
Stack: cman
Current DC: srv2 (version 1.1.18-3.el6-bfe4e80420) - partition with quorum
Last updated: Sun Feb 24 14:45:47 2019
Last change: Sun Feb 24 14:28:22 2019 by root via crm_attribute on srv1

2 nodes configured
3 resources configured

Online: [ srv1 srv2 ]

Full list of resources:

 Resource Group: MySQL
     MySQL_vip	(ocf::heartbeat:IPaddr2):	Started srv2
     MySQL_srv	(ocf::heartbeat:mysql):	Started srv2
 Resource Group: NFS
     srv_vip	(ocf::heartbeat:IPaddr2):	Started srv2

As you can see, even with srv1 back online, srv2 still holds all the cluster resources and nothing has moved to srv1. Stickiness ensures that a node coming back up does not trigger an automatic fall-back; healthy resources stay where they are. When we do want the group back on the recovered node, we move it manually at a time when we can tolerate the downtime of that move, as in our case below.

pcs resource move MySQL srv1

That moves the resource group to srv1.

[root@srv1 ~]# pcs status
Cluster name: srv_cluster
Stack: cman
Current DC: srv2 (version 1.1.18-3.el6-bfe4e80420) - partition with quorum
Last updated: Sun Feb 24 18:15:33 2019
Last change: Sun Feb 24 18:15:10 2019 by root via crm_resource on srv1

2 nodes configured
3 resources configured

Online: [ srv1 srv2 ]

Full list of resources:

 Resource Group: MySQL
     MySQL_vip	(ocf::heartbeat:IPaddr2):	Started srv1
     MySQL_srv	(ocf::heartbeat:mysql):	Started srv1
 Resource Group: NFS
     srv_vip	(ocf::heartbeat:IPaddr2):	Started srv2

That also creates a location constraint in the cluster, like below.

[root@srv1 ~]# pcs constraint show
Location Constraints:
  Resource: MySQL
    Enabled on: srv1 (score:INFINITY) (role: Started)
Ordering Constraints:
Colocation Constraints:
Ticket Constraints:

Note: We should always clear this constraint afterwards; the score:INFINITY location constraint pins the group to srv1 and would override stickiness the next time srv1 fails and recovers. Clear it like below.

[root@srv1 ~]# pcs resource clear MySQL
[root@srv1 ~]# pcs constraint show
Location Constraints:
Ordering Constraints:
Colocation Constraints:
Ticket Constraints: