Discussion:
[Pacemaker] migration-threshold causing unnecessary restart of underlying resources
Cnut Jansen
2010-08-12 02:12:02 UTC
Hi,

I'm once again seeing what looks (imho) like strange behaviour, or
rather decision-making, by Pacemaker, and I hope that someone can either
enlighten me a little about its intention and/or a possible
misconfiguration on my side, or confirm it as a possible bug.

Basically I have a cluster of 2 nodes with cloned DLM, O2CB, DRBD and
mount resources, and a MySQL resource (grouped with an IPaddr resource)
running on top of the other ones.
The MySQL(-group) resource depends on the mount resource, which depends
equally on both the DRBD and the O2CB resources, and the O2CB resource
depends on the DLM resource.
cloneDlm -> cloneO2cb -\
                        }-> cloneMountMysql -> mysql / grpMysql( mysql -> ipMysql )
msDrbdMysql -----------/
Furthermore, for the MySQL(-group) resource I set the meta-attributes
"migration-threshold=1" and "failure-timeout=90" (later I also tried
"3" and "130" for these).

Now I poked at mysql a little using "crm_resource -F -r mysql -H
<node>", expecting that only mysql - or its group, respectively (I
tested both configurations; same result) - would be stopped (and moved
over to the other node).
But in fact not only mysql/grpMysql was stopped; the mount and even the
DRBD resources were stopped as well, and upon restarting them the DRBD
resource was left as slave (so the mount of course wasn't allowed to
restart either) and - back then, before I set
cluster-recheck-interval=2m - didn't even seem to try to promote back to
master (I didn't wait out cluster-recheck-interval's default of 15m).
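
Side note, in case it helps anyone reading along:
cluster-recheck-interval is just a cluster property, so shortening it is
a one-liner (crm shell syntax, written from memory, so take it as a
sketch):

crm configure property cluster-recheck-interval="2m"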

Now, through a lot of testing, I found out that:
a) the stops/restarts of the underlying resources happen only when the
fail counter hits the limit set by migration-threshold; i.e. when it is
set to 3, on the first 2 failures only mysql/grpMysql is restarted on
the same node, and only on the 3rd one are the underlying resources left
in a mess (while mysql/grpMysql migrates) (reproducible for DRBD; unsure
about the DLM/O2CB side, but there's sometimes serious trouble there too
after having picked on mysql; I just couldn't link it definitively yet)
b) upon causing mysql/grpMysql's migration, the score for
msDrbdMysql:promote changes from 10020 to -inf and stays there for the
duration of mysql/grpMysql's failure-timeout (verified by also setting
it to 130), before it rises back up to 10000
c) msDrbdMysql remains slave until the next cluster-recheck after its
promote score has gone back up to 10000
d) I also have the impression that fail counters don't get reset after
their failure-timeout, because when migration-threshold=3 is set, those
issues occur upon every(!) following picking-on, even when I've waited
nearly 5 minutes (with failure-timeout=90) without touching the cluster
at all (the commands I use to watch the scores and counters are
sketched below)
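
A sketch of how I watch this (the ptest line is exactly what produced
the transcript further below; the crm_mon and crm shell variants are
from memory, so take them as a sketch too; node and resource names are
from my test setup):

# promotion scores:
ptest -sL | grep "drbdMysql:1 promotion score on nde35"
# one-shot cluster status including fail counts:
crm_mon -1 -f
# fail count of mysql on one node via the crm shell:
crm resource failcount mysql show nde35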

I experienced this on both test clusters, a SLES 11 HAE SP1 with
Pacemaker 1.1.2 and a Debian Squeeze with Pacemaker 1.0.9. When the
migration-threshold for mysql/grpMysql is removed, everything is fine
(except that there's no migration, of course). I can't remember this
happening with SLES 11 HAE SP0's Pacemaker 1.0.6.

I'd really appreciate any comment and/or enlightenment about what the
deal with this is. (-;


p.s.: Just for fun / testing / proving, I also constrained
grpLdirector to cloneMountShared... and could perfectly reproduce the
problem with its then underlying resources too.

================================================================================

2) mysql: meta migration-threshold=1 failure-timeout=130 ->
drbd:promote only possible again score-wise after 130s
nde34:~ # nd=nde35;cl=1;failcmd="crm_resource -F -r mysql -H $nd" ; date
; ptest -sL | grep "drbdMysql:$cl promotion score on $nd" ; date ; echo
$failcmd; $failcmd ; date ; ptest -sL | grep "drbdMysql:$cl promotion
score on $nd" ; sleep 85 ; while [ true ]; do date ; ptest -sL | grep
"drbdMysql:$cl promotion score on $nd" ; sleep 5; done
Wed Aug 11 15:33:04 CEST 2010
drbdMysql:1 promotion score on nde35: 10020
drbdMysql:1 promotion score on nde35: INFINITY
drbdMysql:1 promotion score on nde35: INFINITY
Wed Aug 11 15:33:04 CEST 2010
crm_resource -F -r mysql -H nde35
Wed Aug 11 15:33:05 CEST 2010
drbdMysql:1 promotion score on nde35: -INFINITY
drbdMysql:1 promotion score on nde35: -INFINITY
drbdMysql:1 promotion score on nde35: -INFINITY
drbdMysql:1 promotion score on nde35: -INFINITY
Wed Aug 11 15:34:31 CEST 2010
drbdMysql:1 promotion score on nde35: -INFINITY
drbdMysql:1 promotion score on nde35: -INFINITY
drbdMysql:1 promotion score on nde35: -INFINITY
drbdMysql:1 promotion score on nde35: -INFINITY
[...]
Wed Aug 11 15:35:11 CEST 2010
drbdMysql:1 promotion score on nde35: -INFINITY
drbdMysql:1 promotion score on nde35: -INFINITY
drbdMysql:1 promotion score on nde35: -INFINITY
drbdMysql:1 promotion score on nde35: -INFINITY
Wed Aug 11 15:35:16 CEST 2010
drbdMysql:1 promotion score on nde35: 10000
drbdMysql:1 promotion score on nde35: INFINITY
drbdMysql:1 promotion score on nde35: INFINITY
^C


-------------- next part --------------
Attachment: cluster-conf - sles11sp1.txt
URL: <http://oss.clusterlabs.org/pipermail/pacemaker/attachments/20100812/023d2b81/attachment-0002.txt>
-------------- next part --------------
Attachment: cluster-conf - squeeze.txt
URL: <http://oss.clusterlabs.org/pipermail/pacemaker/attachments/20100812/023d2b81/attachment-0003.txt>
Dejan Muhamedagic
2010-08-12 16:46:17 UTC
Hi,
Post by Cnut Jansen
Hi,
I'm once again seeing what looks (imho) like strange behaviour, or
rather decision-making, by Pacemaker, and I hope that someone can either
enlighten me a little about its intention and/or a possible
misconfiguration on my side, or confirm it as a possible bug.
Basically I have a cluster of 2 nodes with cloned DLM-, O2CB-,
DRBD-, mount-resources, and a MySQL-resource (grouped with an
IPaddr-resource) running on top of the other ones.
The MySQL(-group) resource depends on the mount resource, which depends
equally on both the DRBD and the O2CB resources, and the O2CB resource
depends on the DLM resource.
cloneDlm -> cloneO2cb -\
                        }-> cloneMountMysql -> mysql / grpMysql( mysql -> ipMysql )
msDrbdMysql -----------/
Furthermore for the MySQL(-group)-resource I set meta-attributes
"migration-threshold=1" and "failure-timeout=90" (later also tried
settings "3" and "130" for these).
Now I picked a little on mysql using "crm_resource -F -r mysql -H
<node>", expecting that only mysql respectively its group (tested
both configurations; same result) would be stopped (and moved over
to the other node).
But actually not only mysql/grpMysql was stopped, but also the
mount- and even the DRBD-resources were stopped, and upon restarting
them the DRBD-resource was left as slave (thus the mount of course
wasn't allowed to restart either) and - back then before I set
cluster-recheck-interval=2m - didn't seem to even try to promote
back to master (didn't wait cluster-recheck-interval's default 15m).
a) the stops/restarts of the underlying resources happen only when
failcounter hits the limit set by migration-threshold; i.e. when set
to 3, on first 2 failures only mysql/grpMysql is restarted on the
same node and only on 3rd one underlying resources are left in a
mess (while mysql/grpMysql migrates) (for DRBD reproducible; unsure
about DLM/O2CB-side, but there's sometimes hard trouble too after
having picked on mysql; just couldn't definitively link it yet)
The migration-threshold shouldn't in any way influence resources
which don't depend on the resource which fails over. Couldn't
reproduce it here with our example RAs.

BTW, what's the point of cloneMountMysql? If it can run only
where drbd is master, then it can run on one node only:

colocation colocMountMysql_drbd inf: cloneMountMysql msDrbdMysql:Master
order orderMountMysql_drbd inf: msDrbdMysql:promote cloneMountMysql:start

Right? At least that's how it behaves here, with the tip of
1.1.2.
Post by Cnut Jansen
b) upon causing mysql/grpMysql's migration, score for
msDrbdMysql:promote changes from 10020 to -inf and stays there for
the time of mysql/grpMysql's failure-timeout (proved with also
setting to 130), before it rises back up to 10000
c) msDrbdMysql remains slave until the next cluster-recheck after
its promote-score went back up to 10000
d) I also have the impression that fail-counters don't get reset
after their failure-timeout, because when migration-threshold=3 is
set, upon every(!) following picking-on those issues occur, even
when I've waited for nearly 5 minutes (with failure-timeout=90)
without any touching the cluster
That seems to be a bug though I couldn't reproduce it with a
simple configuration.

Thanks,

Dejan
Post by Cnut Jansen
I experienced this on both test-clusters, a SLES 11 HAE SP1 with
Pacemaker 1.1.2, and a Debian Squeeze with Pacemaker 1.0.9. When
migration-threshold for mysql/grpMysql is removed, everything is
fine (except no migration of course). I can't remember such
happening with SLES 11 HAE SP0's Pacemaker 1.0.6.
I'd really appreciate any comment and/or enlightenment about what's
the deal with this. (-;
p.s.: Just for fun / testing / proving I just also constrained
grpLdirector to cloneMountShared... and could perfectly reproduce
that problem with its then underlying resources too.
================================================================================
2) mysql: meta migration-threshold=1 failure-timeout=130 ->
drbd:promote only possible again score-wise after 130s
nde34:~ # nd=nde35;cl=1;failcmd="crm_resource -F -r mysql -H $nd" ;
date ; ptest -sL | grep "drbdMysql:$cl promotion score on $nd" ;
date ; echo $failcmd; $failcmd ; date ; ptest -sL | grep
"drbdMysql:$cl promotion score on $nd" ; sleep 85 ; while [ true ];
do date ; ptest -sL | grep "drbdMysql:$cl promotion score on $nd" ;
sleep 5; done
Wed Aug 11 15:33:04 CEST 2010
drbdMysql:1 promotion score on nde35: 10020
drbdMysql:1 promotion score on nde35: INFINITY
drbdMysql:1 promotion score on nde35: INFINITY
Wed Aug 11 15:33:04 CEST 2010
crm_resource -F -r mysql -H nde35
Wed Aug 11 15:33:05 CEST 2010
drbdMysql:1 promotion score on nde35: -INFINITY
drbdMysql:1 promotion score on nde35: -INFINITY
drbdMysql:1 promotion score on nde35: -INFINITY
drbdMysql:1 promotion score on nde35: -INFINITY
Wed Aug 11 15:34:31 CEST 2010
drbdMysql:1 promotion score on nde35: -INFINITY
drbdMysql:1 promotion score on nde35: -INFINITY
drbdMysql:1 promotion score on nde35: -INFINITY
drbdMysql:1 promotion score on nde35: -INFINITY
[...]
Wed Aug 11 15:35:11 CEST 2010
drbdMysql:1 promotion score on nde35: -INFINITY
drbdMysql:1 promotion score on nde35: -INFINITY
drbdMysql:1 promotion score on nde35: -INFINITY
drbdMysql:1 promotion score on nde35: -INFINITY
Wed Aug 11 15:35:16 CEST 2010
drbdMysql:1 promotion score on nde35: 10000
drbdMysql:1 promotion score on nde35: INFINITY
drbdMysql:1 promotion score on nde35: INFINITY
^C
node nde34 \
node nde35 \
primitive apache ocf:cj:apache \
primitive dlm ocf:pacemaker:controld \
primitive drbdMysql ocf:linbit:drbd \
primitive drbdOpencms ocf:linbit:drbd \
primitive drbdShared ocf:linbit:drbd \
primitive ipLdirector ocf:heartbeat:IPaddr2 \
primitive ipMysql ocf:heartbeat:IPaddr \
primitive ldirector ocf:heartbeat:ldirectord \
primitive mountMysql ocf:heartbeat:Filesystem \
primitive mountOpencms ocf:heartbeat:Filesystem \
primitive mountShared ocf:heartbeat:Filesystem \
primitive mysql ocf:heartbeat:mysql \
primitive o2cb ocf:ocfs2:o2cb \
primitive tomcat ocf:cj:tomcat \
group grpLdirector ldirector ipLdirector \
group grpMysql mysql ipMysql \
ms msDrbdMysql drbdMysql \
ms msDrbdOpencms drbdOpencms \
ms msDrbdShared drbdShared \
clone cloneApache apache
clone cloneDlm dlm \
clone cloneMountMysql mountMysql \
clone cloneMountOpencms mountOpencms \
clone cloneMountShared mountShared \
clone cloneO2cb o2cb \
clone cloneTomcat tomcat \
colocation colocApache inf: cloneApache cloneTomcat
colocation colocGrpLdirector inf: grpLdirector cloneMountShared
colocation colocGrpMysql inf: grpMysql cloneMountMysql
colocation colocMountMysql_drbd inf: cloneMountMysql msDrbdMysql:Master
colocation colocMountMysql_o2cb inf: cloneMountMysql cloneO2cb
colocation colocMountOpencms_drbd inf: cloneMountOpencms msDrbdOpencms:Master
colocation colocMountOpencms_o2cb inf: cloneMountOpencms cloneO2cb
colocation colocMountShared_drbd inf: cloneMountShared msDrbdShared:Master
colocation colocMountShared_o2cb inf: cloneMountShared cloneO2cb
colocation colocO2cb inf: cloneO2cb cloneDlm
colocation colocTomcat inf: cloneTomcat cloneMountOpencms
order orderApache inf: cloneTomcat cloneApache
order orderGrpLdirector inf: cloneMountShared grpLdirector
order orderGrpMysql inf: cloneMountMysql grpMysql
order orderMountMysql_drbd inf: msDrbdMysql:promote cloneMountMysql:start
order orderMountMysql_o2cb inf: cloneO2cb cloneMountMysql
order orderMountOpencms_drbd inf: msDrbdOpencms:promote cloneMountOpencms:start
order orderMountOpencms_o2cb inf: cloneO2cb cloneMountOpencms
order orderMountShared_drbd inf: msDrbdShared:promote cloneMountShared:start
order orderMountShared_o2cb inf: cloneO2cb cloneMountShared
order orderO2cb inf: cloneDlm cloneO2cb
order orderTomcat inf: cloneMountOpencms cloneTomcat
property $id="cib-bootstrap-options" \
dc-version="1.1.2-2e096a41a5f9e184a1c1537c82c6da1093698eb5" \
cluster-infrastructure="openais" \
expected-quorum-votes="2" \
stonith-enabled="false" \
no-quorum-policy="ignore" \
start-failure-is-fatal="false" \
cluster-recheck-interval="5m" \
shutdown-escalation="5m" \
last-lrm-refresh="1281543643"
rsc_defaults $id="rsc-options" \
resource-stickiness="5"
node alpha \
attributes standby="off"
node beta \
attributes standby="off"
primitive dlm ocf:pacemaker:controld \
op monitor interval="10" timeout="20" \
op start interval="0" timeout="90" \
op stop interval="0" timeout="100"
primitive drbdShared ocf:linbit:drbd \
params drbd_resource="shared" \
op monitor interval="10" role="Master" timeout="20" \
op monitor interval="20" role="Slave" timeout="20" \
op start interval="0" timeout="240" \
op stop interval="0" timeout="100" \
op promote interval="0" timeout="90" \
op demote interval="0" timeout="90" \
op notify interval="0" timeout="90"
primitive ipMysql ocf:heartbeat:IPaddr \
params ip="192.168.135.67" cidr_netmask="255.255.0.0" \
op monitor interval="2" timeout="20" \
op start interval="0" timeout="90"
primitive mountShared ocf:heartbeat:Filesystem \
params device="/dev/drbd0" directory="/shared" fstype="ocfs2" \
op monitor interval="10" timeout="40" OCF_CHECK_LEVEL="10" \
op start interval="0" timeout="60" \
op stop interval="0" timeout="60"
primitive mysql ocf:heartbeat:mysql \
params binary="/usr/bin/mysqld_safe" config="/var/lib/mysql/my.cnf" pid="/var/run/mysqld/mysqld.pid" socket="/var/lib/mysql/mysqld.sock" test_table="ha.check" test_user="HAuser" test_passwd="HApass" \
op monitor interval="10" timeout="30" OCF_CHECK_LEVEL="0" \
op start interval="0" timeout="120" \
op stop interval="0" timeout="120"
primitive o2cb ocf:pacemaker:o2cb \
op monitor interval="10" \
op start interval="0" timeout="90" \
op stop interval="0" timeout="100"
group grpMysql mysql ipMysql \
meta migration-threshold="3" failure-timeout="30"
ms msDrbdShared drbdShared \
meta resource-stickiness="100" notify="true" master-max="2"
clone cloneDlm dlm \
meta globally-unique="false" interleave="true"
clone cloneMountShared mountShared \
meta interleave="true" globally-unique="false" target-role="Started"
clone cloneO2cb o2cb \
meta globally-unique="false" interleave="true" target-role="Started"
colocation colocMountShared_drbd inf: cloneMountShared msDrbdShared:Master
colocation colocMountShared_o2cb inf: cloneMountShared cloneO2cb
colocation colocMysql inf: grpMysql cloneMountShared
colocation colocO2cb inf: cloneO2cb cloneDlm
order orderMountShared_drbd inf: msDrbdShared:promote cloneMountShared:start
order orderMountShared_o2cb inf: cloneO2cb cloneMountShared
order orderMysql inf: cloneMountShared grpMysql
order orderO2cb inf: cloneDlm cloneO2cb
property $id="cib-bootstrap-options" \
dc-version="1.0.9-unknown" \
cluster-infrastructure="openais" \
expected-quorum-votes="2" \
stonith-enabled="false" \
start-failure-is-fatal="false" \
last-lrm-refresh="1281577809" \
cluster-recheck-interval="4m" \
shutdown-escalation="5m"
Cnut Jansen
2010-08-14 04:26:58 UTC
Hi,

and first of all thanks for answering so far.
Post by Dejan Muhamedagic
The migration-threshold shouldn't in any way influence resources
which don't depend on the resource which fails over. Couldn't
reproduce it here with our example RAs.
Well, just to clearly establish that something is wrong there -
whatever it is, a simple misconfiguration or a possible bug - I have now
done a crm configure erase, completely restarted both nodes, and then
set up this new, very simple, Dummy-based configuration:
v v v v v v v v v v v v v v v v v v v v v v v v v v v v v v v v v v v v v v v v
node alpha \
attributes standby="off"
node beta \
attributes standby="off"
primitive dlm ocf:heartbeat:Dummy
primitive drbd ocf:heartbeat:Dummy
primitive mount ocf:heartbeat:Dummy
primitive mysql ocf:heartbeat:Dummy \
meta migration-threshold="3" failure-timeout="40"
primitive o2cb ocf:heartbeat:Dummy
location cli-prefer-mount mount \
rule $id="cli-prefer-rule-mount" inf: #uname eq alpha
colocation colocMysql inf: mysql mount
order orderMysql inf: mount mysql
property $id="cib-bootstrap-options" \
dc-version="1.0.9-unknown" \
cluster-infrastructure="openais" \
expected-quorum-votes="2" \
stonith-enabled="false" \
cluster-recheck-interval="150" \
last-lrm-refresh="1281751924"
^ ^ ^ ^ ^ ^ ^ ^ ^ ^ ^ ^ ^ ^ ^ ^ ^ ^ ^ ^ ^ ^ ^ ^ ^ ^ ^ ^ ^ ^ ^ ^ ^ ^ ^ ^ ^ ^ ^ ^
...and then, after picking on the resource "mysql", I got this:

1) alpha: FC(mysql)=0, crm_resource -F -r mysql -H alpha
Aug 14 04:15:30 alpha crmd: [900]: info: process_lrm_event: LRM
operation mysql_asyncmon_0 (call=48, rc=1, cib-update=563,
confirmed=false) unknown error
Aug 14 04:15:30 alpha crmd: [900]: info: process_lrm_event: LRM
operation mysql_stop_0 (call=49, rc=0, cib-update=565, confirmed=true) ok
Aug 14 04:15:30 alpha crmd: [900]: info: process_lrm_event: LRM
operation mysql_start_0 (call=50, rc=0, cib-update=567, confirmed=true) ok

2) alpha: FC(mysql)=1, crm_resource -F -r mysql -H alpha
Aug 14 04:15:42 alpha crmd: [900]: info: process_lrm_event: LRM
operation mysql_asyncmon_0 (call=51, rc=1, cib-update=568,
confirmed=false) unknown error
Aug 14 04:15:42 alpha crmd: [900]: info: process_lrm_event: LRM
operation mysql_stop_0 (call=52, rc=0, cib-update=572, confirmed=true) ok
Aug 14 04:15:42 alpha crmd: [900]: info: process_lrm_event: LRM
operation mysql_start_0 (call=53, rc=0, cib-update=573, confirmed=true) ok

3) alpha: FC(mysql)=2, crm_resource -F -r mysql -H alpha
Aug 14 04:15:56 alpha crmd: [900]: info: process_lrm_event: LRM
operation mysql_asyncmon_0 (call=54, rc=1, cib-update=574,
confirmed=false) unknown error
Aug 14 04:15:56 alpha crmd: [900]: info: process_lrm_event: LRM
operation mysql_stop_0 (call=55, rc=0, cib-update=576, confirmed=true) ok
Aug 14 04:15:56 alpha crmd: [900]: info: process_lrm_event: LRM
operation mount_stop_0 (call=56, rc=0, cib-update=578, confirmed=true) ok
beta: FC(mysql)=3
Aug 14 04:15:56 beta crmd: [868]: info: process_lrm_event: LRM operation
mount_start_0 (call=36, rc=0, cib-update=92, confirmed=true) ok
Aug 14 04:15:56 beta crmd: [868]: info: process_lrm_event: LRM operation
mysql_start_0 (call=37, rc=0, cib-update=93, confirmed=true) ok
Aug 14 04:18:26 beta crmd: [868]: info: process_lrm_event: LRM operation
mysql_stop_0 (call=38, rc=0, cib-update=94, confirmed=true) ok
Aug 14 04:18:26 beta crmd: [868]: info: process_lrm_event: LRM operation
mount_stop_0 (call=39, rc=0, cib-update=95, confirmed=true) ok
alpha: FC(mysql)=3
Aug 14 04:18:26 alpha crmd: [900]: info: process_lrm_event: LRM
operation mount_start_0 (call=57, rc=0, cib-update=580, confirmed=true) ok
Aug 14 04:18:26 alpha crmd: [900]: info: process_lrm_event: LRM
operation mysql_start_0 (call=58, rc=0, cib-update=581, confirmed=true) ok


So it seems that - for whatever reason - those constrained resources
are considered and treated just as if they were in a resource group,
because they move to wherever they can all run, instead of the "eat or
die" relationship of the dependent resource (mysql) towards the
underlying resource (mount) that I had expected with the constraints as
I set them... shouldn't I have?! o_O


And - concerning the failure-timeout - quite a while later, without
having reset mysql's failure counter or having done anything else in
the meantime:

4) alpha: FC(mysql)=3, crm_resource -F -r mysql -H alpha
Aug 14 04:44:47 alpha crmd: [900]: info: process_lrm_event: LRM
operation mysql_asyncmon_0 (call=59, rc=1, cib-update=592,
confirmed=false) unknown error
Aug 14 04:44:47 alpha crmd: [900]: info: process_lrm_event: LRM
operation mysql_stop_0 (call=60, rc=0, cib-update=596, confirmed=true) ok
Aug 14 04:44:47 alpha crmd: [900]: info: process_lrm_event: LRM
operation mount_stop_0 (call=61, rc=0, cib-update=597, confirmed=true) ok
beta: FC(mysql)=0
Aug 14 04:44:47 beta crmd: [868]: info: process_lrm_event: LRM operation
mount_start_0 (call=40, rc=0, cib-update=96, confirmed=true) ok
Aug 14 04:44:47 beta crmd: [868]: info: process_lrm_event: LRM operation
mysql_start_0 (call=41, rc=0, cib-update=97, confirmed=true) ok
Aug 14 04:47:17 beta crmd: [868]: info: process_lrm_event: LRM operation
mysql_stop_0 (call=42, rc=0, cib-update=98, confirmed=true) ok
Aug 14 04:47:17 beta crmd: [868]: info: process_lrm_event: LRM operation
mount_stop_0 (call=43, rc=0, cib-update=99, confirmed=true) ok
alpha: FC(mysql)=4
Aug 14 04:47:17 alpha crmd: [900]: info: process_lrm_event: LRM
operation mount_start_0 (call=62, rc=0, cib-update=599, confirmed=true) ok
Aug 14 04:47:17 alpha crmd: [900]: info: process_lrm_event: LRM
operation mysql_start_0 (call=63, rc=0, cib-update=600, confirmed=true) ok
Post by Dejan Muhamedagic
BTW, what's the point of cloneMountMysql? If it can run only
colocation colocMountMysql_drbd inf: cloneMountMysql msDrbdMysql:Master
order orderMountMysql_drbd inf: msDrbdMysql:promote cloneMountMysql:start
It's a dual-primary DRBD configuration, so there are actually - when
everything is ok (-; - 2 masters of each DRBD multi-state resource...
even though I admit that at least the dual primary (or rather dual
master) for msDrbdMysql is currently (quite) redundant, since in the
current cluster configuration there's only one primitive MySQL resource
and thus no inevitable need for MySQL's data dir to be mounted on both
nodes all the time.
But since it's not harmful to have it mounted on the other node too,
since msDrbdOpencms and msDrbdShared do need to be mounted on both
nodes, and since I put the complete installation and configuration of
the cluster into flexibly configurable shell scripts, it's easier - i.e.
done with less typing - to just put all DRBD and mount resources'
configuration into one common loop. (-;
Post by Dejan Muhamedagic
Post by Cnut Jansen
d) I also have the impression that fail-counters don't get reset
after their failure-timeout, because when migration-threshold=3 is
set, upon every(!) following picking-on those issues occur, even
when I've waited for nearly 5 minutes (with failure-timeout=90)
without any touching the cluster
That seems to be a bug though I couldn't reproduce it with a
simple configuration.
I also tested this once again: it seems that failure-timeout only sets
scores back from -inf to around 0 (wherever they should normally be),
allowing the resources to return to the node. I tested with a location
constraint for the underlying resource (see configuration): after the
failure-timeout has elapsed, on the next cluster-recheck (and only
then!) the underlying resource and its dependents return to the
underlying resource's preferred location, as you can see in the logs
above.
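
A sketch of how I look at the raw counter (the fail count is kept as a
transient attribute in the CIB status section, named
fail-count-<resource> as far as I can tell, so the grep pattern below
is an assumption):

cibadmin -Q | grep fail-count-mysql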
Dejan Muhamedagic
2010-08-16 11:29:05 UTC
Hi,
Post by Cnut Jansen
Hi,
and first of all thanks for answering so far.
Post by Dejan Muhamedagic
The migration-threshold shouldn't in any way influence resources
which don't depend on the resource which fails over. Couldn't
reproduce it here with our example RAs.
Well, I now - just to clearly assure that something's wrong there;
whatever it is, simple misconfiguration or possible bug - did crm
configure erase, completely restarted both nodes, and then setup
v v v v v v v v v v v v v v v v v v v v v v v v v v v v v v v v v v
v v v v v v
node alpha \
attributes standby="off"
node beta \
attributes standby="off"
primitive dlm ocf:heartbeat:Dummy
primitive drbd ocf:heartbeat:Dummy
primitive mount ocf:heartbeat:Dummy
primitive mysql ocf:heartbeat:Dummy \
meta migration-threshold="3" failure-timeout="40"
primitive o2cb ocf:heartbeat:Dummy
location cli-prefer-mount mount \
rule $id="cli-prefer-rule-mount" inf: #uname eq alpha
colocation colocMysql inf: mysql mount
order orderMysql inf: mount mysql
property $id="cib-bootstrap-options" \
dc-version="1.0.9-unknown" \
cluster-infrastructure="openais" \
expected-quorum-votes="2" \
stonith-enabled="false" \
cluster-recheck-interval="150" \
last-lrm-refresh="1281751924"
^ ^ ^ ^ ^ ^ ^ ^ ^ ^ ^ ^ ^ ^ ^ ^ ^ ^ ^ ^ ^ ^ ^ ^ ^ ^ ^ ^ ^ ^ ^ ^ ^ ^
^ ^ ^ ^ ^ ^
1) alpha: FC(mysql)=0, crm_resource -F -r mysql -H alpha
Aug 14 04:15:30 alpha crmd: [900]: info: process_lrm_event: LRM
operation mysql_asyncmon_0 (call=48, rc=1, cib-update=563,
confirmed=false) unknown error
Aug 14 04:15:30 alpha crmd: [900]: info: process_lrm_event: LRM
operation mysql_stop_0 (call=49, rc=0, cib-update=565,
confirmed=true) ok
Aug 14 04:15:30 alpha crmd: [900]: info: process_lrm_event: LRM
operation mysql_start_0 (call=50, rc=0, cib-update=567,
confirmed=true) ok
2) alpha: FC(mysql)=1, crm_resource -F -r mysql -H alpha
Aug 14 04:15:42 alpha crmd: [900]: info: process_lrm_event: LRM
operation mysql_asyncmon_0 (call=51, rc=1, cib-update=568,
confirmed=false) unknown error
Aug 14 04:15:42 alpha crmd: [900]: info: process_lrm_event: LRM
operation mysql_stop_0 (call=52, rc=0, cib-update=572,
confirmed=true) ok
Aug 14 04:15:42 alpha crmd: [900]: info: process_lrm_event: LRM
operation mysql_start_0 (call=53, rc=0, cib-update=573,
confirmed=true) ok
3) alpha: FC(mysql)=2, crm_resource -F -r mysql -H alpha
Aug 14 04:15:56 alpha crmd: [900]: info: process_lrm_event: LRM
operation mysql_asyncmon_0 (call=54, rc=1, cib-update=574,
confirmed=false) unknown error
Aug 14 04:15:56 alpha crmd: [900]: info: process_lrm_event: LRM
operation mysql_stop_0 (call=55, rc=0, cib-update=576,
confirmed=true) ok
Aug 14 04:15:56 alpha crmd: [900]: info: process_lrm_event: LRM
operation mount_stop_0 (call=56, rc=0, cib-update=578,
confirmed=true) ok
beta: FC(mysql)=3
Aug 14 04:15:56 beta crmd: [868]: info: process_lrm_event: LRM
operation mount_start_0 (call=36, rc=0, cib-update=92,
confirmed=true) ok
Aug 14 04:15:56 beta crmd: [868]: info: process_lrm_event: LRM
operation mysql_start_0 (call=37, rc=0, cib-update=93,
confirmed=true) ok
Aug 14 04:18:26 beta crmd: [868]: info: process_lrm_event: LRM
operation mysql_stop_0 (call=38, rc=0, cib-update=94,
confirmed=true) ok
Aug 14 04:18:26 beta crmd: [868]: info: process_lrm_event: LRM
operation mount_stop_0 (call=39, rc=0, cib-update=95,
confirmed=true) ok
alpha: FC(mysql)=3
Aug 14 04:18:26 alpha crmd: [900]: info: process_lrm_event: LRM
operation mount_start_0 (call=57, rc=0, cib-update=580,
confirmed=true) ok
Aug 14 04:18:26 alpha crmd: [900]: info: process_lrm_event: LRM
operation mysql_start_0 (call=58, rc=0, cib-update=581,
confirmed=true) ok
So it seems that - for whatever reason - those constrained
resources are considered and treated just as they were in a
resource-group, because they move to where they all can run, instead
of the "eat or die" for the dependent resource (mysql) to the
underlying resource (mount) that I had expected with such
constraints as I set them... shouldn't I?! o_O
Yes, those two constraints are equivalent to a group.
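
In other words (a minimal sketch with the Dummy resources from your
test configuration; the group name grpTest is made up for the example):

group grpTest mount mysql
# is effectively the same as:
colocation colocMysql inf: mysql mount
order orderMysql inf: mount mysql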
Post by Cnut Jansen
And - concerning the failure-timeout - quite a while later, without
having reset mysql's failure counter or having done anything else
4) alpha: FC(mysql)=3, crm_resource -F -r mysql -H alpha
Aug 14 04:44:47 alpha crmd: [900]: info: process_lrm_event: LRM
operation mysql_asyncmon_0 (call=59, rc=1, cib-update=592,
confirmed=false) unknown error
Aug 14 04:44:47 alpha crmd: [900]: info: process_lrm_event: LRM
operation mysql_stop_0 (call=60, rc=0, cib-update=596,
confirmed=true) ok
Aug 14 04:44:47 alpha crmd: [900]: info: process_lrm_event: LRM
operation mount_stop_0 (call=61, rc=0, cib-update=597,
confirmed=true) ok
beta: FC(mysql)=0
Aug 14 04:44:47 beta crmd: [868]: info: process_lrm_event: LRM
operation mount_start_0 (call=40, rc=0, cib-update=96,
confirmed=true) ok
Aug 14 04:44:47 beta crmd: [868]: info: process_lrm_event: LRM
operation mysql_start_0 (call=41, rc=0, cib-update=97,
confirmed=true) ok
Aug 14 04:47:17 beta crmd: [868]: info: process_lrm_event: LRM
operation mysql_stop_0 (call=42, rc=0, cib-update=98,
confirmed=true) ok
Aug 14 04:47:17 beta crmd: [868]: info: process_lrm_event: LRM
operation mount_stop_0 (call=43, rc=0, cib-update=99,
confirmed=true) ok
alpha: FC(mysql)=4
Aug 14 04:47:17 alpha crmd: [900]: info: process_lrm_event: LRM
operation mount_start_0 (call=62, rc=0, cib-update=599,
confirmed=true) ok
Aug 14 04:47:17 alpha crmd: [900]: info: process_lrm_event: LRM
operation mysql_start_0 (call=63, rc=0, cib-update=600,
confirmed=true) ok
This worked as expected, i.e. after the 150s cluster-recheck
interval the resources were started at alpha.
Post by Cnut Jansen
Post by Dejan Muhamedagic
BTW, what's the point of cloneMountMysql? If it can run only
colocation colocMountMysql_drbd inf: cloneMountMysql msDrbdMysql:Master
order orderMountMysql_drbd inf: msDrbdMysql:promote cloneMountMysql:start
It's a dual-primary-DRBD-configuration, so there are actually - when
everything is ok (-; - 2 masters of each DRBD-multistate-resource...
even though I admit that at least the dual primary respectively
master for msDrbdMysql is currently (quite) redundant, since in the
current cluster configuration there's only one, primitive
MySQL-resource and thus there'd be no inevitable need for MySQL's
data-dir being mounted all time on both nodes.
But since it's not harmful to have it mounted on the other node too,
and since msDrbdOpencms and msDrbdShared need to be mounted on both
nodes and since I put the complete installation and configuration of
the cluster into flexibly configurable shell-scripts, it's easier
respectively done with less typing to just put all DRBD- and
mount-resources' configuration into just one common loop. (-;
OK. It did cross my mind that it may be a dual-master drbd.

Your configuration is large. If you are going to run that in
production and don't really need a dual-master, then it'd be
good to get rid of the ocfs2 bits to make maintenance easier.
Post by Cnut Jansen
Post by Dejan Muhamedagic
Post by Cnut Jansen
d) I also have the impression that fail-counters don't get reset
after their failure-timeout, because when migration-threshold=3 is
set, upon every(!) following picking-on those issues occur, even
when I've waited for nearly 5 minutes (with failure-timeout=90)
without any touching the cluster
That seems to be a bug though I couldn't reproduce it with a
simple configuration.
I just also tested this once again: It seems like that
failure-timeout only sets back scores from -inf to around 0
(wherever they should normally be), allowing the resources to
return back to the node. I tested with setting a location constraint
for the underlying resource (see configuration): After the
failure-timeout has been completed, on the next cluster-recheck (and
only then!) the underlying resource and its dependents return to the
underlying resource's preferred location, as you see in logs above.
The count gets reset, but the cluster acts on it only after the
cluster-recheck-interval, unless something else makes the cluster
calculate new scores.
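
If you don't want to wait for the failure-timeout, cleaning the
resource up should reset the count immediately - a sketch, using the
node and resource names from your dummy test:

crm_resource -C -r mysql -H alpha
# or via the crm shell:
crm resource cleanup mysql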

Thanks,

Dejan
Cnut Jansen
2010-08-17 02:14:17 UTC
Post by Dejan Muhamedagic
Post by Cnut Jansen
Post by Dejan Muhamedagic
The migration-threshold shouldn't in any way influence resources
which don't depend on the resource which fails over. Couldn't
reproduce it here with our example RAs.
So it seems that - for what reason ever - those constrainted
resources are considered and treated just as they were in a
resource-group, because they move to where they all can run, instead
of the "eat or die" for the dependent resource (mysql) to the
underlying resource (mount) that I had expected with such
constraints as I set them... shouldn't I?! o_O
Yes, those two constraints are equivalent to a group.
So in fact migration-threshold actually does influence resources that
are neither grouped with nor dependent on the failing resource, when the
failing resource depends on them?!

Of course I already knew that from groups, and there it - imho - also
makes sense, since defining a group is like saying "I want all these
resources to run together on one node, no matter how and where".
But when setting constraints, i.e. defining dependencies, I understand
"dependency" as one-sided, not mutual: the underlying resource is
independent of its dependent, so it can do whatever it wants and doesn't
have to care about its dependent at all, while the dependent shall only
start when and where the underlying resource it depends on is started.
So did I understand you correctly that for Pacemaker it's actually the
intended way of working, for both groups and constraints, that they are
mutual dependencies?

And if so: is there also any possibility to define one-sided
dependencies/influences?
Post by Dejan Muhamedagic
Post by Cnut Jansen
And - concerning the failure-timeout - quite a while later, without
having resetted mysql's failure counter or having done anything else
4) alpha: FC(mysql)=3, crm_resource -F -r mysql -H alpha
Aug 14 04:44:47 alpha crmd: [900]: info: process_lrm_event: LRM
operation mysql_asyncmon_0 (call=59, rc=1, cib-update=592,
confirmed=false) unknown error
Aug 14 04:44:47 alpha crmd: [900]: info: process_lrm_event: LRM
operation mysql_stop_0 (call=60, rc=0, cib-update=596,
confirmed=true) ok
Aug 14 04:44:47 alpha crmd: [900]: info: process_lrm_event: LRM
operation mount_stop_0 (call=61, rc=0, cib-update=597,
confirmed=true) ok
beta: FC(mysql)=0
Aug 14 04:44:47 beta crmd: [868]: info: process_lrm_event: LRM
operation mount_start_0 (call=40, rc=0, cib-update=96,
confirmed=true) ok
Aug 14 04:44:47 beta crmd: [868]: info: process_lrm_event: LRM
operation mysql_start_0 (call=41, rc=0, cib-update=97,
confirmed=true) ok
Aug 14 04:47:17 beta crmd: [868]: info: process_lrm_event: LRM
operation mysql_stop_0 (call=42, rc=0, cib-update=98,
confirmed=true) ok
Aug 14 04:47:17 beta crmd: [868]: info: process_lrm_event: LRM
operation mount_stop_0 (call=43, rc=0, cib-update=99,
confirmed=true) ok
alpha: FC(mysql)=4
Aug 14 04:47:17 alpha crmd: [900]: info: process_lrm_event: LRM
operation mount_start_0 (call=62, rc=0, cib-update=599,
confirmed=true) ok
Aug 14 04:47:17 alpha crmd: [900]: info: process_lrm_event: LRM
operation mysql_start_0 (call=63, rc=0, cib-update=600,
confirmed=true) ok
This worked as expected, i.e. after the 150s cluster-recheck
interval the resources were started at alpha.
Is it really "as expected" that many(!) minutes - and even
cluster-rechecks - after the last picking-on, and with a failure-timeout
of 45 seconds, the failure counter is not only still showing a count of
3, but obviously really is 3 (not 0 after a reset), so that the resource
now migrates already on the first following picking-on?!
Post by Dejan Muhamedagic
Post by Cnut Jansen
Post by Dejan Muhamedagic
BTW, what's the point of cloneMountMysql? If it can run only
colocation colocMountMysql_drbd inf: cloneMountMysql msDrbdMysql:Master
order orderMountMysql_drbd inf: msDrbdMysql:promote cloneMountMysql:start
It's a dual-primary-DRBD-configuration, so there are actually - when
everything is ok (-; - 2 masters of each DRBD-multistate-resource...
even though I admit that at least the dual primary respectively
master for msDrbdMysql is currently (quite) redundant, since in the
current cluster configuration there's only one, primitive
MySQL-resource and thus there'd be no inevitable need for MySQL's
data-dir being mounted all time on both nodes.
But since it's not harmful to have it mounted on the other node too,
and since msDrbdOpencms and msDrbdShared need to be mounted on both
nodes and since I put the complete installation and configuration of
the cluster into flexibly configurable shell-scripts, it's easier
respectively done with less typing to just put all DRBD- and
mount-resources' configuration into just one common loop. (-;
OK. It did cross my mind that it may be a dual-master drbd.
Your configuration is large. If you are going to run that in
producetion and don't really need a dual-master, then it'd be
good to get rid of the ocfs2 bits to make maintenance easier.
Well, there are 3 DRBD resources, and the 2 DRBD resources other than
the one for MySQL's datadir must be dual-primary already now, since
they need to be mounted on all nodes for the Apache/Tomcat/Opencms
teams. Therefore it's indeed easier for maintenance to just keep all 3
DRBDs' configurations in sync, at the cost of only one little extra
line for cloning mountMysql. (-;
Post by Dejan Muhamedagic
Post by Cnut Jansen
Post by Dejan Muhamedagic
Post by Cnut Jansen
d) I also have the impression that fail-counters don't get reset
after their failure-timeout, because when migration-threshold=3 is
set, upon every(!) following picking-on those issues occur, even
when I've waited for nearly 5 minutes (with failure-timeout=90)
without any touching the cluster
That seems to be a bug though I couldn't reproduce it with a
simple configuration.
I just also tested this once again: It seems like that
failure-timeout only sets back scores from -inf to around 0
(wherever they should normally be), allowing the resources to
return back to the node. I tested with setting a location constraint
for the underlying resource (see configuration): After the
failure-timeout has been completed, on the next cluster-recheck (and
only then!) the underlying resource and its dependents return to the
underlying resource's preferred location, as you see in logs above.
The count gets reset, but the cluster acts on it only after the
cluster-recheck-interval, unless something else makes the cluster
calculate new scores.
See above, picking-on #4: more than 26 minutes after the last
picking-on, with settings of migration-threshold=3, failure-timeout=40
and cluster-recheck-interval=150, the resources already get migrated
upon the first picking-on (and the shown failure counter rises to 4).
To me that doesn't look like resetting the failure counter to 0 after
the failure-timeout, but just resetting scores. Actually - except maybe
by tricks/force - it shouldn't be possible at all to get the resource
running again on the node it failed on as long as its failure counter
there has reached migration-threshold's limit, right?
How can the failure counter then ever reach counts beyond
migration-threshold's limit at all (ok, I could still imagine reasons
for that), and especially why does migration-threshold from then on
behave on every failure as if it were set to 1, even when it's e.g. set
to 3?
Dejan Muhamedagic
2010-08-17 11:51:14 UTC
Post by Cnut Jansen
Post by Dejan Muhamedagic
Post by Cnut Jansen
Post by Dejan Muhamedagic
The migration-threshold shouldn't in any way influence resources
which don't depend on the resource which fails over. Couldn't
reproduce it here with our example RAs.
So it seems that - for whatever reason - those constrained
resources are considered and treated just as they were in a
resource-group, because they move to where they all can run, instead
of the "eat or die" for the dependent resource (mysql) to the
underlying resource (mount) that I had expected with such
constraints as I set them... shouldn't I?! o_O
Yes, those two constraints are equivalent to a group.
So in fact migration-threshold actually does influence resources
that are neither grouped with nor dependent on the failing resource,
when the failing resource depends on them?!
Of course I already knew that from groups, and there it - imho -
also makes sense, since defining a group means like saying "I want
to have all these resources run together on one node; no matter how
and where". But when setting constraints respectively defining
dependencies, at least I understand "dependency" one-sided, not
mutual; meaning the underlying resource is independent towards its
dependent, therefore it can do whatever it wants to do and doesn't
have to care about its dependent at all, while the dependent shall
only start when and where the underlying resource it depends on is
started.
So did I understand you right, that for Pacemaker it's actually the
intentional way of working for both, groups and constraints, that
they are mutual dependencies?
And if so: Is there also any possibility to define one-sided
dependencies/influences?
Take a look at mandatory vs. advisory constraints in the
Configuration Explained doc. A group is equivalent to a set of
order/collocation constraints with the infinite score (inf).
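
Roughly, with the Dummy resources from your test (a sketch; scores
other than inf make the constraints advisory, so a failure of mysql
should no longer drag mount along - the doc has the exact semantics):

# mandatory - behaves like a group:
colocation colocMysql inf: mysql mount
order orderMysql inf: mount mysql
# advisory - only a preference:
colocation colocMysql 0: mysql mount
order orderMysql 0: mount mysql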
Post by Cnut Jansen
Post by Dejan Muhamedagic
Post by Cnut Jansen
And - concerning the failure-timeout - quite a while later, without
having reset mysql's failure counter or having done anything else
4) alpha: FC(mysql)=3, crm_resource -F -r mysql -H alpha
Aug 14 04:44:47 alpha crmd: [900]: info: process_lrm_event: LRM
operation mysql_asyncmon_0 (call=59, rc=1, cib-update=592,
confirmed=false) unknown error
Aug 14 04:44:47 alpha crmd: [900]: info: process_lrm_event: LRM
operation mysql_stop_0 (call=60, rc=0, cib-update=596,
confirmed=true) ok
Aug 14 04:44:47 alpha crmd: [900]: info: process_lrm_event: LRM
operation mount_stop_0 (call=61, rc=0, cib-update=597,
confirmed=true) ok
beta: FC(mysql)=0
Aug 14 04:44:47 beta crmd: [868]: info: process_lrm_event: LRM
operation mount_start_0 (call=40, rc=0, cib-update=96,
confirmed=true) ok
Aug 14 04:44:47 beta crmd: [868]: info: process_lrm_event: LRM
operation mysql_start_0 (call=41, rc=0, cib-update=97,
confirmed=true) ok
Aug 14 04:47:17 beta crmd: [868]: info: process_lrm_event: LRM
operation mysql_stop_0 (call=42, rc=0, cib-update=98,
confirmed=true) ok
Aug 14 04:47:17 beta crmd: [868]: info: process_lrm_event: LRM
operation mount_stop_0 (call=43, rc=0, cib-update=99,
confirmed=true) ok
alpha: FC(mysql)=4
Aug 14 04:47:17 alpha crmd: [900]: info: process_lrm_event: LRM
operation mount_start_0 (call=62, rc=0, cib-update=599,
confirmed=true) ok
Aug 14 04:47:17 alpha crmd: [900]: info: process_lrm_event: LRM
operation mysql_start_0 (call=63, rc=0, cib-update=600,
confirmed=true) ok
This worked as expected, i.e. after the 150s cluster-recheck
interval the resources were started at alpha.
Is it really "as expected" that many(!) minutes - and even
cluster-rechecks - after the last picking-on and with a
failure-timeout of 45 seconds the failure counter is still not only
showing a count of 3, but also obviously really being 3 (not 0,
after being reset), thus now migrating resource already on the
first following picking-on?!
Of course, that's not how it should work. If you observe such a
case, please file a bugzilla and attach a hb_report. I just
commented on what was shown above: 04:47:17 - 04:44:47 = 150s.
Perhaps I missed something happening earlier?
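
Something along these lines should capture the incident (a sketch; pick
a start time just before the test, the destination directory is
arbitrary):

hb_report -f "2010-08-14 04:40" /tmp/failcount-incident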
Post by Cnut Jansen
Post by Dejan Muhamedagic
Post by Cnut Jansen
Post by Dejan Muhamedagic
BTW, what's the point of cloneMountMysql? If it can run only
colocation colocMountMysql_drbd inf: cloneMountMysql msDrbdMysql:Master
order orderMountMysql_drbd inf: msDrbdMysql:promote cloneMountMysql:start
It's a dual-primary-DRBD-configuration, so there are actually - when
everything is ok (-; - 2 masters of each DRBD-multistate-resource...
even though I admit that at least the dual primary respectively
master for msDrbdMysql is currently (quite) redundant, since in the
current cluster configuration there's only one, primitive
MySQL-resource and thus there'd be no inevitable need for MySQL's
data-dir being mounted all time on both nodes.
But since it's not harmful to have it mounted on the other node too,
and since msDrbdOpencms and msDrbdShared need to be mounted on both
nodes and since I put the complete installation and configuration of
the cluster into flexibly configurable shell-scripts, it's easier
respectively done with less typing to just put all DRBD- and
mount-resources' configuration into just one common loop. (-;
OK. It did cross my mind that it may be a dual-master drbd.
Your configuration is large. If you are going to run that in
producetion and don't really need a dual-master, then it'd be
good to get rid of the ocfs2 bits to make maintenance easier.
Well, there are 3 DRBD resources, and the other 2 DRBD resources
except the DRBD for MySQL's datadir must be dual-primary already
now, since they're needed being mounted on all nodes for the
Apache/Tomcat/Opencms-teams. Therefore it's indeed easier for
maintenance to just keep all 3 DRBD's configurations in sync, and
only requiring one little line more for cloning mountMysql. (-;
Right.
Post by Cnut Jansen
Post by Dejan Muhamedagic
Post by Cnut Jansen
Post by Dejan Muhamedagic
Post by Cnut Jansen
d) I also have the impression that fail-counters don't get reset
after their failure-timeout, because when migration-threshold=3 is
set, upon every(!) following picking-on those issues occur, even
when I've waited for nearly 5 minutes (with failure-timeout=90)
without any touching the cluster
That seems to be a bug though I couldn't reproduce it with a
simple configuration.
I just also tested this once again: It seems like that
failure-timeout only sets back scores from -inf to around 0
(wherever they should normally be), allowing the resources to
return back to the node. I tested with setting a location constraint
for the underlying resource (see configuration): After the
failure-timeout has been completed, on the next cluster-recheck (and
only then!) the underlying resource and its dependents return to the
underlying resource's preferred location, as you see in logs above.
The count gets reset, but the cluster acts on it only after the
cluster-recheck-interval, unless something else makes the cluster
calculate new scores.
See above, picking-on #4: More than 26 minutes after the last
Hmm, sorry, couldn't see anything going on for 26 mins. I
probably didn't look carefully enough.
Post by Cnut Jansen
picking-on with settings of migration-threshold=3,
failure-timeout=40 and cluster-recheck-interval=150, resources get
already migrated upon first picking-on (and shown failure-counter
raises to 4). To me that doesn't look like resetting failure-counter
to 0 after failure-timeout, but just resetting scores.
failure-timeout serves explicitly to reset the number of
failures, not the score.
Post by Cnut Jansen
Actually -
except maybe by tricks/force - it shouldn't be possible at all to
get the resource running again on the node it failed on for as long
as its failure counter there has still reached migration-threshold's
limit, right?
Right.
Post by Cnut Jansen
How can then failure counter ever reach counts beyond
migration-threshold's limit (ok, I could still imagine reasons for
that) at all,
It shouldn't. I see now above "alpha: FC(mysql)=4"; I guess
that shouldn't have happened.
Post by Cnut Jansen
and especially why does migration-threshold from then
on behave on every failure like it was set to 1, even when it's i.e.
set to 3?
I don't quite understand what you mean by "behave" - an attribute
cannot really behave. Well, obviously you ran into some unusual
behaviour, so it'd be best to make a hb_report for the incident
and open a bugzilla.

Thanks,

Dejan
Claude.Durocher
2010-08-17 18:09:21 UTC
I have a 3-node cluster running Xen resources on SLES11sp1 with HAE. The
nodes are connected to a SAN, and Pacemaker controls the start of the shared
disk. From time to time the monitor of an LVM volume group or the ocfs2 file
system fails: this triggers a stop of the shared-disk resource, but the stop
can't be completed because Xen resources are still running on the shared disk
(I don't know why the monitor fails, as the resource seems to be running fine):

Log patterns:
Aug 13 21:27:49 qcpvms09 crmd: [9677]: ERROR: process_lrm_event: LRM
operation xen_configstore_volume1:1_monitor_120000 (32) Timed Out
(timeout=50000ms)
Aug 13 21:28:09 qcpvms09 crmd: [9677]: ERROR: process_lrm_event: LRM
operation xen_configstore_volume1:1_stop_0 (55) Timed Out (timeout=20000ms)
Aug 13 21:28:29 qcpvms09 crmd: [9677]: ERROR: process_lrm_event: LRM
operation qcdtypo01_monitor_120000 (54) Timed Out (timeout=90000ms)

Is there a way to have the monitor operation retry x times before
declaring the resource failed? Or should the monitor part of the LVM
resource or the OCFS2 resource be changed?

My running config :

node qcpvms07 \
attributes standby="off"
node qcpvms08 \
attributes standby="off"
node qcpvms09 \
attributes standby="off"
primitive clvm ocf:lvm2:clvmd \
operations $id="clvm-operations" \
op monitor interval="120" timeout="20" start-delay="10" \
op start interval="0" timeout="30" \
params daemon_timeout="30" daemon_options="-d0"
primitive dlm ocf:pacemaker:controld \
operations $id="dlm-operations" \
op monitor interval="120" timeout="20" start-delay="10"
primitive o2cb ocf:ocfs2:o2cb \
operations $id="o2cb-operations" \
op monitor interval="120" timeout="20" start-delay="10"
primitive ping-net1 ocf:pacemaker:ping \
operations $id="ping-net1-operations" \
op monitor interval="120" timeout="20" on-fail="restart"
start-delay="0" \
params name="ping-net1" host_list="192.168.88.1 192.168.88.43"
interval="15" timeout="5" attempts="5" \
meta target-role="started"
primitive qcddom01 ocf:heartbeat:Xen \
meta target-role="started" \
operations $id="qcddom01-operations" \
op monitor interval="120" timeout="30" on-fail="restart"
start-delay="60" \
op start interval="0" timeout="120" start-delay="0" \
op stop interval="0" timeout="120" \
op migrate_from interval="0" timeout="240" \
op migrate_to interval="0" timeout="240" \
params xmfile="/etc/xen/vm/qcddom01" allow-migrate="true"
primitive qcdtypo01 ocf:heartbeat:Xen \
meta target-role="started" \
operations $id="qcdtypo01-operations" \
op monitor interval="120" timeout="30" on-fail="restart"
start-delay="60" \
op start interval="0" timeout="120" start-delay="0" \
op stop interval="0" timeout="120" \
op migrate_from interval="0" timeout="240" \
op migrate_to interval="0" timeout="240" \
params xmfile="/etc/xen/vm/qcdtypo01" allow-migrate="true"
primitive stonith-sbd stonith:external/sbd \
meta target-role="started" \
operations $id="stonith-sbd-operations" \
op monitor interval="30" timeout="15" start-delay="30" \
params sbd_device="/dev/mapper/mpathc"
primitive xen_configstore_volume1 ocf:heartbeat:Filesystem \
operations $id="xen_configstore_volume1-operations" \
op monitor interval="120" timeout="40" start-delay="10" \
params device="/dev/xen_volume1_group/xen_configstore_volume1" directory="/etc/xen/vm" fstype="ocfs2"
primitive xen_volume1_group ocf:heartbeat:LVM \
operations $id="xen_volume1_group-operations" \
op monitor interval="120" timeout="30" start-delay="10" \
params volgrpname="xen_volume1_group"
primitive xen_volume2_group ocf:heartbeat:LVM \
operations $id="xen_volume2_group-operations" \
op monitor interval="120" timeout="30" start-delay="10" \
params volgrpname="xen_volume2_group"
group shared-disk-group dlm clvm o2cb xen_volume1_group xen_volume2_group xen_configstore_volume1 \
meta target-role="started"
clone ping-clone ping-net1 \
meta target-role="started" interleave="true" ordered="true"
clone shared-disk-clone shared-disk-group \
meta target-role="stopped"
location qcddom01-on-ping-net1 qcddom01 \
rule $id="qcddom01-on-ping-net1-rule" -inf: not_defined ping-net1 or
ping-net1 lte 0
location qcddom01-prefer-qcpvms08 qcddom01 500: qcpvms08
location qcdtypo01-on-ping-net1 qcdtypo01 \
rule $id="qcdtypo01-on-ping-net1-rule" -inf: not_defined ping-net1 or
ping-net1 lte 0
location qcdtypo01-prefer-qcpvms07 qcdtypo01 500: qcpvms07
colocation colocation-qcddom01-shared-disk-clone inf: qcddom01 shared-disk-clone
colocation colocation-qcdtypo01-shared-disk-clone inf: qcdtypo01 shared-disk-clone
order order-qcddom01 inf: shared-disk-clone qcddom01
order order-qcdtypo01 inf: shared-disk-clone qcdtypo01
property $id="cib-bootstrap-options" \
dc-version="1.1.2-2e096a41a5f9e184a1c1537c82c6da1093698eb5" \
cluster-infrastructure="openais" \
no-quorum-policy="freeze" \
default-resource-stickiness="500" \
last-lrm-refresh="1281552641" \
expected-quorum-votes="3" \
stonith-timeout="240s"
op_defaults $id="op_defaults-options" \
record-pending="false"

Claude
Andrew Beekhof
2010-08-26 07:31:38 UTC
Post by Claude.Durocher
I have a 3 node cluster running Xen resources on SLES11sp1 with HAE. The
nodes are connected to a SAN and Pacemaker controls the start of the shared
disk. From time to time, monitor of LVM volume groups or ocfs2 file system
fails : this triggers a stopping of the shared disk resource but this can't
be completed as Xen resources are running using the shared disk (I don't
Aug 13 21:27:49 qcpvms09 crmd: [9677]: ERROR: process_lrm_event: LRM
operation xen_configstore_volume1:1_monitor_120000 (32) Timed Out
(timeout=50000ms)
Aug 13 21:28:09 qcpvms09 crmd: [9677]: ERROR: process_lrm_event: LRM
operation xen_configstore_volume1:1_stop_0 (55) Timed Out (timeout=20000ms)
Aug 13 21:28:29 qcpvms09 crmd: [9677]: ERROR: process_lrm_event: LRM
operation qcdtypo01_monitor_120000 (54) Timed Out (timeout=90000ms)
Is there a way to have the monitor operation retry x times before
declaring the resource failed?
No
Post by Claude.Durocher
Or should the monitor part of the LVM
resource or OCFS2 resource be changed?
I'd start by increasing the timeouts.
If that doesn't work, you'll need to investigate the Filesystem agent
to see what is taking so long.
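
For instance (only a sketch; the values are guesses, and the explicit
stop timeout is something I'd add, not something you already have -
what is actually long enough depends on your SAN):

primitive xen_configstore_volume1 ocf:heartbeat:Filesystem \
        operations $id="xen_configstore_volume1-operations" \
        op monitor interval="120" timeout="120" start-delay="10" \
        op stop interval="0" timeout="120" \
        params device="/dev/xen_volume1_group/xen_configstore_volume1" directory="/etc/xen/vm" fstype="ocfs2"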
Post by Claude.Durocher
node qcpvms07 \
attributes standby="off"
node qcpvms08 \
attributes standby="off"
node qcpvms09 \
attributes standby="off"
primitive clvm ocf:lvm2:clvmd \
operations $id="clvm-operations" \
op monitor interval="120" timeout="20" start-delay="10" \
op start interval="0" timeout="30" \
params daemon_timeout="30" daemon_options="-d0"
primitive dlm ocf:pacemaker:controld \
operations $id="dlm-operations" \
op monitor interval="120" timeout="20" start-delay="10"
primitive o2cb ocf:ocfs2:o2cb \
operations $id="o2cb-operations" \
op monitor interval="120" timeout="20" start-delay="10"
primitive ping-net1 ocf:pacemaker:ping \
operations $id="ping-net1-operations" \
op monitor interval="120" timeout="20" on-fail="restart" start-delay="0" \
params name="ping-net1" host_list="192.168.88.1 192.168.88.43" interval="15"
timeout="5" attempts="5" \
meta target-role="started"
primitive qcddom01 ocf:heartbeat:Xen \
meta target-role="started" \
operations $id="qcddom01-operations" \
op monitor interval="120" timeout="30" on-fail="restart" start-delay="60" \
op start interval="0" timeout="120" start-delay="0" \
op stop interval="0" timeout="120" \
op migrate_from interval="0" timeout="240" \
op migrate_to interval="0" timeout="240" \
params xmfile="/etc/xen/vm/qcddom01" allow-migrate="true"
primitive qcdtypo01 ocf:heartbeat:Xen \
meta target-role="started" \
operations $id="qcdtypo01-operations" \
op monitor interval="120" timeout="30" on-fail="restart" start-delay="60" \
op start interval="0" timeout="120" start-delay="0" \
op stop interval="0" timeout="120" \
op migrate_from interval="0" timeout="240" \
op migrate_to interval="0" timeout="240" \
params xmfile="/etc/xen/vm/qcdtypo01" allow-migrate="true"
primitive stonith-sbd stonith:external/sbd \
meta target-role="started" \
operations $id="stonith-sbd-operations" \
op monitor interval="30" timeout="15" start-delay="30" \
params sbd_device="/dev/mapper/mpathc"
primitive xen_configstore_volume1 ocf:heartbeat:Filesystem \
operations $id="xen_configstore_volume1-operations" \
op monitor interval="120" timeout="40" start-delay="10" \
params device="/dev/xen_volume1_group/xen_configstore_volume1"
directory="/etc/xen/vm" fstype="ocfs2"
primitive xen_volume1_group ocf:heartbeat:LVM \
operations $id="xen_volume1_group-operations" \
op monitor interval="120" timeout="30" start-delay="10" \
params volgrpname="xen_volume1_group"
primitive xen_volume2_group ocf:heartbeat:LVM \
operations $id="xen_volume2_group-operations" \
op monitor interval="120" timeout="30" start-delay="10" \
params volgrpname="xen_volume2_group"
group shared-disk-group dlm clvm o2cb xen_volume1_group xen_volume2_group xen_configstore_volume1 \
meta target-role="started"
clone ping-clone ping-net1 \
meta target-role="started" interleave="true" ordered="true"
clone shared-disk-clone shared-disk-group \
meta target-role="stopped"
location qcddom01-on-ping-net1 qcddom01 \
rule $id="qcddom01-on-ping-net1-rule" -inf: not_defined ping-net1 or ping-net1 lte 0
location qcddom01-prefer-qcpvms08 qcddom01 500: qcpvms08
location qcdtypo01-on-ping-net1 qcdtypo01 \
rule $id="qcdtypo01-on-ping-net1-rule" -inf: not_defined ping-net1 or ping-net1 lte 0
location qcdtypo01-prefer-qcpvms07 qcdtypo01 500: qcpvms07
colocation colocation-qcddom01-shared-disk-clone inf: qcddom01 shared-disk-clone
colocation colocation-qcdtypo01-shared-disk-clone inf: qcdtypo01 shared-disk-clone
order order-qcddom01 inf: shared-disk-clone qcddom01
order order-qcdtypo01 inf: shared-disk-clone qcdtypo01
property $id="cib-bootstrap-options" \
dc-version="1.1.2-2e096a41a5f9e184a1c1537c82c6da1093698eb5" \
cluster-infrastructure="openais" \
no-quorum-policy="freeze" \
default-resource-stickiness="500" \
last-lrm-refresh="1281552641" \
expected-quorum-votes="3" \
stonith-timeout="240s"
op_defaults $id="op_defaults-options" \
record-pending="false"
Claude
Cnut Jansen
2010-08-18 04:25:39 UTC
Permalink
Post by Dejan Muhamedagic
Post by Cnut Jansen
And if so: Is there also any possibility to define one-sided
dependencies/influences?
Take a look at mandatory vs. advisory constraints in the
Configuration Explained doc. A group is equivalent to a set of
order/collocation constraints with the infinite score (inf).
Yeah, just before your latest reply I had tested changing the scores of the colocation constraints from inf to 0, and unless the cluster gets picked on too hard, that seems to be an acceptable workaround for now, for the case where migration-threshold is really needed. But I guess we'll rather waive migration-threshold (and maybe try other options for a similar effect, if needed) than mess around with optional/advisory scores.
You know, a score of 0 still leaves situations possible where the dependent resource could be started (or at least be attempted to start) elsewhere than the resource it fundamentally depends on, since a score of 0 is more of a "Hey, Paci, I'd be really glad if you at least tried to colocate the dependent with its underlying resource; but only if you feel like it!" than the required "Listen up, Paci: I insist(!!!) on colocating the dependent with its underlying resource! At all costs! That's a strict order!". I.e. you only need to move the underlying resource to another node (set a location constraint) and a score-0 colocation is already history.
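For illustration, here is roughly what the two variants look like in crm
syntax for my setup (just a sketch; the constraint id is made up):

# mandatory: grpMysql may only run where cloneMountMysql is running
colocation colo-mysql-mount inf: grpMysql cloneMountMysql

# advisory: only a preference; Pacemaker may still place grpMysql elsewhere,
# e.g. when a location constraint pulls cloneMountMysql away
colocation colo-mysql-mount 0: grpMysql cloneMountMysql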
Post by Dejan Muhamedagic
Post by Cnut Jansen
Is it really "as exspected" that many(!) minutes - and even
cluster-rechecks - after the last picking-on and with a
failure-timeout of 45 seconds the failure counter is still not only
showing a count of 3, but also obviously really being 3 (not 0,
after being reset), thus now migrating resource allready on the
first following picking-on?!
Of course, that's not how it should work. If you observe such a
case, please file a bugzilla and attach hb_report. I just
commented what was shown above: 04:47:17 - 04:44:47 = 150.
Perhaps I missed something happening earlier?
Only the picking-on up to migration-threshold's limit, and the pause of
26 mins. (-;

I filed a bug report and hope it's not too poor, since it's my first
one of this kind and bed has been calling for quite a while
already. (-#
http://developerbugs.linux-foundation.org/show_bug.cgi?id=2468
Post by Dejan Muhamedagic
Post by Cnut Jansen
Post by Dejan Muhamedagic
The count gets reset, but the cluster acts on it only after the
cluster-recheck-interval, unless something else makes the cluster
calculate new scores.
See above, picking-on #4: More than 26 minutes after the last
Hmm, sorry, couldn't see anything going on for 26 mins. I
probably didn't look carefully enough.
Yeah, don't worry, you saw it right: you couldn't see anything going
on during that time, because I indeed didn't do anything for those 26 mins;
I didn't even touch the VMs at all! d-#
I did so - or rather, deliberately did nothing - to make absolutely sure
that there are no other timers resetting the failure counter or anything
else, and thus to prove that it obviously doesn't get reset at
all. (The longest interval I've heard of so far is the default
shutdown-escalation of 20 min.)
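In case anyone wants to check this on their own cluster, the fail count can
be inspected and cleared by hand (a sketch; <node> is a placeholder, and the
exact option details may differ slightly between Pacemaker 1.0 and 1.1):

# show fail counts of all resources at once
crm_mon -1 --failcounts
# show / clear the fail count of mysql on one node via the crm shell
crm resource failcount mysql show <node>
crm resource failcount mysql delete <node>
# or clean up the resource completely, which also resets its fail count
crm_resource --cleanup -r mysql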
Cnut Jansen
2010-09-24 17:23:59 UTC
Permalink
Post by Cnut Jansen
Basically I have a cluster of 2 nodes with cloned DLM-, O2CB-, DRBD-,
mount-resources, and a MySQL-resource (grouped with an IPaddr-resource)
running on top of the other ones.
The MySQL(-group)-resource depends on the mount-resource, which depends
on both, the DRBD- and the O2CB-resources equally, and the O2CB-resource
depends on the DLM-resource.
cloneDlm -> cloneO2cb -\
}-> cloneMountMysql -> mysql / grpMysql( mysql
-> ipMysql )
msDrbdMysql -----------/
Furthermore for the MySQL(-group)-resource I set meta-attributes
"migration-threshold=1" and "failure-timeout=90" (later also tried
settings "3" and "130" for these).
a) the stops/restarts of the underlying resources happen only when
failcounter hits the limit set by migration-threshold; i.e. when set to
3, on first 2 failures only mysql/grpMysql is restarted on the same node
and only on 3rd one underlying resources are left in a mess (while
mysql/grpMysql migrates) (for DRBD reproducable; unsure about
DLM/O2CB-side, but there's sometimes hard trouble too after having
picked on mysql; just couldn't definitively link it yet)
b) upon causing mysql/grpMysql's migration, score for
msDrbdMysql:promote changes from 10020 to -inf and stays there for the
time of mysql/grpMysql's failure-timeout (proved with also setting to
130), before it rises back up to 10000
c) msDrbdMysql remains slave until the next cluster-recheck after its
promote-score went back up to 10000
d) I also have the impression that fail-counters don't get reset after
their failure-timeout, because when migration-threshold=3 is set, upon
every(!) following picking-on those issues occure, even when I've waited
for nearly 5 minutes (with failure-timeout=90) without any touching the
cluster
I experienced this on both test-clusters, a SLES 11 HAE SP1 with
Pacemaker 1.1.2, and a Debian Squeeze with Pacemaker 1.0.9. When
migration-threshold for mysql/grpMysql is removed, everything is fine
(except no migration of course). I can't remember such happening with
SLES 11 HAE SP0's Pacemaker 1.0.6.
p.s.: Just for fun / testing / proving I just also contrainted
grpLdirector to cloneMountShared... and could perfectly reproduce that
problem with its then underlying resources too.
For reference:
SLES11-HAE-SP1: The issues seem to be solved with the latest officially
released packages (upgraded yesterday directly from Novell's repositories),
including Pacemaker version 1.1.2-0.6.1 (arch: x86_64), shown
in crm_mon as "1.1.2-ecb1e2ea172ba2551f0bd763e557fccde68c849b". At
least so far I couldn't reproduce any unnecessary restart of the underlying
resources (nor any other interference with them at all), and fail counters
now do get reset - once their failure-timeout has expired - upon the next
cluster-recheck (event- or interval-driven); see the quick check sketched below.
Debian Squeeze: Not tested again yet
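The quick check mentioned above, as a sketch (the 2-minute recheck interval
is only there to make the reset visible quickly; it is not a recommendation):

# a short recheck interval makes the reset show up quickly
crm configure property cluster-recheck-interval="2m"
# after failure-timeout has expired, the fail count should drop back to 0
# on the next cluster-recheck
crm_mon -1 --failcounts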
