Alexandre
2015-05-04 08:41:08 UTC
Hi,
I have a pacemaker / corosync / cman cluster running on redhat 6.6.
Although the cluster is working as expected, I have some traces of old failures
(from several months ago) that I can't get rid of.
Basically I have set cluster-recheck-interval="300" and
failure-timeout="600" (in rsc_defaults), as shown below:
property $id="cib-bootstrap-options" \
dc-version="1.1.10-14.el6-368c726" \
cluster-infrastructure="cman" \
expected-quorum-votes="2" \
no-quorum-policy="ignore" \
stonith-enabled="false" \
last-lrm-refresh="1429702408" \
maintenance-mode="false" \
cluster-recheck-interval="300"
rsc_defaults $id="rsc-options" \
failure-timeout="600"
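(For the record, the values can be double-checked on the running cluster with
something like the commands below; the "rsc-options" id is the one from the
rsc_defaults section above, and option spellings may vary slightly between
pacemaker versions.)
# should print 300
crm_attribute --type crm_config --name cluster-recheck-interval --query
# should show failure-timeout=600
crm configure show rsc-options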
So I would expect the old failures to have been purged from the CIB long ago,
but I actually see the following when issuing crm_mon -frA1:
Migration summary:
* Node host1:
etc_ml_drbd: migration-threshold=1000000 fail-count=244
last-failure='Sat Feb 14 17:04:05 2015'
spool_postfix_drbd_msg: migration-threshold=1000000 fail-count=244
last-failure='Sat Feb 14 17:04:05 2015'
lib_ml_drbd: migration-threshold=1000000 fail-count=244
last-failure='Sat Feb 14 17:04:05 2015'
lib_imap_drbd: migration-threshold=1000000 fail-count=244
last-failure='Sat Feb 14 17:04:05 2015'
spool_imap_drbd: migration-threshold=1000000 fail-count=11654
last-failure='Sat Feb 14 17:04:05 2015'
spool_ml_drbd: migration-threshold=1000000 fail-count=244
last-failure='Sat Feb 14 17:04:05 2015'
documents_drbd: migration-threshold=1000000 fail-count=248
last-failure='Sat Feb 14 17:58:55 2015'
* Node host2
documents_drbd: migration-threshold=1000000 fail-count=548
last-failure='Sat Feb 14 16:26:33 2015'
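(As far as I understand, these records are kept as transient fail-count-*
node attributes in the status section of the CIB, so something like the
following should list them directly:)
# show the stored fail-count attributes in the CIB status section
cibadmin --query --scope status | grep fail-count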
I have tried crm_failcount -D on the resources and also tried a cleanup... but
the records are still there!
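Concretely, this is roughly what I ran, taking documents_drbd on host1 as an
example (the other resources were treated the same way; exact option spellings
may differ slightly between pacemaker versions):
# run on the node holding the fail-count, e.g. on host1
crm_failcount -D -r documents_drbd
# clean up the resource history for that resource/node
crm_resource --cleanup --resource documents_drbd --node host1
# crm shell equivalent
crm resource cleanup documents_drbd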
How can I get rid of those records (so my monitoring tools stop complaining)?
Regards.