Jérôme Charaoui
2015-01-28 18:53:17 UTC
Hi,
I'm testing a 2-node Corosync (1.4.6) and Pacemaker (1.1.10+git20130802)
cluster on Debian 8.0 and having some problems with the stonith resources.
I've set up two external/ipmi resources on each node and wanted to test
how they would react to physically unplugging the IPMI devices' network
interfaces.
On the DC, no problem: the resource monitor fails, the stop op succeeds
and, due to location constraints, the resource enters the stopped state
and stays there, as expected. After replugging the network cable and
cleaning up the resource, it is restored to its normal state.
On the slave node, it's a different scenario: after the monitor op fails,
the stop op also fails for an unknown reason. The cluster then retries the
stop operation unsuccessfully until I have the node enter/exit standby mode.
Replugging the network cable on the IPMI device has no effect.
At least, that's what I figure is happening from these logs:
DC: http://pastebin.com/raw.php?i=QpwG6nea
Slave: http://pastebin.com/raw.php?i=3nesX8yJ
Config: http://pastebin.com/raw.php?i=3FrJuwWz
Any help tracking down the issue would be much appreciated.
Thanks!
--
Jérôme Charaoui
Technicien informatique
Collège de Maisonneuve
_______________________________________________
Pacemaker mailing list: ***@oss.clusterlabs.org
http://oss.clusterlabs.org/mailman/listinfo/pacemaker
Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://bugs.clusterlabs.org