Discussion:
[Pacemaker] stonith
Thomas Manninger
2015-04-17 10:36:00 UTC
_______________________________________________
Pacemaker mailing list: ***@oss.clusterlabs.org
http://oss.clusterlabs.org/mailman/listinfo/pacemaker

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://bugs.clusterlabs.org
D***@gmx.at
2015-04-18 05:39:16 UTC
For information: I am using Pacemaker 1.1.12 on Debian Wheezy.

-----Original Message-----
Sent: Friday, 17 April 2015 at 12:36:00
From: "Thomas Manninger" <***@gmx.at>
To: ***@oss.clusterlabs.org
Subject: [Pacemaker] stonith


Hi list,

I have a Pacemaker/Corosync 2 setup with 4 nodes; STONITH is configured over the IPMI interface.

My problem is that sometimes the wrong node is fenced.
For example:
I have 4 servers: node1, node2, node3, node4.

I trigger a hardware reset on node1, but both node1 and node3 get fenced.

In cluster.log, I found the following entries:
Apr 17 11:02:41 [20473] node2   stonithd:    debug: stonith_action_create:       Initiating action reboot for agent fence_legacy (target=node1)
Apr 17 11:02:41 [20473] node2   stonithd:    debug: make_args:   Performing reboot action for node 'node1' as 'port=node1'
Apr 17 11:02:41 [20473] node2   stonithd:    debug: internal_stonith_action_execute:     forking
Apr 17 11:02:41 [20473] node2   stonithd:    debug: internal_stonith_action_execute:     sending args
Apr 17 11:02:41 [20473] node2   stonithd:    debug: stonith_device_execute:      Operation reboot for node node1 on p_stonith_node3 now running with pid=113092, timeout=60s
 
node1 is reset using the STONITH primitive of node3?? Why?

My STONITH config:
primitive p_stonith_node1 stonith:external/ipmi \
        params hostname=node1 ipaddr=10.100.0.2 passwd_method=file passwd="/etc/stonith_ipmi_passwd" userid=stonith interface=lanplus priv=OPERATOR \
        op monitor interval=3s timeout=20s \
        meta target-role=Started failure-timeout=30s
primitive p_stonith_node2 stonith:external/ipmi \
        op monitor interval=3s timeout=20s \
        params hostname=node2 ipaddr=10.100.0.4 passwd_method=file passwd="/etc/stonith_ipmi_passwd" userid=stonith interface=lanplus priv=OPERATOR \
        meta target-role=Started failure-timeout=30s
primitive p_stonith_node3 stonith:external/ipmi \
        op monitor interval=3s timeout=20s \
        params hostname=node3 ipaddr=10.100.0.6 passwd_method=file passwd="/etc/stonith_ipmi_passwd" userid=stonith interface=lanplus priv=OPERATOR \
        meta target-role=Started failure-timeout=30s
primitive p_stonith_node4 stonith:external/ipmi \
        op monitor interval=3s timeout=20s \
        params hostname=node4 ipaddr=10.100.0.8 passwd_method=file passwd="/etc/stonith_ipmi_passwd" userid=stonith interface=lanplus priv=OPERATOR \
        meta target-role=Started failure-timeout=30s
 
Can somebody help me?
Thanks!
 
Regards,
Thomas

Andreas Kurz
2015-04-19 12:23:27 UTC
Post by D***@gmx.at
Hi list,
I have a Pacemaker/Corosync 2 setup with 4 nodes; STONITH is configured over the IPMI interface.
My problem is that sometimes the wrong node is fenced.
I have 4 servers: node1, node2, node3, node4.
I trigger a hardware reset on node1, but both node1 and node3 get fenced.
You have to tell Pacemaker exactly which stonith resource can fence which node if the stonith agent you are using does not support the "list" action.

Do this by adding pcmk_host_check="static-list" and pcmk_host_list to every stonith resource, like:

primitive p_stonith_node3 stonith:external/ipmi \
        op monitor interval=3s timeout=20s \
        params hostname=node3 ipaddr=10.100.0.6 passwd_method=file passwd="/etc/stonith_ipmi_passwd" userid=stonith interface=lanplus priv=OPERATOR \
        pcmk_host_check="static-list" pcmk_host_list="node3"

... see "man stonithd".
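Applied to the whole cluster from the original post, the four primitives would then look like this (a sketch reusing the addresses, parameters, and meta attributes quoted earlier in the thread):

```
primitive p_stonith_node1 stonith:external/ipmi \
        op monitor interval=3s timeout=20s \
        params hostname=node1 ipaddr=10.100.0.2 passwd_method=file passwd="/etc/stonith_ipmi_passwd" userid=stonith interface=lanplus priv=OPERATOR \
        pcmk_host_check="static-list" pcmk_host_list="node1" \
        meta target-role=Started failure-timeout=30s
primitive p_stonith_node2 stonith:external/ipmi \
        op monitor interval=3s timeout=20s \
        params hostname=node2 ipaddr=10.100.0.4 passwd_method=file passwd="/etc/stonith_ipmi_passwd" userid=stonith interface=lanplus priv=OPERATOR \
        pcmk_host_check="static-list" pcmk_host_list="node2" \
        meta target-role=Started failure-timeout=30s
primitive p_stonith_node3 stonith:external/ipmi \
        op monitor interval=3s timeout=20s \
        params hostname=node3 ipaddr=10.100.0.6 passwd_method=file passwd="/etc/stonith_ipmi_passwd" userid=stonith interface=lanplus priv=OPERATOR \
        pcmk_host_check="static-list" pcmk_host_list="node3" \
        meta target-role=Started failure-timeout=30s
primitive p_stonith_node4 stonith:external/ipmi \
        op monitor interval=3s timeout=20s \
        params hostname=node4 ipaddr=10.100.0.8 passwd_method=file passwd="/etc/stonith_ipmi_passwd" userid=stonith interface=lanplus priv=OPERATOR \
        pcmk_host_check="static-list" pcmk_host_list="node4" \
        meta target-role=Started failure-timeout=30s
```

With these in place, stonithd should only ever use p_stonith_node1 to fence node1, p_stonith_node2 to fence node2, and so on, instead of guessing which device can reach which target.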

Best regards,
Andreas
Andrei Borzenkov
2015-04-19 13:37:11 UTC
On Sun, 19 Apr 2015 14:23:27 +0200
Post by Andreas Kurz
Post by D***@gmx.at
Hi list,
I have a Pacemaker/Corosync 2 setup with 4 nodes; STONITH is configured over the IPMI interface.
My problem is that sometimes the wrong node is fenced.
I have 4 servers: node1, node2, node3, node4.
I trigger a hardware reset on node1, but both node1 and node3 get fenced.
You have to tell Pacemaker exactly which stonith resource can fence which node if the stonith agent you are using does not support the "list" action.
Pacemaker is expected to get this information dynamically from the stonith agent.
Post by Andreas Kurz
Do this by adding pcmk_host_check="static-list" and pcmk_host_list to every stonith resource.
The default for pcmk_host_check is "dynamic"; why does it not work in this case? I use external/ipmi myself and I do not remember ever fiddling with a static list.
Andrew Beekhof
2015-04-26 20:50:18 UTC
On Sun, 19 Apr 2015 14:23:27 +0200
Post by Andreas Kurz
Post by D***@gmx.at
Hi list,
I have a Pacemaker/Corosync 2 setup with 4 nodes; STONITH is configured over the IPMI interface.
My problem is that sometimes the wrong node is fenced.
I have 4 servers: node1, node2, node3, node4.
I trigger a hardware reset on node1, but both node1 and node3 get fenced.
You have to tell Pacemaker exactly which stonith resource can fence which node if the stonith agent you are using does not support the "list" action.
Pacemaker is expected to get this information dynamically from the stonith agent.
Only from those agents that support it.
Post by Andreas Kurz
Do this by adding pcmk_host_check="static-list" and pcmk_host_list to every stonith resource.
The default for pcmk_host_check is "dynamic"; why does it not work in this case?
Because IPMI usually has no notion of host names?
I use external/ipmi myself and I do not remember ever fiddling with a static list.
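To see which devices stonithd actually believes can fence a given target, the stonith_admin tool that ships with Pacemaker can be queried on a cluster node. This is a sketch; check the stonith_admin man page for your version, and the expected result assumes the static-list change from earlier in the thread has been applied:

```
# Show all fencing devices currently registered with stonithd
stonith_admin --list-registered

# Show the devices stonithd considers able to fence node1; after
# setting pcmk_host_list, this should report only p_stonith_node1
stonith_admin --list node1
```

If the second command still reports several devices, stonithd is still relying on the agent's dynamic host list, which for IPMI-style agents cannot reliably map devices to host names.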