Discussion:
[Pacemaker] stonith
Thomas Manninger
2015-04-17 10:36:00 UTC
_______________________________________________
Pacemaker mailing list: ***@oss.clusterlabs.org
http://oss.clusterlabs.org/mailman/listinfo/pacemaker

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://bugs.clusterlabs.org
D***@gmx.at
2015-04-18 05:39:16 UTC
For information: I am using Pacemaker 1.1.12 on Debian Wheezy.

-----Original Message-----
Sent: Friday, 17 April 2015 at 12:36:00
From: "Thomas Manninger" <***@gmx.at>
To: ***@oss.clusterlabs.org
Subject: [Pacemaker] stonith


Hi list,

I have a Pacemaker/Corosync 2 setup with 4 nodes; STONITH is configured over the IPMI interface.

My problem is that sometimes the wrong node is fenced.
For example:
I have 4 servers: node1, node2, node3, node4.

I trigger a hardware reset on node1, but both node1 and node3 get fenced.

In cluster.log, I found the following entries:
Apr 17 11:02:41 [20473] node2   stonithd:    debug: stonith_action_create:       Initiating action reboot for agent fence_legacy (target=node1)
Apr 17 11:02:41 [20473] node2   stonithd:    debug: make_args:   Performing reboot action for node 'node1' as 'port=node1'
Apr 17 11:02:41 [20473] node2   stonithd:    debug: internal_stonith_action_execute:     forking
Apr 17 11:02:41 [20473] node2   stonithd:    debug: internal_stonith_action_execute:     sending args
Apr 17 11:02:41 [20473] node2   stonithd:    debug: stonith_device_execute:      Operation reboot for node node1 on p_stonith_node3 now running with pid=113092, timeout=60s
 
node1 is reset using the STONITH primitive of node3?? Why?

My STONITH config:
primitive p_stonith_node1 stonith:external/ipmi \
        params hostname=node1 ipaddr=10.100.0.2 passwd_method=file passwd="/etc/stonith_ipmi_passwd" userid=stonith interface=lanplus priv=OPERATOR \
        op monitor interval=3s timeout=20s \
        meta target-role=Started failure-timeout=30s
primitive p_stonith_node2 stonith:external/ipmi \
        op monitor interval=3s timeout=20s \
        params hostname=node2 ipaddr=10.100.0.4 passwd_method=file passwd="/etc/stonith_ipmi_passwd" userid=stonith interface=lanplus priv=OPERATOR \
        meta target-role=Started failure-timeout=30s
primitive p_stonith_node3 stonith:external/ipmi \
        op monitor interval=3s timeout=20s \
        params hostname=node3 ipaddr=10.100.0.6 passwd_method=file passwd="/etc/stonith_ipmi_passwd" userid=stonith interface=lanplus priv=OPERATOR \
        meta target-role=Started failure-timeout=30s
primitive p_stonith_node4 stonith:external/ipmi \
        op monitor interval=3s timeout=20s \
        params hostname=node4 ipaddr=10.100.0.8 passwd_method=file passwd="/etc/stonith_ipmi_passwd" userid=stonith interface=lanplus priv=OPERATOR \
        meta target-role=Started failure-timeout=30s
 
Can somebody help me?
Thanks!
 
Regards,
Thomas

Andreas Kurz
2015-04-19 12:23:27 UTC
Post by D***@gmx.at
Hi list,
I have a Pacemaker/Corosync 2 setup with 4 nodes; STONITH is configured over the IPMI interface.
My problem is that sometimes the wrong node is fenced.
I have 4 servers: node1, node2, node3, node4.
I trigger a hardware reset on node1, but both node1 and node3 get fenced.
You have to tell Pacemaker exactly which stonith resource can fence which node if the stonith agent you are using does not support the "list" action.

Do this by adding pcmk_host_check="static-list" and pcmk_host_list to every stonith resource, like:

primitive p_stonith_node3 stonith:external/ipmi \
        op monitor interval=3s timeout=20s \
        params hostname=node3 ipaddr=10.100.0.6 passwd_method=file passwd="/etc/stonith_ipmi_passwd" userid=stonith interface=lanplus priv=OPERATOR \
        pcmk_host_check="static-list" pcmk_host_list="node3"

... see "man stonithd".
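Applied to the whole cluster from the original post, the four primitives would then look like this (a sketch reusing the addresses, parameters, and meta attributes quoted earlier in the thread):

```
primitive p_stonith_node1 stonith:external/ipmi \
        op monitor interval=3s timeout=20s \
        params hostname=node1 ipaddr=10.100.0.2 passwd_method=file passwd="/etc/stonith_ipmi_passwd" userid=stonith interface=lanplus priv=OPERATOR \
        pcmk_host_check="static-list" pcmk_host_list="node1" \
        meta target-role=Started failure-timeout=30s
primitive p_stonith_node2 stonith:external/ipmi \
        op monitor interval=3s timeout=20s \
        params hostname=node2 ipaddr=10.100.0.4 passwd_method=file passwd="/etc/stonith_ipmi_passwd" userid=stonith interface=lanplus priv=OPERATOR \
        pcmk_host_check="static-list" pcmk_host_list="node2" \
        meta target-role=Started failure-timeout=30s
primitive p_stonith_node3 stonith:external/ipmi \
        op monitor interval=3s timeout=20s \
        params hostname=node3 ipaddr=10.100.0.6 passwd_method=file passwd="/etc/stonith_ipmi_passwd" userid=stonith interface=lanplus priv=OPERATOR \
        pcmk_host_check="static-list" pcmk_host_list="node3" \
        meta target-role=Started failure-timeout=30s
primitive p_stonith_node4 stonith:external/ipmi \
        op monitor interval=3s timeout=20s \
        params hostname=node4 ipaddr=10.100.0.8 passwd_method=file passwd="/etc/stonith_ipmi_passwd" userid=stonith interface=lanplus priv=OPERATOR \
        pcmk_host_check="static-list" pcmk_host_list="node4" \
        meta target-role=Started failure-timeout=30s
```

With these in place, stonithd should only ever use p_stonith_node1 to fence node1, p_stonith_node2 to fence node2, and so on, instead of guessing which device can reach which target.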

Best regards,
Andreas
Andrei Borzenkov
2015-04-19 13:37:11 UTC
On Sun, 19 Apr 2015 14:23:27 +0200
Post by Andreas Kurz
Post by D***@gmx.at
Hi list,
I have a Pacemaker/Corosync 2 setup with 4 nodes; STONITH is configured over the IPMI interface.
My problem is that sometimes the wrong node is fenced.
I have 4 servers: node1, node2, node3, node4.
I trigger a hardware reset on node1, but both node1 and node3 get fenced.
You have to tell Pacemaker exactly which stonith resource can fence which node if the stonith agent you are using does not support the "list" action.
Pacemaker is expected to get this information dynamically from the stonith agent.
Post by Andreas Kurz
Do this by adding pcmk_host_check="static-list" and pcmk_host_list to every stonith resource.
The default for pcmk_host_check is "dynamic"; why does it not work in this case? I use external/ipmi myself and I do not remember ever fiddling with a static list.
Andrew Beekhof
2015-04-26 20:50:18 UTC
On Sun, 19 Apr 2015 14:23:27 +0200
Post by Andreas Kurz
Post by D***@gmx.at
Hi list,
I have a Pacemaker/Corosync 2 setup with 4 nodes; STONITH is configured over the IPMI interface.
My problem is that sometimes the wrong node is fenced.
I have 4 servers: node1, node2, node3, node4.
I trigger a hardware reset on node1, but both node1 and node3 get fenced.
You have to tell Pacemaker exactly which stonith resource can fence which node if the stonith agent you are using does not support the "list" action.
Pacemaker is expected to get this information dynamically from the stonith agent.
Only from those agents that support it.
Post by Andreas Kurz
Do this by adding pcmk_host_check="static-list" and pcmk_host_list to every stonith resource.
The default for pcmk_host_check is "dynamic"; why does it not work in this case?
Because IPMI usually has no notion of host names?
I use external/ipmi myself and I do not remember ever fiddling with a static list.
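To see which devices stonithd actually believes can fence a given target, the stonith_admin tool that ships with Pacemaker can be queried on a cluster node. This is a sketch; check the stonith_admin man page for your version, and the expected result assumes the static-list change from earlier in the thread has been applied:

```
# Show all fencing devices currently registered with stonithd
stonith_admin --list-registered

# Show the devices stonithd considers able to fence node1; after
# setting pcmk_host_list, this should report only p_stonith_node1
stonith_admin --list node1
```

If the second command still reports several devices, stonithd is still relying on the agent's dynamic host list, which for IPMI-style agents cannot reliably map devices to host names.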