Discussion:
[Pacemaker] Segfault on monitor resource
Oscar Salvador
2015-01-26 17:20:35 UTC
Hi!

I'm writing here because two days ago I experienced a strange problem in my
Pacemaker cluster.
Everything was working fine, until suddenly a segfault happened in the Nginx
monitor resource:

Jan 25 03:55:24 lb02 crmd: [9975]: notice: run_graph: ==== Transition 7551
(Complete=0, Pending=0, Fired=0, Skipped=0, Incomplete=0,
Source=/var/lib/pengine/pe-input-90.bz2): Complete
Jan 25 03:55:24 lb02 crmd: [9975]: notice: do_state_transition: State
transition S_TRANSITION_ENGINE -> S_IDLE [ input=I_TE_SUCCESS
cause=C_FSA_INTERNAL origin=notify_crmd ]
Jan 25 04:00:08 lb02 cib: [9971]: info: cib_stats: Processed 1 operations
(0.00us average, 0% utilization) in the last 10min
Jan 25 04:10:24 lb02 crmd: [9975]: info: crm_timer_popped: PEngine Recheck
Timer (I_PE_CALC) just popped (900000ms)
Jan 25 04:10:24 lb02 crmd: [9975]: notice: do_state_transition: State
transition S_IDLE -> S_POLICY_ENGINE [ input=I_PE_CALC cause=C_TIMER_POPPED
origin=crm_timer_popped ]
Jan 25 04:10:24 lb02 crmd: [9975]: info: do_state_transition: Progressed to
state S_POLICY_ENGINE after C_TIMER_POPPED
Jan 25 04:10:24 lb02 pengine: [10028]: WARN: unpack_rsc_op: Processing
failed op Ldirector-rsc_last_failure_0 on lb02: not running (7)
Jan 25 04:10:24 lb02 pengine: [10028]: notice: common_apply_stickiness:
Ldirector-rsc can fail 999997 more times on lb02 before being forced off
Jan 25 04:10:24 lb02 crmd: [9975]: notice: do_state_transition: State
transition S_POLICY_ENGINE -> S_TRANSITION_ENGINE [ input=I_PE_SUCCESS
cause=C_IPC_MESSAGE origin=handle_response ]
Jan 25 04:10:24 lb02 pengine: [10028]: notice: process_pe_message:
Transition 7552: PEngine Input stored in: /var/lib/pengine/pe-input-90.bz2
Jan 25 04:10:24 lb02 crmd: [9975]: info: do_te_invoke: Processing graph
7552 (ref=pe_calc-dc-1422155424-7644) derived from
/var/lib/pengine/pe-input-90.bz2
Jan 25 04:10:24 lb02 crmd: [9975]: notice: run_graph: ==== Transition 7552
(Complete=0, Pending=0, Fired=0, Skipped=0, Incomplete=0,
Source=/var/lib/pengine/pe-input-90.bz2): Complete
Jan 25 04:10:24 lb02 crmd: [9975]: notice: do_state_transition: State
transition S_TRANSITION_ENGINE -> S_IDLE [ input=I_TE_SUCCESS
cause=C_FSA_INTERNAL origin=notify_crmd ]


Jan 25 04:10:30 lb02 lrmd: [9972]: info: RA output:
(Nginx-rsc:monitor:stderr) Segmentation fault ******* here it starts

As you can see in the last line. And then:

Jan 25 04:10:30 lb02 lrmd: [9972]: info: RA output:
(Nginx-rsc:monitor:stderr) Killed
/usr/lib/ocf/resource.d//heartbeat/nginx: 910:
/usr/lib/ocf/resource.d//heartbeat/nginx: Cannot fork

I guess here Nginx was killed.

And then I have some other errors until Pacemaker decides to move the
resources to the other node:

Jan 25 04:10:30 lb02 crmd: [9975]: info: process_lrm_event: LRM operation
Nginx-rsc_monitor_10000 (call=52, rc=2, cib-update=7633, confirmed=false)
invalid parameter
Jan 25 04:10:30 lb02 crmd: [9975]: info: process_graph_event: Detected
action Nginx-rsc_monitor_10000 from a different transition: 5739 vs. 7552
Jan 25 04:10:30 lb02 crmd: [9975]: info: abort_transition_graph:
process_graph_event:476 - Triggered transition abort (complete=1,
tag=lrm_rsc_op, id=Nginx-rsc_last_failure_0,
magic=0:2;4:5739:0:42d1ed53-9686-4174-84e7-d2c230ed8832, cib=
3.14.40) : Old event
Jan 25 04:10:30 lb02 crmd: [9975]: WARN: update_failcount: Updating
failcount for Nginx-rsc on lb02 after failed monitor: rc=2 (update=value++,
time=1422155430)
Jan 25 04:10:30 lb02 crmd: [9975]: notice: do_state_transition: State
transition S_IDLE -> S_POLICY_ENGINE [ input=I_PE_CALC cause=C_FSA_INTERNAL
origin=abort_transition_graph ]
Jan 25 04:10:30 lb02 attrd: [9974]: info: log-rotate detected on logfile
/var/log/ha-log
Jan 25 04:10:30 lb02 attrd: [9974]: notice: attrd_trigger_update: Sending
flush op to all hosts for: fail-count-Nginx-rsc (1)
Jan 25 04:10:30 lb02 pengine: [10028]: ERROR: unpack_rsc_op: Preventing
Nginx-rsc from re-starting on lb02: operation monitor failed 'invalid
parameter' (rc=2)
Jan 25 04:10:30 lb02 pengine: [10028]: WARN: unpack_rsc_op: Processing
failed op Nginx-rsc_last_failure_0 on lb02: invalid parameter (2)
Jan 25 04:10:30 lb02 pengine: [10028]: WARN: unpack_rsc_op: Processing
failed op Ldirector-rsc_last_failure_0 on lb02: not running (7)
Jan 25 04:10:30 lb02 pengine: [10028]: notice: common_apply_stickiness:
Ldirector-rsc can fail 999997 more times on lb02 before being forced off
Jan 25 04:10:30 lb02 pengine: [10028]: notice: LogActions: Stop
IP-rsc_mysql (lb02)
Jan 25 04:10:30 lb02 pengine: [10028]: notice: LogActions: Stop
IP-rsc_nginx (lb02)
Jan 25 04:10:30 lb02 pengine: [10028]: notice: LogActions: Stop
IP-rsc_nginx6 (lb02)
Jan 25 04:10:30 lb02 pengine: [10028]: notice: LogActions: Stop
IP-rsc_elasticsearch (lb02)
Jan 25 04:10:30 lb02 pengine: [10028]: notice: LogActions: Move
Ldirector-rsc (Started lb02 -> lb01)
Jan 25 04:10:30 lb02 pengine: [10028]: notice: LogActions: Move
Nginx-rsc (Started lb02 -> lb01)
Jan 25 04:10:30 lb02 attrd: [9974]: notice: attrd_perform_update: Sent
update 23: fail-count-Nginx-rsc=1
Jan 25 04:10:30 lb02 attrd: [9974]: notice: attrd_trigger_update: Sending
flush op to all hosts for: last-failure-Nginx-rsc (1422155430)

I see that Pacemaker is complaining about some errors like "invalid
parameter", for example in these lines:

Jan 25 04:10:30 lb02 crmd: [9975]: info: process_lrm_event: LRM operation
Nginx-rsc_monitor_10000 (call=52, rc=2, cib-update=7633, confirmed=false)
invalid parameter

Jan 25 04:10:30 lb02 pengine: [10028]: ERROR: unpack_rsc_op: Preventing
Nginx-rsc from re-starting on lb02: operation monitor failed 'invalid
parameter' (rc=2)

It sounds (to me) like a syntax problem in the resource definitions, but I've
checked the config with crm_verify and there is no error:

root# (S) crm_verify -LVV
root# (S)

So I'm just wondering why pacemaker is complaining about an invalid
parameter.
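
If it helps, one thing I can still try is to run the RA's monitor action by
hand and see which exit code it returns; a rough sketch of what I have in mind
(the OCF_ROOT path is an assumption, and I pass no resource parameters since I
define none in the CIB):

# run the nginx RA monitor action by hand and print its exit code
export OCF_ROOT=/usr/lib/ocf
sh -x /usr/lib/ocf/resource.d/heartbeat/nginx monitor
echo "monitor rc=$?"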

These are my CIB objects:

node $id="43b2c5a1-9552-4438-962b-6e98a2dd67c7" lb01
node $id="68328520-68e0-42fd-9adf-062655691643" lb02
primitive IP-rsc_elasticsearch ocf:heartbeat:IPaddr2 \
params ip="xx.xx.xx.xx" nic="eth0" cidr_netmask="255.255.255.224"
primitive IP-rsc_elasticsearch6 ocf:heartbeat:IPv6addr \
params ipv6addr="xxxxxxxxxxxxxxxx" \
op monitor interval="10s"
primitive IP-rsc_mysql ocf:heartbeat:IPaddr2 \
params ip="xx.xx.xx.xx" nic="eth0" cidr_netmask="255.255.255.224"
primitive IP-rsc_mysql6 ocf:heartbeat:IPv6addr \
params ipv6addr="xxxxxxxxxxxxxx" \
op monitor interval="10s"
primitive IP-rsc_nginx ocf:heartbeat:IPaddr2 \
params ip="xx.xx.xx.xx" nic="eth0" cidr_netmask="255.255.255.224"
primitive IP-rsc_nginx6 ocf:heartbeat:IPv6addr \
params ipv6addr="xxxxxxxxxxxxxx" \
op monitor interval="10s"
primitive Ldirector-rsc ocf:heartbeat:ldirectord \
op monitor interval="10s" timeout="30s"
primitive Nginx-rsc ocf:heartbeat:nginx \
op monitor interval="10s" timeout="30s"
location cli-standby-IP-rsc_elasticsearch6 IP-rsc_elasticsearch6 \
rule $id="cli-standby-rule-IP-rsc_elasticsearch6" -inf: #uname eq lb01
location cli-standby-IP-rsc_mysql IP-rsc_mysql \
rule $id="cli-standby-rule-IP-rsc_mysql" -inf: #uname eq lb01
location cli-standby-IP-rsc_mysql6 IP-rsc_mysql6 \
rule $id="cli-standby-rule-IP-rsc_mysql6" -inf: #uname eq lb01
location cli-standby-IP-rsc_nginx IP-rsc_nginx \
rule $id="cli-standby-rule-IP-rsc_nginx" -inf: #uname eq lb01
location cli-standby-IP-rsc_nginx6 IP-rsc_nginx6 \
rule $id="cli-standby-rule-IP-rsc_nginx6" -inf: #uname eq lb01
colocation hcu_c inf: Nginx-rsc Ldirector-rsc IP-rsc_mysql IP-rsc_nginx
IP-rsc_nginx6 IP-rsc_elasticsearch
order hcu_o inf: IP-rsc_nginx IP-rsc_nginx6 IP-rsc_mysql Ldirector-rsc
Nginx-rsc IP-rsc_elasticsearch
property $id="cib-bootstrap-options" \
dc-version="1.1.7-ee0730e13d124c3d58f00016c3376a1de5323cff" \
cluster-infrastructure="Heartbeat" \
stonith-enabled="false


Do you have some hints that I can follow?

Thanks in advance!

Oscar
Oscar Salvador
2015-01-26 17:22:57 UTC
Oh, I forgot some important details:

root# (S) crm status
============
Last updated: Mon Jan 26 18:21:35 2015
Last change: Sun Jan 25 05:19:13 2015 via crm_resource on lb01
Stack: Heartbeat
Current DC: lb01 (43b2c5a1-9552-4438-962b-6e98a2dd67c7) - partition with
quorum
Version: 1.1.7-ee0730e13d124c3d58f00016c3376a1de5323cff
2 Nodes configured, unknown expected votes
8 Resources configured.
============

Online: [ lb01 lb02 ]

IP-rsc_mysql (ocf::heartbeat:IPaddr2): Started lb02
IP-rsc_nginx (ocf::heartbeat:IPaddr2): Started lb02
IP-rsc_nginx6 (ocf::heartbeat:IPv6addr): Started lb02
IP-rsc_mysql6 (ocf::heartbeat:IPv6addr): Started lb02
IP-rsc_elasticsearch6 (ocf::heartbeat:IPv6addr): Started lb02
IP-rsc_elasticsearch (ocf::heartbeat:IPaddr2): Started lb02
Ldirector-rsc (ocf::heartbeat:ldirectord): Started lb02
Nginx-rsc (ocf::heartbeat:nginx): Started lb02


This is running on:

Debian 7.8
pacemaker 1.1.7-1
emmanuel segura
2015-01-27 09:10:12 UTC
Maybe you can use sar to check whether your server was short on resources?

Jan 25 04:10:30 lb02 lrmd: [9972]: info: RA output:
(Nginx-rsc:monitor:stderr) Killed
/usr/lib/ocf/resource.d//heartbeat/nginx: 910:
/usr/lib/ocf/resource.d//heartbeat/nginx: Cannot fork
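
A rough sketch of what I mean, assuming sysstat was installed and collecting
data at the time (the daily file name and the time window are assumptions):

# memory/swap usage around the failure (Jan 25, ~04:10)
sar -r -f /var/log/sysstat/sa25 -s 04:00:00 -e 04:20:00
# run queue and load average for the same window
sar -q -f /var/log/sysstat/sa25 -s 04:00:00 -e 04:20:00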
--
this is my life and I live it as long as God wills

Dejan Muhamedagic
2015-01-27 09:39:26 UTC
Hi,
Post by Oscar Salvador
(Nginx-rsc:monitor:stderr) Segmentation fault ******* here it starts
What exactly did segfault? Do you have a core dump to examine?
Post by Oscar Salvador
As you can see, the last line.
(Nginx-rsc:monitor:stderr) Killed
/usr/lib/ocf/resource.d//heartbeat/nginx: Cannot fork
This could be related to the segfault, or due to some other serious
system error.
Post by Oscar Salvador
I see that Pacemaker is complaining about some errors like "invalid parameter"
That error code is what the nginx RA exited with. It's unusual,
but perhaps also due to the segfault.

Thanks,

Dejan
Oscar Salvador
2015-01-27 14:18:13 UTC
Hi,

I've checked the resource graphs I have, and resource usage looked fine, so I
don't think it was a problem caused by high memory usage or anything like that.
And unfortunately I don't have a core dump to analyze (I'll enable it for a
future case), so the only thing I have are the logs.

For the line below, I thought it was the process in charge of monitoring
nginx that was killed due to a segfault:

RA output: (Nginx-rsc:monitor:stderr) Segmentation fault


I've checked the Nginx logs, and there is nothing worthwhile there; actually
there is no activity at all, so I think it must have been something internal
that caused the failure.
I'll enable core dumps; it's the only thing I can do for now.
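
Something along these lines is what I have in mind (the core_pattern location
is just my choice, and the ulimit only affects the current shell, so it would
also have to go into the daemon's init script or limits.conf):

# allow core files to be written (per-shell; needs to be made persistent)
ulimit -c unlimited
# write cores to a predictable place, tagged with program name and pid
echo '/var/tmp/core.%e.%p' > /proc/sys/kernel/core_pattern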

Thank you very much

Oscar
Dejan Muhamedagic
2015-01-27 16:58:43 UTC
Post by Oscar Salvador
For the line below, I thought it was the process in charge of monitoring
nginx that was killed due to a segfault:
RA output: (Nginx-rsc:monitor:stderr) Segmentation fault
This is just output captured during the execution of the RA
monitor action. It could have been anything within the RA (which is
just a shell script) that segfaulted.
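
If kernel logging is enabled, the kernel usually records which binary
segfaulted, so that may be worth a look; something like this (the log
locations are assumptions for a Debian 7 box):

# the kernel logs user-space segfaults together with the binary's name
grep -i segfault /var/log/kern.log
dmesg | grep -i segfault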

Thanks,

Dejan
Oscar Salvador
2015-01-27 17:12:34 UTC
Post by Dejan Muhamedagic
This is just output captured during the execution of the RA monitor action.
It could have been anything within the RA (which is just a shell script)
that segfaulted.
Hi,

Yes, I see.
I've enabled core dumps on the system, so the next time I'll be able to
check what is causing this.

Thank you very much
Oscar Salvador