Discussion:
[Pacemaker] update cib after fence
p***@arz.at
2015-04-08 13:03:48 UTC
hi,

how do I clean up the CIB on a node after an unexpected system halt?
The failed node still thinks it is running a VirtualDomain resource which is
already running on another node in the cluster (successful takeover).

executing "pcs cluster start" -
....
Apr 8 13:41:10 lnx0083a daemon:info lnx0083a
VirtualDomain(lnx0106a)[20360]: INFO: Virtual domain lnx0106a currently
has no state, retrying.
Apr 8 13:41:12 lnx0083a daemon:err|error lnx0083a
VirtualDomain(lnx0106a)[20360]: ERROR: Virtual domain lnx0106a has no
state during stop operation, bailing out.
Apr 8 13:41:12 lnx0083a daemon:info lnx0083a
VirtualDomain(lnx0106a)[20360]: INFO: Issuing forced shutdown (destroy)
request for domain lnx0106a.
Apr 8 13:41:12 lnx0083a daemon:err|error lnx0083a
VirtualDomain(lnx0106a)[20360]: ERROR: forced stop failed
Apr 8 13:41:12 lnx0083a daemon:notice lnx0083a lrmd[14230]: notice:
operation_finished: lnx0106a_stop_0:20360:stderr [ error: failed to
connect to the hypervisor error: end of file while reading data: :
input/output error ]
Apr 8 13:41:12 lnx0083a daemon:notice lnx0083a lrmd[14230]: notice:
operation_finished: lnx0106a_stop_0:20360:stderr [ ocf-exit-reason:forced
stop failed ]
Apr 8 13:41:12 lnx0083a daemon:notice lnx0083a crmd[14233]: notice:
process_lrm_event: Operation lnx0106a_stop_0: unknown error
(node=lnx0083a, call=131, rc=1, cib-update=43, confirmed=true)
Apr 8 13:41:12 lnx0083a daemon:notice lnx0083a crmd[14233]: notice:
process_lrm_event: lnx0083a-lnx0106a_stop_0:131 [ error: failed to connect
to the hypervisor error: end of file while reading data: : input/output
error\nocf-exit-reason:forced stop failed\n ]
Apr 8 13:41:12 lnx0083b daemon:warn|warning lnx0083b crmd[18244]:
warning: status_from_rc: Action 105 (lnx0106a_stop_0) on lnx0083a failed
(target: 0 vs. rc: 1): Error
Apr 8 13:41:12 lnx0083b daemon:warn|warning lnx0083b crmd[18244]:
warning: update_failcount: Updating failcount for lnx0106a on lnx0083a
after failed stop: rc=1 (update=INFINITY, time=1428493272)
Apr 8 13:41:12 lnx0083b daemon:notice lnx0083b crmd[18244]: notice:
abort_transition_graph: Transition aborted by lnx0106a_stop_0 'modify' on
lnx0083a: Event failed
(magic=0:1;105:10179:0:f57c21e4-fd47-4fef-9d73-c7d8b204c9bc,
cib=1.499.624, source=match_graph_event:350, 0)
Apr 8 13:41:12 lnx0083b daemon:warn|warning lnx0083b crmd[18244]:
warning: update_failcount: Updating failcount for lnx0106a on lnx0083a
after failed stop: rc=1 (update=INFINITY, time=1428493272)
Apr 8 13:41:12 lnx0083b daemon:warn|warning lnx0083b crmd[18244]:
warning: status_from_rc: Action 105 (lnx0106a_stop_0) on lnx0083a failed
(target: 0 vs. rc: 1): Error
Apr 8 13:41:12 lnx0083b daemon:warn|warning lnx0083b crmd[18244]:
warning: update_failcount: Updating failcount for lnx0106a on lnx0083a
after failed stop: rc=1 (update=INFINITY, time=1428493272)
Apr 8 13:41:12 lnx0083b daemon:warn|warning lnx0083b crmd[18244]:
warning: update_failcount: Updating failcount for lnx0106a on lnx0083a
after failed stop: rc=1 (update=INFINITY, time=1428493272)
Apr 8 13:41:17 lnx0083b daemon:warn|warning lnx0083b pengine[18243]:
warning: unpack_rsc_op_failure: Processing failed op stop for lnx0106a on
lnx0083a: unknown error (1)
Apr 8 13:41:17 lnx0083b daemon:warn|warning lnx0083b pengine[18243]:
warning: unpack_rsc_op_failure: Processing failed op stop for lnx0106a on
lnx0083a: unknown error (1)
Apr 8 13:41:17 lnx0083b daemon:warn|warning lnx0083b pengine[18243]:
warning: pe_fence_node: Node lnx0083a will be fenced because of resource
failure(s)
Apr 8 13:41:17 lnx0083b daemon:warn|warning lnx0083b pengine[18243]:
warning: common_apply_stickiness: Forcing lnx0106a away from lnx0083a
after 1000000 failures (max=3)
Apr 8 13:41:17 lnx0083b daemon:warn|warning lnx0083b pengine[18243]:
warning: stage6: Scheduling Node lnx0083a for STONITH
Apr 8 13:41:17 lnx0083b daemon:notice lnx0083b pengine[18243]: notice:
native_stop_constraints: Stop of failed resource lnx0106a is implicit
after lnx0083a is fenced
....

Node is fenced..

log from corosync.log:
...
Apr 08 13:41:00 [14226] lnx0083a pacemakerd: notice: mcp_read_config:
Configured corosync to accept connections from group 2035: OK (1)
Apr 08 13:41:00 [14226] lnx0083a pacemakerd: notice: main: Starting
Pacemaker 1.1.12 (Build: 4ed91da): agent-manpages ascii-docs ncurses
libqb-logging libqb-ipc lha-fencing upstart nagios corosync-native atomic-attrd libesmtp acls
....
Apr 08 13:16:04 [23690] lnx0083a cib: info: cib_perform_op: +
/cib/status/node_state[@id='4']/lrm[@id='4']/lrm_resources/lrm_resource[@id='lnx0106a']/lrm_rsc_op[@id='lnx0106a_last_0']:
@operation_key=lnx0106a_stop_0, @operation=stop,
@transition-key=106:10167:0:f57c21e4-fd47-4fef-9d73-c7d8b204c9bc,
@transition-magic=0:0;106:10167:0:f57c21e4-fd47-4fef-9d73-c7d8b204c9bc,
@call-id=538, @last-run=1428491757, @last-rc-change=1428491757,
@exec-time=7686
Apr 08 13:41:04 [14231] lnx0083a attrd: info: write_attribute:
Sent update 40 with 3 changes for fail-count-vm-lnx0106a, id=<n/a>,
set=(null)
Apr 08 13:41:04 [14231] lnx0083a attrd: info: write_attribute:
Sent update 45 with 3 changes for fail-count-lnx0106a, id=<n/a>,
set=(null)
Apr 08 13:41:04 [14228] lnx0083a cib: info: cib_perform_op: ++
<lrm_resource id="lnx0106a" type="VirtualDomain"
class="ocf" provider="heartbeat">
Apr 08 13:41:04 [14228] lnx0083a cib: info: cib_perform_op: ++
<lrm_rsc_op id="lnx0106a_last_0"
operation_key="lnx0106a_monitor_0" operation="monitor"
crm-debug-origin="build_active_RAs" crm_feature_set="3.0.9"
transition-key="7:8297:7:f57c21e4-fd47-4fef-9d73-c7d8b204c9bc"
transition-magic="0:7;7:8297:7:f57c21e4-fd47-4fef-9d73-c7d8b204c9bc"
on_node="lnx0083b" call-id="660" rc-code="7" op-status="0" interval="0"
last-run="1427965815" last-rc-change="1427965815" exec-time="8
Apr 08 13:41:04 [14228] lnx0083a cib: info: cib_perform_op: ++
<lrm_resource id="lnx0106a" type="VirtualDomain"
class="ocf" provider="heartbeat">
Apr 08 13:41:04 [14228] lnx0083a cib: info: cib_perform_op: ++
<lrm_rsc_op id="lnx0106a_last_failure_0"
operation_key="lnx0106a_migrate_to_0" operation="migrate_to"
crm-debug-origin="do_update_resource" crm_feature_set="3.0.9"
transition-key="112:8364:0:f57c21e4-fd47-4fef-9d73-c7d8b204c9bc"
transition-magic="0:1;112:8364:0:f57c21e4-fd47-4fef-9d73-c7d8b204c9bc"
on_node="lnx0129a" call-id="444" rc-code="1" op-status="0" interval="0"
last-run="1427973596" last-rc-change="1427
Apr 08 13:41:04 [14228] lnx0083a cib: info: cib_perform_op: ++
<lrm_rsc_op id="lnx0106a_last_0"
operation_key="lnx0106a_stop_0" operation="stop"
crm-debug-origin="do_update_resource" crm_feature_set="3.0.9"
transition-key="113:9846:0:f57c21e4-fd47-4fef-9d73-c7d8b204c9bc"
transition-magic="0:0;113:9846:0:f57c21e4-fd47-4fef-9d73-c7d8b204c9bc"
on_node="lnx0129a" call-id="546" rc-code="0" op-status="0" interval="0"
last-run="1428403880" last-rc-change="1428403880" exec-time="2
Apr 08 13:41:04 [14228] lnx0083a cib: info: cib_perform_op: ++
<lrm_rsc_op id="lnx0106a_monitor_30000"
operation_key="lnx0106a_monitor_30000" operation="monitor"
crm-debug-origin="do_update_resource" crm_feature_set="3.0.9"
transition-key="47:8337:0:f57c21e4-fd47-4fef-9d73-c7d8b204c9bc"
transition-magic="0:0;47:8337:0:f57c21e4-fd47-4fef-9d73-c7d8b204c9bc"
on_node="lnx0129a" call-id="436" rc-code="0" op-status="0"
interval="30000" last-rc-change="1427965985" exec-time="1312
Apr 08 13:41:04 [14228] lnx0083a cib: info: cib_perform_op: ++
<lrm_resource id="lnx0106a" type="VirtualDomain"
class="ocf" provider="heartbeat">
Apr 08 13:41:04 [14228] lnx0083a cib: info: cib_perform_op: ++
<lrm_rsc_op id="lnx0106a_last_0"
operation_key="lnx0106a_start_0" operation="start"
crm-debug-origin="do_update_resource" crm_feature_set="3.0.9"
transition-key="110:10168:0:f57c21e4-fd47-4fef-9d73-c7d8b204c9bc"
transition-magic="0:0;110:10168:0:f57c21e4-fd47-4fef-9d73-c7d8b204c9bc"
on_node="lnx0129b" call-id="539" rc-code="0" op-status="0" interval="0"
last-run="1428491780" last-rc-change="1428491780" exec-tim
Apr 08 13:41:04 [14228] lnx0083a cib: info: cib_perform_op: ++
<lrm_rsc_op id="lnx0106a_monitor_30000"
operation_key="lnx0106a_monitor_30000" operation="monitor"
crm-debug-origin="do_update_resource" crm_feature_set="3.0.9"
transition-key="89:10170:0:f57c21e4-fd47-4fef-9d73-c7d8b204c9bc"
transition-magic="0:0;89:10170:0:f57c21e4-fd47-4fef-9d73-c7d8b204c9bc"
on_node="lnx0129b" call-id="540" rc-code="0" op-status="0"
interval="30000" last-rc-change="1428491810" exec-time="12
Apr 08 13:41:04 [14231] lnx0083a attrd: info: attrd_cib_callback:
Update 40 for fail-count-vm-lnx0106a: OK (0)
Apr 08 13:41:04 [14231] lnx0083a attrd: info: attrd_cib_callback:
Update 40 for fail-count-vm-lnx0106a[lnx0129a]=(null): OK (0)
Apr 08 13:41:04 [14231] lnx0083a attrd: info: attrd_cib_callback:
Update 40 for fail-count-vm-lnx0106a[lnx0129b]=(null): OK (0)
Apr 08 13:41:04 [14231] lnx0083a attrd: info: attrd_cib_callback:
Update 40 for fail-count-vm-lnx0106a[lnx0083b]=(null): OK (0)
Apr 08 13:41:04 [14231] lnx0083a attrd: info: attrd_cib_callback:
Update 45 for fail-count-lnx0106a: OK (0)
Apr 08 13:41:04 [14231] lnx0083a attrd: info: attrd_cib_callback:
Update 45 for fail-count-lnx0106a[lnx0129a]=(null): OK (0)
Apr 08 13:41:04 [14231] lnx0083a attrd: info: attrd_cib_callback:
Update 45 for fail-count-lnx0106a[lnx0129b]=(null): OK (0)
Apr 08 13:41:04 [14231] lnx0083a attrd: info: attrd_cib_callback:
Update 45 for fail-count-lnx0106a[lnx0083b]=(null): OK (0)
Apr 08 13:41:05 [14228] lnx0083a cib: info: cib_perform_op: ++
<lrm_resource id="lnx0106a"
type="VirtualDomain" class="ocf" provider="heartbeat">
Apr 08 13:41:05 [14228] lnx0083a cib: info: cib_perform_op: ++
<lrm_rsc_op id="lnx0106a_last_0"
operation_key="lnx0106a_monitor_0" operation="monitor"
crm-debug-origin="build_active_RAs" crm_feature_set="3.0.9"
transition-key="7:8297:7:f57c21e4-fd47-4fef-9d73-c7d8b204c9bc"
transition-magic="0:7;7:8297:7:f57c21e4-fd47-4fef-9d73-c7d8b204c9bc"
on_node="lnx0083b" call-id="660" rc-code="7" op-status="0" interval="0"
last-run="1427965815" last-rc-change="142796
Apr 08 13:41:07 [14230] lnx0083a lrmd: info:
process_lrmd_get_rsc_info: Resource 'lnx0106a' not found (27 active
resources)
Apr 08 13:41:07 [14230] lnx0083a lrmd: info:
process_lrmd_rsc_register: Added 'lnx0106a' to the rsc list (28
active resources)
Apr 08 13:41:07 [14233] lnx0083a crmd: info: do_lrm_rsc_op:
Performing key=65:10177:0:f57c21e4-fd47-4fef-9d73-c7d8b204c9bc
op=lnx0106a_monitor_0
Apr 08 13:41:08 [14233] lnx0083a crmd: notice: process_lrm_event:
Operation lnx0106a_monitor_0: not running (node=lnx0083a, call=114, rc=7,
cib-update=34, confirmed=true)
Apr 08 13:41:08 [14228] lnx0083a cib: info: cib_perform_op: ++
/cib/status/node_state[@id='1']/lrm[@id='1']/lrm_resources: <lrm_resource
id="lnx0106a" type="VirtualDomain" class="ocf" provider="heartbeat"/>
Apr 08 13:41:08 [14228] lnx0083a cib: info: cib_perform_op: ++
<lrm_rsc_op
id="lnx0106a_last_failure_0" operation_key="lnx0106a_monitor_0"
operation="monitor" crm-debug-origin="do_update_resource"
crm_feature_set="3.0.9"
transition-key="65:10177:0:f57c21e4-fd47-4fef-9d73-c7d8b204c9bc"
transition-magic="0:7;65:10177:0:f57c21e4-fd47-4fef-9d73-c7d8b204c9bc"
on_node="lnx0083a" call-id="114" rc-code="7" op-status="0" interval="0"
last-ru
Apr 08 13:41:08 [14228] lnx0083a cib: info: cib_perform_op: ++
<lrm_rsc_op
id="lnx0106a_last_0" operation_key="lnx0106a_monitor_0"
operation="monitor" crm-debug-origin="do_update_resource"
crm_feature_set="3.0.9"
transition-key="65:10177:0:f57c21e4-fd47-4fef-9d73-c7d8b204c9bc"
transition-magic="0:7;65:10177:0:f57c21e4-fd47-4fef-9d73-c7d8b204c9bc"
on_node="lnx0083a" call-id="114" rc-code="7" op-status="0" interval="0"
last-run="14284
Apr 08 13:41:09 [14233] lnx0083a crmd: info: do_lrm_rsc_op:
Performing key=105:10179:0:f57c21e4-fd47-4fef-9d73-c7d8b204c9bc
op=lnx0106a_stop_0
Apr 08 13:41:09 [14230] lnx0083a lrmd: info: log_execute:
executing - rsc:lnx0106a action:stop call_id:131
VirtualDomain(lnx0106a)[20360]: 2015/04/08_13:41:09 INFO: Virtual domain
lnx0106a currently has no state, retrying.
VirtualDomain(lnx0106a)[20360]: 2015/04/08_13:41:10 INFO: Virtual domain
lnx0106a currently has no state, retrying.
VirtualDomain(lnx0106a)[20360]: 2015/04/08_13:41:12 ERROR: Virtual domain
lnx0106a has no state during stop operation, bailing out.
VirtualDomain(lnx0106a)[20360]: 2015/04/08_13:41:12 INFO: Issuing forced
shutdown (destroy) request for domain lnx0106a.
VirtualDomain(lnx0106a)[20360]: 2015/04/08_13:41:12 ERROR: forced stop
failed
Apr 08 13:41:12 [14230] lnx0083a lrmd: notice: operation_finished:
lnx0106a_stop_0:20360:stderr [ error: failed to connect to the
hypervisor error: end of file while reading data: : input/output error ]
Apr 08 13:41:12 [14230] lnx0083a lrmd: notice: operation_finished:
lnx0106a_stop_0:20360:stderr [ ocf-exit-reason:forced stop failed
]
Apr 08 13:41:12 [14230] lnx0083a lrmd: info: log_finished:
finished - rsc:lnx0106a action:stop call_id:131 pid:20360 exit-code:1
exec-time:2609ms queue-time:0ms
Apr 08 13:41:12 [14233] lnx0083a crmd: notice: process_lrm_event:
Operation lnx0106a_stop_0: unknown error (node=lnx0083a, call=131, rc=1,
cib-update=43, confirmed=true)
Apr 08 13:41:12 [14233] lnx0083a crmd: notice: process_lrm_event:
lnx0083a-lnx0106a_stop_0:131 [ error: failed to connect to the hypervisor
error: end of file while reading data: : input/output
error\nocf-exit-reason:forced stop failed\n ]
Apr 08 13:41:12 [14228] lnx0083a cib: info: cib_perform_op: +
/cib/status/node_state[@id='1']/lrm[@id='1']/lrm_resources/lrm_resource[@id='lnx0106a']/lrm_rsc_op[@id='lnx0106a_last_failure_0']:
@operation_key=lnx0106a_stop_0, @operation=stop,
@transition-key=105:10179:0:f57c21e4-fd47-4fef-9d73-c7d8b204c9bc,
@transition-magic=0:1;105:10179:0:f57c21e4-fd47-4fef-9d73-c7d8b204c9bc,
@call-id=131, @rc-code=1, @last-run=1428493269,
@last-rc-change=1428493269, @exec-time=2609, @exit-reason=forced stop
Apr 08 13:41:12 [14228] lnx0083a cib: info: cib_perform_op: +
/cib/status/node_state[@id='1']/lrm[@id='1']/lrm_resources/lrm_resource[@id='lnx0106a']/lrm_rsc_op[@id='lnx0106a_last_0']:
@operation_key=lnx0106a_stop_0, @operation=stop,
@transition-key=105:10179:0:f57c21e4-fd47-4fef-9d73-c7d8b204c9bc,
@transition-magic=0:1;105:10179:0:f57c21e4-fd47-4fef-9d73-c7d8b204c9bc,
@call-id=131, @rc-code=1, @last-run=1428493269,
@last-rc-change=1428493269, @exec-time=2609, @exit-reason=forced stop
failed
Apr 08 13:41:12 [14231] lnx0083a attrd: info: attrd_peer_update:
Setting fail-count-lnx0106a[lnx0083a]: (null) -> INFINITY from lnx0083b
Apr 08 13:41:12 [14231] lnx0083a attrd: info: attrd_peer_update:
Setting last-failure-lnx0106a[lnx0083a]: (null) -> 1428493272 from
lnx0083b
Apr 08 13:41:12 [14228] lnx0083a cib: info: cib_perform_op: ++
/cib/status/node_state[@id='1']/transient_attributes[@id='1']/instance_attributes[@id='status-1']:
<nvpair id="status-1-fail-count-lnx0106a" name="fail-count-lnx0106a"
value="INFINITY"/>
Apr 08 13:41:12 [14228] lnx0083a cib: info: cib_perform_op: ++
/cib/status/node_state[@id='1']/transient_attributes[@id='1']/instance_attributes[@id='status-1']:
<nvpair id="status-1-last-failure-lnx0106a" name="last-failure-lnx0106a"
value="1428493272"/>
Apr 08 13:41:17 [14228] lnx0083a cib: info: cib_perform_op: +
/cib/status/node_state[@id='4']/lrm[@id='4']/lrm_resources/lrm_resource[@id='lnx0106a']/lrm_rsc_op[@id='lnx0106a_last_0']:
@operation_key=lnx0106a_stop_0, @operation=stop,
@transition-key=106:10179:0:f57c21e4-fd47-4fef-9d73-c7d8b204c9bc,
@transition-magic=0:0;106:10179:0:f57c21e4-fd47-4fef-9d73-c7d8b204c9bc,
@call-id=542, @last-run=1428493269, @last-rc-change=1428493269,
@exec-time=7645
...

any ideas?

thank you!
Philipp
Michael Schwartzkopff
2015-04-08 14:53:38 UTC
Post by p***@arz.at
hi,
how do I clean up the CIB on a node after an unexpected system halt?
The failed node still thinks it is running a VirtualDomain resource which is
already running on another node in the cluster (successful takeover).
(...)
Node is fenced..
(...)
any ideas?
thank you!
Philipp
The node will get the latest version of the CIB after it reconnects to the
cluster.

The CIB has a timestamp, so every node can decide whether it has the latest
version of the CIB or should fetch it from another node.
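
For example, a quick way to check which CIB revision a node currently holds
(the version counters are attributes on the <cib> element), assuming the
usual pacemaker/pcs command-line tools are installed:

    # print the <cib ...> header with admin_epoch/epoch/num_updates
    cibadmin --query | head -n 1
    # the same dump via pcs
    pcs cluster cib | head -n 1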

Kind regards,

Michael Schwartzkopff
--
[*] sys4 AG

http://sys4.de, +49 (89) 30 90 46 64, +49 (162) 165 0044
Franziskanerstraße 15, 81669 München

Registered office: Munich, Amtsgericht München: HRB 199263
Management board: Patrick Ben Koetter, Marc Schiffbauer
Chairman of the supervisory board: Florian Kirstein
p***@arz.at
2015-04-09 08:27:51 UTC
From: Michael Schwartzkopff <***@sys4.de>
To: The Pacemaker cluster resource manager
<***@oss.clusterlabs.org>
Date: 08.04.2015 17:12
Subject: Re: [Pacemaker] update cib after fence
Post by p***@arz.at
hi,
how do I clean up the CIB on a node after an unexpected system halt?
(...)
any ideas?
thank you!
Philipp
The node will get the latest version of the CIB after it reconnects to the
cluster.
The CIB has a timestamp, so every node can decide whether it has the latest
version of the CIB or should fetch it from another node.
I rechecked the config and logs.
During cluster startup the CIB update was successful. It looks like there is
(or was) an outstanding live-migration request for the mentioned VM
(lnx0106a) to the joining KVM host, although lnx0106a was up and running on
lnx0129b:

<lrm_rsc_op id="lnx0106a_last_0"
operation_key="lnx0106a_start_0" operation="start"
crm-debug-origin="do_update_resource" crm_feature_set="3.0.9"
transition-key="110:10180:0:f57c21e4-fd47-4fef-9d73-c7d8b204c9bc"
transition-magic="0:0;110:10180:0:f57c21e4-fd47-4fef-9d73-c7d8b204c9bc"
on_node="lnx0129b" call-id="543" rc-code="0" op-status="0" interval="0"
last-run="1428493293" last-rc-change="1428493293" exec-time="2639"
queue-time="0" op-digest="b963c7f1b26e14b411d990a5c51d86c0"/>
<lrm_rsc_op id="lnx0106a_monitor_30000"
operation_key="lnx0106a_monitor_30000" operation="monitor"
crm-debug-origin="do_update_resource" crm_feature_set="3.0.9"
transition-key="89:10182:0:f57c21e4-fd47-4fef-9d73-c7d8b204c9bc"
transition-magic="0:0;89:10182:0:f57c21e4-fd47-4fef-9d73-c7d8b204c9bc"
on_node="lnx0129b" call-id="544" rc-code="0" op-status="0"
interval="30000" last-rc-change="1428493323" exec-time="1123"
queue-time="0" op-digest="048ac70015296e4006c8825a9fc7b5f1"/>

why does pacemaker try to move the VM to the joining node?
...
<lrm_rsc_op id="lnx0106a_last_failure_0"
operation_key="lnx0106a_migrate_to_0" operation="migrate_to"
crm-debug-origin="do_update_resource" crm_feature_set="3.0.9"
transition-key="112:8364:0:f57c21e4-fd47-4fef-9d73-c7d8b204c9bc"
transition-magic="0:1;112:8364:0:f57c21e4-fd47-4fef-9d73-c7d8b204c9bc"
on_node="lnx0129a" call-id="444" rc-code="1" op-status="0" interval="0"
last-run="1427973596" last-rc-change="1427973596" exec-time="503"
queue-time="0" op-digest="b963c7f1b26e14b411d990a5c51d86c0"
exit-reason="lnx0106a: live migration to lnx0083a failed: 1"
migrate_source="lnx0129a" migrate_target="lnx0083a"/>
....
<rsc_location id="cli-ban-lnx0106a-on-lnx0129a" node="lnx0129a"
role="Started" rsc="lnx0106a" score="-INFINITY"/>
...
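
A quick way to see where such stale operation history and constraints are
recorded is to query the CIB directly (assuming a cibadmin build with XPath
support):

    # all recorded operations for the resource, on every node
    cibadmin --query --xpath "//lrm_resource[@id='lnx0106a']"
    # location constraints that reference the resource
    cibadmin --query --xpath "//rsc_location[@rsc='lnx0106a']"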

after cleanup of the VM resource lnx0106a, the host lnx0083a starts
successfully (commands sketched below).
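
A cleanup of that kind is usually done with one of the following (resource
and node names taken from this thread; which tool is available depends on
the installation):

    # clear the failed operation history for the resource on all nodes
    pcs resource cleanup lnx0106a
    # or only for the previously fenced node, with the low-level tool
    crm_resource --cleanup --resource lnx0106a --node lnx0083a
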
thank you!
_______________________________________________
Pacemaker mailing list: ***@oss.clusterlabs.org
http://oss.clusterlabs.org/mailman/listinfo/pacemaker

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://bugs.clusterlabs.org
Michael Schwartzkopff
2015-04-09 08:52:52 UTC
On Thursday, 9 April 2015, at 10:27:51, you wrote:

(...)
Post by p***@arz.at
why does pacemaker try to move the VM to the joining node?
...
(...)
Post by p***@arz.at
....
<rsc_location id="cli-ban-lnx0106a-on-lnx0129a" node="lnx0129a"
role="Started" rsc="lnx0106a" score="-INFINITY"/>
...
You ordered pacemaker to do so, probably with a "crm resource migrate" command.

Location constraints whose IDs start with "cli-" were created by command-line
operations.
Post by p***@arz.at
after cleanup of the VM resource lnx0106a, the host lnx0083a starts successfully
thank you!
Perhaps you should delete the location constraint.
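
Constraints created by "crm resource migrate" can usually be cleared again
with the matching un-migrate/clear command, for example (assuming crmsh or a
reasonably recent crm_resource):

    # crmsh: drop the migration constraint for the resource
    crm resource unmigrate lnx0106a
    # or with the low-level tool
    crm_resource --resource lnx0106a --clear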


Kind regards,

Michael Schwartzkopff
--
[*] sys4 AG

http://sys4.de, +49 (89) 30 90 46 64, +49 (162) 165 0044
Franziskanerstraße 15, 81669 München

Registered office: Munich, Amtsgericht München: HRB 199263
Management board: Patrick Ben Koetter, Marc Schiffbauer
Chairman of the supervisory board: Florian Kirstein
p***@arz.at
2015-04-09 09:27:45 UTC
Post by p***@arz.at
(...)
why does pacemaker try to move the VM to the joining node?
...
(...)
Post by p***@arz.at
....
<rsc_location id="cli-ban-lnx0106a-on-lnx0129a" node="lnx0129a"
role="Started" rsc="lnx0106a" score="-INFINITY"/>
...
You ordered pacemaker to do so, probably with a "crm resource migrate" command.
Location constraints whose IDs start with "cli-" were created by command-line
operations.
hi,

no, the constraint is for a different server: VM lnx0106a is running on
lnx0129b, while the constraint is for lnx0129a.
After cleanup and startup of the fenced node I still have the above-mentioned
constraint.
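
One way to double-check and then drop the stale ban by hand, using the
constraint ID shown earlier in the thread:

    # list all constraints together with their IDs
    pcs constraint --full
    # remove the left-over ban explicitly
    pcs constraint remove cli-ban-lnx0106a-on-lnx0129a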

thank you