[Pacemaker] Node sends shutdown request to other node:error:

Discussion:

K Mehta

2013-07-16 13:03:42 UTC

I have a two node test cluster running with CMAN plugin. Fencing is not
configured. I see that vsanqa7 sends a message to vsanqa8 to shutdown.
However, it is not clear why vsanqa7 takes this decision.

===================/var/log/messages=================================

Node vsanqa7

Jul 15 08:51:18 vsanqa7 corosync[12081]: [TOTEM ] A processor joined or
left the membership and a new membership was formed.
Jul 15 08:51:18 vsanqa7 corosync[12081]: [CMAN ] quorum regained,
resuming activity
Jul 15 08:51:18 vsanqa7 corosync[12081]: [QUORUM] This node is within the
primary component and will provide service.
Jul 15 08:51:18 vsanqa7 corosync[12081]: [QUORUM] Members[2]: 1 2
Jul 15 08:51:18 vsanqa7 corosync[12081]: [QUORUM] Members[2]: 1 2
Jul 15 08:51:18 vsanqa7 crmd[12372]: notice: cman_event_callback:
Membership 4035520: quorum acquired
Jul 15 08:51:18 vsanqa7 crmd[12372]: notice: crm_update_peer_state:
cman_event_callback: Node vsanqa8[2] - state is now member
Jul 15 08:51:18 vsanqa7 corosync[12081]: [CPG ] chosen downlist: sender
r(0) ip(172.16.68.120) ; members(old:1 left:0)
Jul 15 08:51:18 vsanqa7 corosync[12081]: [MAIN ] Completed service
synchronization, ready to provide service.
Jul 15 08:51:30 vsanqa7 crmd[12372]: warning: match_down_event: No match
for shutdown action on vsanqa8
Jul 15 08:51:30 vsanqa7 crmd[12372]: notice: do_state_transition: State
transition S_IDLE -> S_INTEGRATION [ input=I_NODE_JOIN cause=C_FSA_INTERNAL
origin=peer_update_callback ]
Jul 15 08:51:30 vsanqa7 cib[12367]: warning: cib_process_diff: Diff
0.2760.0 -> 0.2760.1 from vsanqa8 not applied to 0.2760.307: current
"num_updates" is greater than required
Jul 15 08:51:31 vsanqa7 kernel: send_and_wait_for_client_info failed with
-110 uuid=0x74
Jul 15 08:51:32 vsanqa7 attrd[12370]: notice: attrd_local_callback:
Sending full refresh (origin=crmd)
Jul 15 08:51:32 vsanqa7 attrd[12370]: notice: attrd_trigger_update:
Sending flush op to all hosts for:
master-vha-7413ed6d-2a3b-4ffc-9cd0-b80778d7a839 (8)
Jul 15 08:51:32 vsanqa7 attrd[12370]: notice: attrd_trigger_update:
Sending flush op to all hosts for: probe_complete (true)
Jul 15 08:51:32 vsanqa7 pengine[12371]: notice: unpack_config: On loss of
CCM Quorum: Ignore

Why vsanqa8 is scheduled for shutdown ? <<<<<

Jul 15 08:51:32 vsanqa7 pengine[12371]: notice: stage6: Scheduling Node
vsanqa8 for shutdown
Jul 15 08:51:32 vsanqa7 pengine[12371]: notice: process_pe_message:
Calculated Transition 1: /var/lib/pacemaker/pengine/pe-input-3530.bz2

Node vsanqa8

Jul 15 08:51:18 vsanqa8 corosync[21392]: [MAIN ] Corosync Cluster Engine
('1.4.1'): started and ready to provide service.
Jul 15 08:51:18 vsanqa8 corosync[21392]: [MAIN ] Corosync built-in
features: nss dbus rdma snmp
Jul 15 08:51:18 vsanqa8 corosync[21392]: [MAIN ] Successfully read
config from /etc/cluster/cluster.conf
Jul 15 08:51:18 vsanqa8 corosync[21392]: [MAIN ] Successfully parsed
cman config
Jul 15 08:51:18 vsanqa8 corosync[21392]: [TOTEM ] Initializing transport
(UDP/IP Multicast).
Jul 15 08:51:18 vsanqa8 corosync[21392]: [TOTEM ] Initializing
transmit/receive security: libtomcrypt SOBER128/SHA1HMAC (mode 0).
Jul 15 08:51:18 vsanqa8 corosync[21392]: [TOTEM ] The network interface
[172.16.68.126] is now up.
Jul 15 08:51:18 vsanqa8 corosync[21392]: [QUORUM] Using quorum provider
quorum_cman
Jul 15 08:51:18 vsanqa8 corosync[21392]: [SERV ] Service engine loaded:
corosync cluster quorum service v0.1
Jul 15 08:51:18 vsanqa8 corosync[21392]: [CMAN ] CMAN 3.0.12.1 (built
Feb 22 2013 07:20:27) started
Jul 15 08:51:18 vsanqa8 corosync[21392]: [SERV ] Service engine loaded:
corosync CMAN membership service 2.90
Jul 15 08:51:18 vsanqa8 corosync[21392]: [SERV ] Service engine loaded:
openais checkpoint service B.01.01
Jul 15 08:51:18 vsanqa8 corosync[21392]: [SERV ] Service engine loaded:
corosync extended virtual synchrony service
Jul 15 08:51:18 vsanqa8 corosync[21392]: [SERV ] Service engine loaded:
corosync configuration service
Jul 15 08:51:18 vsanqa8 corosync[21392]: [SERV ] Service engine loaded:
corosync cluster closed process group service v1.01
Jul 15 08:51:18 vsanqa8 corosync[21392]: [SERV ] Service engine loaded:
corosync cluster config database access v1.01
Jul 15 08:51:18 vsanqa8 corosync[21392]: [SERV ] Service engine loaded:
corosync profile loading service
Jul 15 08:51:18 vsanqa8 corosync[21392]: [QUORUM] Using quorum provider
quorum_cman
Jul 15 08:51:18 vsanqa8 corosync[21392]: [SERV ] Service engine loaded:
corosync cluster quorum service v0.1
Jul 15 08:51:18 vsanqa8 corosync[21392]: [MAIN ] Compatibility mode set
to whitetank. Using V1 and V2 of the synchronization engine.
Jul 15 08:51:18 vsanqa8 corosync[21392]: [TOTEM ] A processor joined or
left the membership and a new membership was formed.
Jul 15 08:51:18 vsanqa8 corosync[21392]: [QUORUM] Members[1]: 2
Jul 15 08:51:18 vsanqa8 corosync[21392]: [QUORUM] Members[1]: 2
Jul 15 08:51:18 vsanqa8 corosync[21392]: [CPG ] chosen downlist: sender
r(0) ip(172.16.68.126) ; members(old:0 left:0)
Jul 15 08:51:18 vsanqa8 corosync[21392]: [MAIN ] Completed service
synchronization, ready to provide service.
Jul 15 08:51:18 vsanqa8 corosync[21392]: [TOTEM ] A processor joined or
left the membership and a new membership was formed.
Jul 15 08:51:18 vsanqa8 corosync[21392]: [CMAN ] quorum regained,
resuming activity
Jul 15 08:51:18 vsanqa8 corosync[21392]: [QUORUM] This node is within the
primary component and will provide service.
Jul 15 08:51:18 vsanqa8 corosync[21392]: [QUORUM] Members[2]: 1 2
Jul 15 08:51:18 vsanqa8 corosync[21392]: [QUORUM] Members[2]: 1 2
Jul 15 08:51:18 vsanqa8 corosync[21392]: [CPG ] chosen downlist: sender
r(0) ip(172.16.68.120) ; members(old:1 left:0)
Jul 15 08:51:18 vsanqa8 corosync[21392]: [MAIN ] Completed service
synchronization, ready to provide service.
Jul 15 08:51:22 vsanqa8 fenced[21447]: fenced 3.0.12.1 started
Jul 15 08:51:22 vsanqa8 dlm_controld[21467]: dlm_controld 3.0.12.1 started
Jul 15 08:51:23 vsanqa8 gfs_controld[21522]: gfs_controld 3.0.12.1 started
Jul 15 08:51:29 vsanqa8 pacemakerd[21673]: notice: crm_add_logfile:
Additional logging available in /var/log/cluster/corosync.log
Jul 15 08:51:29 vsanqa8 pacemakerd[21673]: notice: main: Starting
Pacemaker 1.1.8-7.el6 (Build: 394e906): generated-manpages agent-manpages
ascii-docs publican-docs ncurses libqb-logging libqb-ipc corosync-plugin
cman
Jul 15 08:51:29 vsanqa8 pacemakerd[21673]: notice: update_node_processes:
0x13c1f80 Node 2 now known as vsanqa8, was:
Jul 15 08:51:29 vsanqa8 pacemakerd[21673]: notice: update_node_processes:
0x13be960 Node 1 now known as vsanqa7, was:
Jul 15 08:51:29 vsanqa8 lrmd[21681]: notice: crm_add_logfile: Additional
logging available in /var/log/cluster/corosync.log
Jul 15 08:51:29 vsanqa8 stonith-ng[21680]: notice: crm_add_logfile:
Additional logging available in /var/log/cluster/corosync.log
Jul 15 08:51:29 vsanqa8 cib[21679]: notice: crm_add_logfile: Additional
logging available in /var/log/cluster/corosync.log
Jul 15 08:51:29 vsanqa8 attrd[21682]: notice: crm_add_logfile: Additional
logging available in /var/log/cluster/corosync.log
Jul 15 08:51:29 vsanqa8 stonith-ng[21680]: notice: crm_cluster_connect:
Connecting to cluster infrastructure: cman
Jul 15 08:51:29 vsanqa8 cib[21679]: notice: main: Using legacy config
location: /var/lib/heartbeat/crm
Jul 15 08:51:29 vsanqa8 attrd[21682]: notice: crm_cluster_connect:
Connecting to cluster infrastructure: cman
Jul 15 08:51:29 vsanqa8 crmd[21684]: notice: crm_add_logfile: Additional
logging available in /var/log/cluster/corosync.log
Jul 15 08:51:29 vsanqa8 crmd[21684]: notice: main: CRM Git Version:
394e906
Jul 15 08:51:29 vsanqa8 pengine[21683]: notice: crm_add_logfile:
Additional logging available in /var/log/cluster/corosync.log
Jul 15 08:51:29 vsanqa8 attrd[21682]: notice: main: Starting mainloop...
Jul 15 08:51:29 vsanqa8 cib[21679]: notice: crm_cluster_connect:
Connecting to cluster infrastructure: cman
Jul 15 08:51:30 vsanqa8 crmd[21684]: notice: crm_cluster_connect:
Connecting to cluster infrastructure: cman
Jul 15 08:51:30 vsanqa8 stonith-ng[21680]: notice: setup_cib: Watching
for stonith topology changes
Jul 15 08:51:30 vsanqa8 crmd[21684]: notice: cman_event_callback:
Membership 4035520: quorum acquired
Jul 15 08:51:30 vsanqa8 crmd[21684]: notice: crm_update_peer_state:
cman_event_callback: Node vsanqa7[1] - state is now member
Jul 15 08:51:30 vsanqa8 crmd[21684]: notice: crm_update_peer_state:
cman_event_callback: Node vsanqa8[2] - state is now member
Jul 15 08:51:30 vsanqa8 crmd[21684]: notice: do_started: The local CRM is
operational
Jul 15 08:51:32 vsanqa8 crmd[21684]: notice: do_state_transition: State
transition S_PENDING -> S_NOT_DC [ input=I_NOT_DC cause=C_HA_MESSAGE
origin=do_cl_join_finalize_respond ]
Jul 15 08:51:32 vsanqa8 attrd[21682]: notice: attrd_local_callback:
Sending full refresh (origin=crmd)

vsanqa8 is not expecting a shutdown request from vsanqa7 <<<<<<<

Jul 15 08:51:32 vsanqa8 crmd[21684]: error: handle_request: We didn't
ask to be shut down, yet our DC is telling us too.
Jul 15 08:51:32 vsanqa8 crmd[21684]: notice: do_state_transition: State
transition S_NOT_DC -> S_STOPPING [ input=I_STOP cause=C_HA_MESSAGE
origin=route_message ]
Jul 15 08:51:33 vsanqa8 crmd[21684]: notice: process_lrm_event: LRM
operation vha-7413ed6d-2a3b-4ffc-9cd0-b80778d7a839_monitor_0 (call=6, rc=7,
cib-update=9, confirmed=true) not running
Jul 15 08:51:33 vsanqa8 crmd[21684]: notice: terminate_cs_connection:
Disconnecting from Corosync
Jul 15 08:51:33 vsanqa8 cib[21679]: warning: qb_ipcs_event_sendv:
new_event_notification (21679-21684-12): Broken pipe (32)
Jul 15 08:51:33 vsanqa8 crmd[21684]: warning: do_exit: Inhibiting respawn
by Heartbeat
Jul 15 08:51:33 vsanqa8 cib[21679]: warning: do_local_notify: A-Sync reply
to crmd failed: No message of desired type
Jul 15 08:51:33 vsanqa8 pacemakerd[21673]: error: pcmk_child_exit: Child
process crmd exited (pid=21684, rc=100)
Jul 15 08:51:33 vsanqa8 pacemakerd[21673]: warning: pcmk_child_exit:
Pacemaker child process crmd no longer wishes to be respawned. Shutting
ourselves down.
Jul 15 08:51:33 vsanqa8 pacemakerd[21673]: notice: pcmk_shutdown_worker:
Shuting down Pacemaker
Jul 15 08:51:33 vsanqa8 pacemakerd[21673]: notice: stop_child: Stopping
pengine: Sent -15 to process 21683
Jul 15 08:51:33 vsanqa8 pacemakerd[21673]: notice: stop_child: Stopping
attrd: Sent -15 to process 21682
Jul 15 08:51:33 vsanqa8 attrd[21682]: notice: main: Exiting...
Jul 15 08:51:33 vsanqa8 pacemakerd[21673]: notice: stop_child: Stopping
lrmd: Sent -15 to process 21681
Jul 15 08:51:33 vsanqa8 pacemakerd[21673]: notice: stop_child: Stopping
stonith-ng: Sent -15 to process 21680

=========================corosync.log===============================================

vsanqa7

Jul 15 08:51:29 [12367] vsanqa7 cib: info: pcmk_cpg_membership:
Member[3.1] cib.2
Jul 15 08:51:29 [12367] vsanqa7 cib: info: crm_update_peer_proc:
pcmk_cpg_membership: Node vsanqa8[2] - corosync-cpg is now online
Jul 15 08:51:30 [12372] vsanqa7 crmd: info: pcmk_cpg_membership:
Joined[3.0] crmd.2
Jul 15 08:51:30 [12372] vsanqa7 crmd: info: pcmk_cpg_membership:
Member[3.0] crmd.1
Jul 15 08:51:30 [12372] vsanqa7 crmd: info: pcmk_cpg_membership:
Member[3.1] crmd.2
Jul 15 08:51:30 [12372] vsanqa7 crmd: info: crm_update_peer_proc:
pcmk_cpg_membership: Node vsanqa8[2] - corosync-cpg is now online
Jul 15 08:51:30 [12372] vsanqa7 crmd: info: peer_update_callback:
Client vsanqa8/peer now has status [online] (DC=true)
Jul 15 08:51:30 [12372] vsanqa7 crmd: warning: match_down_event:
No match for shutdown action on vsanqa8
Jul 15 08:51:30 [12372] vsanqa7 crmd: notice: do_state_transition:
State transition S_IDLE -> S_INTEGRATION [ input=I_NODE_JOIN
cause=C_FSA_INTERNAL origin=peer_update_callback ]
Jul 15 08:51:30 [12372] vsanqa7 crmd: info:
abort_transition_graph: do_te_invoke:163 - Triggered transition abort
(complete=1) : Peer Halt
Jul 15 08:51:30 [12372] vsanqa7 crmd: info: join_make_offer:
Making join offers based on membership 4035520
Jul 15 08:51:30 [12372] vsanqa7 crmd: info: do_dc_join_offer_all:
join-2: Waiting on 2 outstanding join acks
Jul 15 08:51:30 [12372] vsanqa7 crmd: info: update_dc: Set
DC to vsanqa7 (3.0.7)
Jul 15 08:51:30 [12367] vsanqa7 cib: warning: cib_process_diff:
Diff 0.2760.0 -> 0.2760.1 from vsanqa8 not applied to 0.2760.307:
current "num_updates" is greater than required
Jul 15 08:51:30 [12367] vsanqa7 cib: info: cib_process_request:
Operation complete: op cib_sync_one for section 'all'
(origin=vsanqa8/vsanqa8/(null), version=0.2760.307): OK (rc=0)
Jul 15 08:51:31 [12372] vsanqa7 crmd: info: do_dc_join_offer_all:
A new node joined the cluster
Jul 15 08:51:31 [12372] vsanqa7 crmd: info: do_dc_join_offer_all:
join-3: Waiting on 2 outstanding join acks
Jul 15 08:51:31 [12372] vsanqa7 crmd: info: update_dc: Set
DC to vsanqa7 (3.0.7)
Jul 15 08:51:32 [12372] vsanqa7 crmd: info:
crm_update_peer_expected: do_dc_join_filter_offer: Node vsanqa8[2]
- expected state is now member
Jul 15 08:51:32 [12372] vsanqa7 crmd: info: do_state_transition:
State transition S_INTEGRATION -> S_FINALIZE_JOIN [ input=I_INTEGRATED
cause=C_FSA_INTERNAL origin=check_join_state ]
Jul 15 08:51:32 [12372] vsanqa7 crmd: info: do_dc_join_finalize:
join-3: Syncing the CIB from vsanqa7 to the rest of the cluster
Jul 15 08:51:32 [12367] vsanqa7 cib: info: cib_process_request:
Operation complete: op cib_sync for section 'all'
(origin=local/crmd/43, version=0.2760.307): OK (rc=0)
Jul 15 08:51:32 [12367] vsanqa7 cib: info: cib_process_request:
Operation complete: op cib_modify for section nodes
(origin=local/crmd/44, version=0.2760.308): OK (rc=0)
Jul 15 08:51:32 [12372] vsanqa7 crmd: info: do_dc_join_ack:
join-3: Updating node state to member for vsanqa7
Jul 15 08:51:32 [12372] vsanqa7 crmd: info: erase_status_tag:
Deleting xpath: //node_state[@uname='vsanqa7']/lrm
Jul 15 08:51:32 [12372] vsanqa7 crmd: info: do_dc_join_ack:
join-3: Updating node state to member for vsanqa8
Jul 15 08:51:32 [12372] vsanqa7 crmd: info: erase_status_tag:
Deleting xpath: //node_state[@uname='vsanqa8']/lrm
Jul 15 08:51:32 [12367] vsanqa7 cib: info: cib_process_request:
Operation complete: op cib_modify for section nodes
(origin=local/crmd/45, version=0.2760.309): OK (rc=0)
Jul 15 08:51:32 [12367] vsanqa7 cib: info: cib_process_request:
Operation complete: op cib_delete for section
//node_state[@uname='vsanqa7']/lrm (origin=local/crmd/46,
version=0.2760.310): OK (rc=0)
Jul 15 08:51:32 [12367] vsanqa7 cib: info: cib_process_request:
Operation complete: op cib_delete for section
//node_state[@uname='vsanqa8']/lrm (origin=local/crmd/48,
version=0.2760.312): OK (rc=0)
Jul 15 08:51:32 [12372] vsanqa7 crmd: info: do_state_transition:
State transition S_FINALIZE_JOIN -> S_POLICY_ENGINE [
input=I_FINALIZED cause=C_FSA_INTERNAL origin=check_join_state ]
Jul 15 08:51:32 [12372] vsanqa7 crmd: info:
abort_transition_graph: do_te_invoke:156 - Triggered transition abort
(complete=1) : Peer Cancelled
Jul 15 08:51:32 [12370] vsanqa7 attrd: notice: attrd_local_callback:
Sending full refresh (origin=crmd)

vsanqa8

Jul 15 08:51:30 [21679] vsanqa8 cib: info: cib_process_diff:
Diff 0.2760.306 -> 0.2760.307 from vsanqa7 not applied to 0.2760.1:
current "num_updates" is less than required
Jul 15 08:51:30 [21679] vsanqa8 cib: info:
cib_server_process_diff: Requesting re-sync from peer
Jul 15 08:51:30 [21684] vsanqa8 crmd: info: do_started:
Delaying start, Config not read (0000000000000040)
Jul 15 08:51:30 [21684] vsanqa8 crmd: info: do_started:
Delaying start, Config not read (0000000000000040)
Jul 15 08:51:30 [21684] vsanqa8 crmd: info: qb_ipcs_us_publish:
server name: crmd
Jul 15 08:51:30 [21684] vsanqa8 crmd: notice: do_started: The
local CRM is operational
Jul 15 08:51:30 [21684] vsanqa8 crmd: info: do_state_transition:
State transition S_STARTING -> S_PENDING [ input=I_PENDING
cause=C_FSA_INTERNAL origin=do_started ]
Jul 15 08:51:30 [21679] vsanqa8 cib: info: cib_process_replace:
Digest matched on replace from vsanqa7:
0936115228a36c943f181954830c9b2b
Jul 15 08:51:30 [21679] vsanqa8 cib: info: cib_process_replace:
Replaced 0.2760.1 with 0.2760.307 from vsanqa7
Jul 15 08:51:30 [21679] vsanqa8 cib: info: cib_replace_notify:
Replaced: 0.2760.1 -> 0.2760.307 from vsanqa7
Jul 15 08:51:31 [21684] vsanqa8 crmd: info: pcmk_cpg_membership:
Joined[0.0] crmd.2
Jul 15 08:51:31 [21684] vsanqa8 crmd: info: pcmk_cpg_membership:
Member[0.0] crmd.1
Jul 15 08:51:31 [21684] vsanqa8 crmd: info: crm_update_peer_proc:
pcmk_cpg_membership: Node vsanqa7[1] - corosync-cpg is now online
Jul 15 08:51:31 [21684] vsanqa8 crmd: info: peer_update_callback:
Client vsanqa7/peer now has status [online] (DC=<null>)
Jul 15 08:51:31 [21684] vsanqa8 crmd: info: pcmk_cpg_membership:
Member[0.1] crmd.2
Jul 15 08:51:31 [21684] vsanqa8 crmd: info: update_dc: Set
DC to vsanqa7 (3.0.7)
Jul 15 08:51:32 [21680] vsanqa8 stonith-ng: info: stonith_command:
Processed register from crmd.21684: OK (0)
Jul 15 08:51:32 [21680] vsanqa8 stonith-ng: info: stonith_command:
Processed st_notify from crmd.21684: OK (0)
Jul 15 08:51:32 [21680] vsanqa8 stonith-ng: info: stonith_command:
Processed st_notify from crmd.21684: OK (0)
Jul 15 08:51:32 [21684] vsanqa8 crmd: info: erase_status_tag:
Deleting xpath: //node_state[@uname='vsanqa8']/transient_attributes
Jul 15 08:51:32 [21684] vsanqa8 crmd: info: update_attrd:
Connecting to attrd... 5 retries remaining
Jul 15 08:51:32 [21684] vsanqa8 crmd: notice: do_state_transition:
State transition S_PENDING -> S_NOT_DC [ input=I_NOT_DC
cause=C_HA_MESSAGE origin=do_cl_join_finalize_respond ]
Jul 15 08:51:32 [21679] vsanqa8 cib: info: cib_process_replace:
Digest matched on replace from vsanqa7:
0936115228a36c943f181954830c9b2b
Jul 15 08:51:32 [21679] vsanqa8 cib: info: cib_process_replace:
Replaced 0.2760.307 with 0.2760.307 from vsanqa7
Jul 15 08:51:32 [21682] vsanqa8 attrd: notice: attrd_local_callback:
Sending full refresh (origin=crmd)
Jul 15 08:51:32 [21681] vsanqa8 lrmd: info:
process_lrmd_get_rsc_info: Resource
'vha-7413ed6d-2a3b-4ffc-9cd0-b80778d7a839' not found (0 active resources)
Jul 15 08:51:32 [21681] vsanqa8 lrmd: info:
process_lrmd_get_rsc_info: Resource
'vha-7413ed6d-2a3b-4ffc-9cd0-b80778d7a839:0' not found (0 active resources)
Jul 15 08:51:32 [21681] vsanqa8 lrmd: info:
process_lrmd_rsc_register: Added
'vha-7413ed6d-2a3b-4ffc-9cd0-b80778d7a839' to the rsc list (1 active
resources)
Jul 15 08:51:32 [21684] vsanqa8 crmd: error: handle_request: We
didn't ask to be shut down, yet our DC is telling us too.
Jul 15 08:51:32 [21684] vsanqa8 crmd: notice: do_state_transition:
State transition S_NOT_DC -> S_STOPPING [ input=I_STOP
cause=C_HA_MESSAGE origin=route_message ]
Jul 15 08:51:32 [21684] vsanqa8 crmd: info: do_shutdown:
Disconnecting STONITH...
Jul 15 08:51:32 [21684] vsanqa8 crmd: info:
tengine_stonith_connection_destroy: Fencing daemon disconnected
Jul 15 08:51:32 [21684] vsanqa8 crmd: info: verify_stopped: 1
pending LRM operations at shutdown... waiting
Jul 15 08:51:32 [21684] vsanqa8 crmd: info: ghash_print_pending:
Pending action: vha-7413ed6d-2a3b-4ffc-9cd0-b80778d7a839:6
(vha-7413ed6d-2a3b-4ffc-9cd0-b80778d7a839_monitor_0)

Regards,
Kiran
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://oss.clusterlabs.org/pipermail/pacemaker/attachments/20130716/1b8887d2/attachment-0001.html>

Andrew Beekhof

2013-07-16 23:01:26 UTC

Permalink

I have a two node test cluster running with CMAN plugin. Fencing is not configured.

Thats problem 1

I see that vsanqa7 sends a message to vsanqa8 to shutdown.
However, it is not clear why vsanqa7 takes this decision.

It looks like you're hitting the problem addressed by:

https://github.com/beekhof/pacemaker/commit/913a850

Is this on RHEL? or one of the rebuilds?

===================/var/log/messages=================================
Node vsanqa7
Jul 15 08:51:18 vsanqa7 corosync[12081]: [TOTEM ] A processor joined or left the membership and a new membership was formed.
Jul 15 08:51:18 vsanqa7 corosync[12081]: [CMAN ] quorum regained, resuming activity
Jul 15 08:51:18 vsanqa7 corosync[12081]: [QUORUM] This node is within the primary component and will provide service.
Jul 15 08:51:18 vsanqa7 corosync[12081]: [QUORUM] Members[2]: 1 2
Jul 15 08:51:18 vsanqa7 corosync[12081]: [QUORUM] Members[2]: 1 2
Jul 15 08:51:18 vsanqa7 crmd[12372]: notice: cman_event_callback: Membership 4035520: quorum acquired
Jul 15 08:51:18 vsanqa7 crmd[12372]: notice: crm_update_peer_state: cman_event_callback: Node vsanqa8[2] - state is now member
Jul 15 08:51:18 vsanqa7 corosync[12081]: [CPG ] chosen downlist: sender r(0) ip(172.16.68.120) ; members(old:1 left:0)
Jul 15 08:51:18 vsanqa7 corosync[12081]: [MAIN ] Completed service synchronization, ready to provide service.
Jul 15 08:51:30 vsanqa7 crmd[12372]: warning: match_down_event: No match for shutdown action on vsanqa8
Jul 15 08:51:30 vsanqa7 crmd[12372]: notice: do_state_transition: State transition S_IDLE -> S_INTEGRATION [ input=I_NODE_JOIN cause=C_FSA_INTERNAL origin=peer_update_callback ]
Jul 15 08:51:30 vsanqa7 cib[12367]: warning: cib_process_diff: Diff 0.2760.0 -> 0.2760.1 from vsanqa8 not applied to 0.2760.307: current "num_updates" is greater than required
Jul 15 08:51:31 vsanqa7 kernel: send_and_wait_for_client_info failed with -110 uuid=0x74
Jul 15 08:51:32 vsanqa7 attrd[12370]: notice: attrd_local_callback: Sending full refresh (origin=crmd)
Jul 15 08:51:32 vsanqa7 attrd[12370]: notice: attrd_trigger_update: Sending flush op to all hosts for: master-vha-7413ed6d-2a3b-4ffc-9cd0-b80778d7a839 (8)
Jul 15 08:51:32 vsanqa7 attrd[12370]: notice: attrd_trigger_update: Sending flush op to all hosts for: probe_complete (true)
Jul 15 08:51:32 vsanqa7 pengine[12371]: notice: unpack_config: On loss of CCM Quorum: Ignore

Why vsanqa8 is scheduled for shutdown ? <<<<<

Jul 15 08:51:32 vsanqa7 pengine[12371]: notice: stage6: Scheduling Node vsanqa8 for shutdown
Jul 15 08:51:32 vsanqa7 pengine[12371]: notice: process_pe_message: Calculated Transition 1: /var/lib/pacemaker/pengine/pe-input-3530.bz2
Node vsanqa8
Jul 15 08:51:18 vsanqa8 corosync[21392]: [MAIN ] Corosync Cluster Engine ('1.4.1'): started and ready to provide service.
Jul 15 08:51:18 vsanqa8 corosync[21392]: [MAIN ] Corosync built-in features: nss dbus rdma snmp
Jul 15 08:51:18 vsanqa8 corosync[21392]: [MAIN ] Successfully read config from /etc/cluster/cluster.conf
Jul 15 08:51:18 vsanqa8 corosync[21392]: [MAIN ] Successfully parsed cman config
Jul 15 08:51:18 vsanqa8 corosync[21392]: [TOTEM ] Initializing transport (UDP/IP Multicast).
Jul 15 08:51:18 vsanqa8 corosync[21392]: [TOTEM ] Initializing transmit/receive security: libtomcrypt SOBER128/SHA1HMAC (mode 0).
Jul 15 08:51:18 vsanqa8 corosync[21392]: [TOTEM ] The network interface [172.16.68.126] is now up.
Jul 15 08:51:18 vsanqa8 corosync[21392]: [QUORUM] Using quorum provider quorum_cman
Jul 15 08:51:18 vsanqa8 corosync[21392]: [SERV ] Service engine loaded: corosync cluster quorum service v0.1
Jul 15 08:51:18 vsanqa8 corosync[21392]: [CMAN ] CMAN 3.0.12.1 (built Feb 22 2013 07:20:27) started
Jul 15 08:51:18 vsanqa8 corosync[21392]: [SERV ] Service engine loaded: corosync CMAN membership service 2.90
Jul 15 08:51:18 vsanqa8 corosync[21392]: [SERV ] Service engine loaded: openais checkpoint service B.01.01
Jul 15 08:51:18 vsanqa8 corosync[21392]: [SERV ] Service engine loaded: corosync extended virtual synchrony service
Jul 15 08:51:18 vsanqa8 corosync[21392]: [SERV ] Service engine loaded: corosync configuration service
Jul 15 08:51:18 vsanqa8 corosync[21392]: [SERV ] Service engine loaded: corosync cluster closed process group service v1.01
Jul 15 08:51:18 vsanqa8 corosync[21392]: [SERV ] Service engine loaded: corosync cluster config database access v1.01
Jul 15 08:51:18 vsanqa8 corosync[21392]: [SERV ] Service engine loaded: corosync profile loading service
Jul 15 08:51:18 vsanqa8 corosync[21392]: [QUORUM] Using quorum provider quorum_cman
Jul 15 08:51:18 vsanqa8 corosync[21392]: [SERV ] Service engine loaded: corosync cluster quorum service v0.1
Jul 15 08:51:18 vsanqa8 corosync[21392]: [MAIN ] Compatibility mode set to whitetank. Using V1 and V2 of the synchronization engine.
Jul 15 08:51:18 vsanqa8 corosync[21392]: [TOTEM ] A processor joined or left the membership and a new membership was formed.
Jul 15 08:51:18 vsanqa8 corosync[21392]: [QUORUM] Members[1]: 2
Jul 15 08:51:18 vsanqa8 corosync[21392]: [QUORUM] Members[1]: 2
Jul 15 08:51:18 vsanqa8 corosync[21392]: [CPG ] chosen downlist: sender r(0) ip(172.16.68.126) ; members(old:0 left:0)
Jul 15 08:51:18 vsanqa8 corosync[21392]: [MAIN ] Completed service synchronization, ready to provide service.
Jul 15 08:51:18 vsanqa8 corosync[21392]: [TOTEM ] A processor joined or left the membership and a new membership was formed.
Jul 15 08:51:18 vsanqa8 corosync[21392]: [CMAN ] quorum regained, resuming activity
Jul 15 08:51:18 vsanqa8 corosync[21392]: [QUORUM] This node is within the primary component and will provide service.
Jul 15 08:51:18 vsanqa8 corosync[21392]: [QUORUM] Members[2]: 1 2
Jul 15 08:51:18 vsanqa8 corosync[21392]: [QUORUM] Members[2]: 1 2
Jul 15 08:51:18 vsanqa8 corosync[21392]: [CPG ] chosen downlist: sender r(0) ip(172.16.68.120) ; members(old:1 left:0)
Jul 15 08:51:18 vsanqa8 corosync[21392]: [MAIN ] Completed service synchronization, ready to provide service.
Jul 15 08:51:22 vsanqa8 fenced[21447]: fenced 3.0.12.1 started
Jul 15 08:51:22 vsanqa8 dlm_controld[21467]: dlm_controld 3.0.12.1 started
Jul 15 08:51:23 vsanqa8 gfs_controld[21522]: gfs_controld 3.0.12.1 started
Jul 15 08:51:29 vsanqa8 pacemakerd[21673]: notice: crm_add_logfile: Additional logging available in /var/log/cluster/corosync.log
Jul 15 08:51:29 vsanqa8 pacemakerd[21673]: notice: main: Starting Pacemaker 1.1.8-7.el6 (Build: 394e906): generated-manpages agent-manpages ascii-docs publican-docs ncurses libqb-logging libqb-ipc corosync-plugin cman
Jul 15 08:51:29 vsanqa8 lrmd[21681]: notice: crm_add_logfile: Additional logging available in /var/log/cluster/corosync.log
Jul 15 08:51:29 vsanqa8 stonith-ng[21680]: notice: crm_add_logfile: Additional logging available in /var/log/cluster/corosync.log
Jul 15 08:51:29 vsanqa8 cib[21679]: notice: crm_add_logfile: Additional logging available in /var/log/cluster/corosync.log
Jul 15 08:51:29 vsanqa8 attrd[21682]: notice: crm_add_logfile: Additional logging available in /var/log/cluster/corosync.log
Jul 15 08:51:29 vsanqa8 stonith-ng[21680]: notice: crm_cluster_connect: Connecting to cluster infrastructure: cman
Jul 15 08:51:29 vsanqa8 cib[21679]: notice: main: Using legacy config location: /var/lib/heartbeat/crm
Jul 15 08:51:29 vsanqa8 attrd[21682]: notice: crm_cluster_connect: Connecting to cluster infrastructure: cman
Jul 15 08:51:29 vsanqa8 crmd[21684]: notice: crm_add_logfile: Additional logging available in /var/log/cluster/corosync.log
Jul 15 08:51:29 vsanqa8 crmd[21684]: notice: main: CRM Git Version: 394e906
Jul 15 08:51:29 vsanqa8 pengine[21683]: notice: crm_add_logfile: Additional logging available in /var/log/cluster/corosync.log
Jul 15 08:51:29 vsanqa8 attrd[21682]: notice: main: Starting mainloop...
Jul 15 08:51:29 vsanqa8 cib[21679]: notice: crm_cluster_connect: Connecting to cluster infrastructure: cman
Jul 15 08:51:30 vsanqa8 crmd[21684]: notice: crm_cluster_connect: Connecting to cluster infrastructure: cman
Jul 15 08:51:30 vsanqa8 stonith-ng[21680]: notice: setup_cib: Watching for stonith topology changes
Jul 15 08:51:30 vsanqa8 crmd[21684]: notice: cman_event_callback: Membership 4035520: quorum acquired
Jul 15 08:51:30 vsanqa8 crmd[21684]: notice: crm_update_peer_state: cman_event_callback: Node vsanqa7[1] - state is now member
Jul 15 08:51:30 vsanqa8 crmd[21684]: notice: crm_update_peer_state: cman_event_callback: Node vsanqa8[2] - state is now member
Jul 15 08:51:30 vsanqa8 crmd[21684]: notice: do_started: The local CRM is operational
Jul 15 08:51:32 vsanqa8 crmd[21684]: notice: do_state_transition: State transition S_PENDING -> S_NOT_DC [ input=I_NOT_DC cause=C_HA_MESSAGE origin=do_cl_join_finalize_respond ]
Jul 15 08:51:32 vsanqa8 attrd[21682]: notice: attrd_local_callback: Sending full refresh (origin=crmd)

vsanqa8 is not expecting a shutdown request from vsanqa7 <<<<<<<

Jul 15 08:51:32 vsanqa8 crmd[21684]: error: handle_request: We didn't ask to be shut down, yet our DC is telling us too.
Jul 15 08:51:32 vsanqa8 crmd[21684]: notice: do_state_transition: State transition S_NOT_DC -> S_STOPPING [ input=I_STOP cause=C_HA_MESSAGE origin=route_message ]
Jul 15 08:51:33 vsanqa8 crmd[21684]: notice: process_lrm_event: LRM operation vha-7413ed6d-2a3b-4ffc-9cd0-b80778d7a839_monitor_0 (call=6, rc=7, cib-update=9, confirmed=true) not running
Jul 15 08:51:33 vsanqa8 crmd[21684]: notice: terminate_cs_connection: Disconnecting from Corosync
Jul 15 08:51:33 vsanqa8 cib[21679]: warning: qb_ipcs_event_sendv: new_event_notification (21679-21684-12): Broken pipe (32)
Jul 15 08:51:33 vsanqa8 crmd[21684]: warning: do_exit: Inhibiting respawn by Heartbeat
Jul 15 08:51:33 vsanqa8 cib[21679]: warning: do_local_notify: A-Sync reply to crmd failed: No message of desired type
Jul 15 08:51:33 vsanqa8 pacemakerd[21673]: error: pcmk_child_exit: Child process crmd exited (pid=21684, rc=100)
Jul 15 08:51:33 vsanqa8 pacemakerd[21673]: warning: pcmk_child_exit: Pacemaker child process crmd no longer wishes to be respawned. Shutting ourselves down.
Jul 15 08:51:33 vsanqa8 pacemakerd[21673]: notice: pcmk_shutdown_worker: Shuting down Pacemaker
Jul 15 08:51:33 vsanqa8 pacemakerd[21673]: notice: stop_child: Stopping pengine: Sent -15 to process 21683
Jul 15 08:51:33 vsanqa8 pacemakerd[21673]: notice: stop_child: Stopping attrd: Sent -15 to process 21682
Jul 15 08:51:33 vsanqa8 attrd[21682]: notice: main: Exiting...
Jul 15 08:51:33 vsanqa8 pacemakerd[21673]: notice: stop_child: Stopping lrmd: Sent -15 to process 21681
Jul 15 08:51:33 vsanqa8 pacemakerd[21673]: notice: stop_child: Stopping stonith-ng: Sent -15 to process 21680
=========================corosync.log===============================================
vsanqa7
Jul 15 08:51:29 [12367] vsanqa7 cib: info: pcmk_cpg_membership: Member[3.1] cib.2
Jul 15 08:51:29 [12367] vsanqa7 cib: info: crm_update_peer_proc: pcmk_cpg_membership: Node vsanqa8[2] - corosync-cpg is now online
Jul 15 08:51:30 [12372] vsanqa7 crmd: info: pcmk_cpg_membership: Joined[3.0] crmd.2
Jul 15 08:51:30 [12372] vsanqa7 crmd: info: pcmk_cpg_membership: Member[3.0] crmd.1
Jul 15 08:51:30 [12372] vsanqa7 crmd: info: pcmk_cpg_membership: Member[3.1] crmd.2
Jul 15 08:51:30 [12372] vsanqa7 crmd: info: crm_update_peer_proc: pcmk_cpg_membership: Node vsanqa8[2] - corosync-cpg is now online
Jul 15 08:51:30 [12372] vsanqa7 crmd: info: peer_update_callback: Client vsanqa8/peer now has status [online] (DC=true)
Jul 15 08:51:30 [12372] vsanqa7 crmd: warning: match_down_event: No match for shutdown action on vsanqa8
Jul 15 08:51:30 [12372] vsanqa7 crmd: notice: do_state_transition: State transition S_IDLE -> S_INTEGRATION [ input=I_NODE_JOIN cause=C_FSA_INTERNAL origin=peer_update_callback ]
Jul 15 08:51:30 [12372] vsanqa7 crmd: info: abort_transition_graph: do_te_invoke:163 - Triggered transition abort (complete=1) : Peer Halt
Jul 15 08:51:30 [12372] vsanqa7 crmd: info: join_make_offer: Making join offers based on membership 4035520
Jul 15 08:51:30 [12372] vsanqa7 crmd: info: do_dc_join_offer_all: join-2: Waiting on 2 outstanding join acks
Jul 15 08:51:30 [12372] vsanqa7 crmd: info: update_dc: Set DC to vsanqa7 (3.0.7)
Jul 15 08:51:30 [12367] vsanqa7 cib: warning: cib_process_diff: Diff 0.2760.0 -> 0.2760.1 from vsanqa8 not applied to 0.2760.307: current "num_updates" is greater than required
Jul 15 08:51:30 [12367] vsanqa7 cib: info: cib_process_request: Operation complete: op cib_sync_one for section 'all' (origin=vsanqa8/vsanqa8/(null), version=0.2760.307): OK (rc=0)
Jul 15 08:51:31 [12372] vsanqa7 crmd: info: do_dc_join_offer_all: A new node joined the cluster
Jul 15 08:51:31 [12372] vsanqa7 crmd: info: do_dc_join_offer_all: join-3: Waiting on 2 outstanding join acks
Jul 15 08:51:31 [12372] vsanqa7 crmd: info: update_dc: Set DC to vsanqa7 (3.0.7)
Jul 15 08:51:32 [12372] vsanqa7 crmd: info: crm_update_peer_expected: do_dc_join_filter_offer: Node vsanqa8[2] - expected state is now member
Jul 15 08:51:32 [12372] vsanqa7 crmd: info: do_state_transition: State transition S_INTEGRATION -> S_FINALIZE_JOIN [ input=I_INTEGRATED cause=C_FSA_INTERNAL origin=check_join_state ]
Jul 15 08:51:32 [12372] vsanqa7 crmd: info: do_dc_join_finalize: join-3: Syncing the CIB from vsanqa7 to the rest of the cluster
Jul 15 08:51:32 [12367] vsanqa7 cib: info: cib_process_request: Operation complete: op cib_sync for section 'all' (origin=local/crmd/43, version=0.2760.307): OK (rc=0)
Jul 15 08:51:32 [12367] vsanqa7 cib: info: cib_process_request: Operation complete: op cib_modify for section nodes (origin=local/crmd/44, version=0.2760.308): OK (rc=0)
Jul 15 08:51:32 [12372] vsanqa7 crmd: info: do_dc_join_ack: join-3: Updating node state to member for vsanqa7
Jul 15 08:51:32 [12372] vsanqa7 crmd: info: do_dc_join_ack: join-3: Updating node state to member for vsanqa8
Jul 15 08:51:32 [12367] vsanqa7 cib: info: cib_process_request: Operation complete: op cib_modify for section nodes (origin=local/crmd/45, version=0.2760.309): OK (rc=0)
Jul 15 08:51:32 [12372] vsanqa7 crmd: info: do_state_transition: State transition S_FINALIZE_JOIN -> S_POLICY_ENGINE [ input=I_FINALIZED cause=C_FSA_INTERNAL origin=check_join_state ]
Jul 15 08:51:32 [12372] vsanqa7 crmd: info: abort_transition_graph: do_te_invoke:156 - Triggered transition abort (complete=1) : Peer Cancelled
Jul 15 08:51:32 [12370] vsanqa7 attrd: notice: attrd_local_callback: Sending full refresh (origin=crmd)
vsanqa8
Jul 15 08:51:30 [21679] vsanqa8 cib: info: cib_process_diff: Diff 0.2760.306 -> 0.2760.307 from vsanqa7 not applied to 0.2760.1: current "num_updates" is less than required
Jul 15 08:51:30 [21679] vsanqa8 cib: info: cib_server_process_diff: Requesting re-sync from peer
Jul 15 08:51:30 [21684] vsanqa8 crmd: info: do_started: Delaying start, Config not read (0000000000000040)
Jul 15 08:51:30 [21684] vsanqa8 crmd: info: do_started: Delaying start, Config not read (0000000000000040)
Jul 15 08:51:30 [21684] vsanqa8 crmd: info: qb_ipcs_us_publish: server name: crmd
Jul 15 08:51:30 [21684] vsanqa8 crmd: notice: do_started: The local CRM is operational
Jul 15 08:51:30 [21684] vsanqa8 crmd: info: do_state_transition: State transition S_STARTING -> S_PENDING [ input=I_PENDING cause=C_FSA_INTERNAL origin=do_started ]
Jul 15 08:51:30 [21679] vsanqa8 cib: info: cib_process_replace: Digest matched on replace from vsanqa7: 0936115228a36c943f181954830c9b2b
Jul 15 08:51:30 [21679] vsanqa8 cib: info: cib_process_replace: Replaced 0.2760.1 with 0.2760.307 from vsanqa7
Jul 15 08:51:30 [21679] vsanqa8 cib: info: cib_replace_notify: Replaced: 0.2760.1 -> 0.2760.307 from vsanqa7
Jul 15 08:51:31 [21684] vsanqa8 crmd: info: pcmk_cpg_membership: Joined[0.0] crmd.2
Jul 15 08:51:31 [21684] vsanqa8 crmd: info: pcmk_cpg_membership: Member[0.0] crmd.1
Jul 15 08:51:31 [21684] vsanqa8 crmd: info: crm_update_peer_proc: pcmk_cpg_membership: Node vsanqa7[1] - corosync-cpg is now online
Jul 15 08:51:31 [21684] vsanqa8 crmd: info: peer_update_callback: Client vsanqa7/peer now has status [online] (DC=<null>)
Jul 15 08:51:31 [21684] vsanqa8 crmd: info: pcmk_cpg_membership: Member[0.1] crmd.2
Jul 15 08:51:31 [21684] vsanqa8 crmd: info: update_dc: Set DC to vsanqa7 (3.0.7)
Jul 15 08:51:32 [21680] vsanqa8 stonith-ng: info: stonith_command: Processed register from crmd.21684: OK (0)
Jul 15 08:51:32 [21680] vsanqa8 stonith-ng: info: stonith_command: Processed st_notify from crmd.21684: OK (0)
Jul 15 08:51:32 [21680] vsanqa8 stonith-ng: info: stonith_command: Processed st_notify from crmd.21684: OK (0)
Jul 15 08:51:32 [21684] vsanqa8 crmd: info: update_attrd: Connecting to attrd... 5 retries remaining
Jul 15 08:51:32 [21684] vsanqa8 crmd: notice: do_state_transition: State transition S_PENDING -> S_NOT_DC [ input=I_NOT_DC cause=C_HA_MESSAGE origin=do_cl_join_finalize_respond ]
Jul 15 08:51:32 [21679] vsanqa8 cib: info: cib_process_replace: Digest matched on replace from vsanqa7: 0936115228a36c943f181954830c9b2b
Jul 15 08:51:32 [21679] vsanqa8 cib: info: cib_process_replace: Replaced 0.2760.307 with 0.2760.307 from vsanqa7
Jul 15 08:51:32 [21682] vsanqa8 attrd: notice: attrd_local_callback: Sending full refresh (origin=crmd)
Jul 15 08:51:32 [21681] vsanqa8 lrmd: info: process_lrmd_get_rsc_info: Resource 'vha-7413ed6d-2a3b-4ffc-9cd0-b80778d7a839' not found (0 active resources)
Jul 15 08:51:32 [21681] vsanqa8 lrmd: info: process_lrmd_get_rsc_info: Resource 'vha-7413ed6d-2a3b-4ffc-9cd0-b80778d7a839:0' not found (0 active resources)
Jul 15 08:51:32 [21681] vsanqa8 lrmd: info: process_lrmd_rsc_register: Added 'vha-7413ed6d-2a3b-4ffc-9cd0-b80778d7a839' to the rsc list (1 active resources)
Jul 15 08:51:32 [21684] vsanqa8 crmd: error: handle_request: We didn't ask to be shut down, yet our DC is telling us too.
Jul 15 08:51:32 [21684] vsanqa8 crmd: notice: do_state_transition: State transition S_NOT_DC -> S_STOPPING [ input=I_STOP cause=C_HA_MESSAGE origin=route_message ]
Jul 15 08:51:32 [21684] vsanqa8 crmd: info: do_shutdown: Disconnecting STONITH...
Jul 15 08:51:32 [21684] vsanqa8 crmd: info: tengine_stonith_connection_destroy: Fencing daemon disconnected
Jul 15 08:51:32 [21684] vsanqa8 crmd: info: verify_stopped: 1 pending LRM operations at shutdown... waiting
Jul 15 08:51:32 [21684] vsanqa8 crmd: info: ghash_print_pending: Pending action: vha-7413ed6d-2a3b-4ffc-9cd0-b80778d7a839:6 (vha-7413ed6d-2a3b-4ffc-9cd0-b80778d7a839_monitor_0)
Regards,
Kiran
_______________________________________________
Pacemaker mailing list: Pacemaker at oss.clusterlabs.org
http://oss.clusterlabs.org/mailman/listinfo/pacemaker
Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://bugs.clusterlabs.org

K Mehta

2013-07-17 08:11:30 UTC

Permalink

Andrew,
Thanks to your reply. Here is the detail

[root at vsanqa8 ~]# rpm -qa | grep pacemaker
pacemaker-cluster-libs-1.1.8-7.el6.x86_64
pacemaker-1.1.8-7.el6.x86_64
pacemaker-libs-1.1.8-7.el6.x86_64
pacemaker-cli-1.1.8-7.el6.x86_64

[root at vsanqa8 ~]# rpm -qa | grep coro
corosynclib-1.4.1-15.el6_4.1.x86_64
corosync-1.4.1-15.el6_4.1.x86_64

[root at vsanqa8 ~]# rpm -qa | grep cman
cman-3.0.12.1-49.el6.x86_64

[root at vsanqa8 ~]# uname -a
Linux vsanqa8 2.6.32-220.el6.x86_64 #1 SMP Tue Dec 6 19:48:22 GMT 2011
x86_64 x86_64 x86_64 GNU/Linux

[root at vsanqa8 ~]# cat /etc/redhat-release
CentOS release 6.2 (Final)
[root at vsanqa8 ~]#

so do I need to upgrade to pacemaker 1.1.9 for the fix ?

Regards,
Kiran

Post by K Mehta

Post by K Mehta
I have a two node test cluster running with CMAN plugin. Fencing is not

configured.
Thats problem 1

Post by K Mehta
I see that vsanqa7 sends a message to vsanqa8 to shutdown.
However, it is not clear why vsanqa7 takes this decision.

https://github.com/beekhof/pacemaker/commit/913a850
Is this on RHEL? or one of the rebuilds?

Post by K Mehta
===================/var/log/messages=================================
Node vsanqa7
Jul 15 08:51:18 vsanqa7 corosync[12081]: [TOTEM ] A processor joined

or left the membership and a new membership was formed.

Post by K Mehta
Jul 15 08:51:18 vsanqa7 corosync[12081]: [CMAN ] quorum regained,

resuming activity

Post by K Mehta
Jul 15 08:51:18 vsanqa7 corosync[12081]: [QUORUM] This node is within

the primary component and will provide service.

Post by K Mehta
Jul 15 08:51:18 vsanqa7 corosync[12081]: [QUORUM] Members[2]: 1 2
Jul 15 08:51:18 vsanqa7 corosync[12081]: [QUORUM] Members[2]: 1 2

Membership 4035520: quorum acquired
cman_event_callback: Node vsanqa8[2] - state is now member
sender r(0) ip(172.16.68.120) ; members(old:1 left:0)

Post by K Mehta
Jul 15 08:51:18 vsanqa7 corosync[12081]: [MAIN ] Completed service

synchronization, ready to provide service.

Post by K Mehta
Jul 15 08:51:30 vsanqa7 crmd[12372]: warning: match_down_event: No

match for shutdown action on vsanqa8
State transition S_IDLE -> S_INTEGRATION [ input=I_NODE_JOIN
cause=C_FSA_INTERNAL origin=peer_update_callback ]

Post by K Mehta
Jul 15 08:51:30 vsanqa7 cib[12367]: warning: cib_process_diff: Diff

0.2760.0 -> 0.2760.1 from vsanqa8 not applied to 0.2760.307: current
"num_updates" is greater than required

Post by K Mehta
Jul 15 08:51:31 vsanqa7 kernel: send_and_wait_for_client_info failed

with -110 uuid=0x74
Sending full refresh (origin=crmd)
master-vha-7413ed6d-2a3b-4ffc-9cd0-b80778d7a839 (8)
Sending flush op to all hosts for: probe_complete (true)

Post by K Mehta
Jul 15 08:51:32 vsanqa7 pengine[12371]: notice: unpack_config: On loss

of CCM Quorum: Ignore

Post by K Mehta

Why vsanqa8 is scheduled for shutdown ? <<<<<

Jul 15 08:51:32 vsanqa7 pengine[12371]: notice: stage6: Scheduling

Node vsanqa8 for shutdown
Calculated Transition 1: /var/lib/pacemaker/pengine/pe-input-3530.bz2

Post by K Mehta
Node vsanqa8
Jul 15 08:51:18 vsanqa8 corosync[21392]: [MAIN ] Corosync Cluster

Engine ('1.4.1'): started and ready to provide service.

Post by K Mehta
Jul 15 08:51:18 vsanqa8 corosync[21392]: [MAIN ] Corosync built-in

features: nss dbus rdma snmp

Post by K Mehta
Jul 15 08:51:18 vsanqa8 corosync[21392]: [MAIN ] Successfully read

config from /etc/cluster/cluster.conf

Post by K Mehta
Jul 15 08:51:18 vsanqa8 corosync[21392]: [MAIN ] Successfully parsed

cman config

Post by K Mehta
Jul 15 08:51:18 vsanqa8 corosync[21392]: [TOTEM ] Initializing

transport (UDP/IP Multicast).

Post by K Mehta
Jul 15 08:51:18 vsanqa8 corosync[21392]: [TOTEM ] Initializing

transmit/receive security: libtomcrypt SOBER128/SHA1HMAC (mode 0).

Post by K Mehta
Jul 15 08:51:18 vsanqa8 corosync[21392]: [TOTEM ] The network

interface [172.16.68.126] is now up.

Post by K Mehta
Jul 15 08:51:18 vsanqa8 corosync[21392]: [QUORUM] Using quorum

provider quorum_cman