Discussion:
[Pacemaker] Node sends shutdown request to other node: error
K Mehta
2013-07-16 13:03:42 UTC
I have a two-node test cluster running with the CMAN plugin. Fencing is not
configured. I see that vsanqa7 sends a message to vsanqa8 to shut down, but
it is not clear why vsanqa7 makes this decision.




===================/var/log/messages=================================

Node vsanqa7

Jul 15 08:51:18 vsanqa7 corosync[12081]: [TOTEM ] A processor joined or
left the membership and a new membership was formed.
Jul 15 08:51:18 vsanqa7 corosync[12081]: [CMAN ] quorum regained,
resuming activity
Jul 15 08:51:18 vsanqa7 corosync[12081]: [QUORUM] This node is within the
primary component and will provide service.
Jul 15 08:51:18 vsanqa7 corosync[12081]: [QUORUM] Members[2]: 1 2
Jul 15 08:51:18 vsanqa7 corosync[12081]: [QUORUM] Members[2]: 1 2
Jul 15 08:51:18 vsanqa7 crmd[12372]: notice: cman_event_callback:
Membership 4035520: quorum acquired
Jul 15 08:51:18 vsanqa7 crmd[12372]: notice: crm_update_peer_state:
cman_event_callback: Node vsanqa8[2] - state is now member
Jul 15 08:51:18 vsanqa7 corosync[12081]: [CPG ] chosen downlist: sender
r(0) ip(172.16.68.120) ; members(old:1 left:0)
Jul 15 08:51:18 vsanqa7 corosync[12081]: [MAIN ] Completed service
synchronization, ready to provide service.
Jul 15 08:51:30 vsanqa7 crmd[12372]: warning: match_down_event: No match
for shutdown action on vsanqa8
Jul 15 08:51:30 vsanqa7 crmd[12372]: notice: do_state_transition: State
transition S_IDLE -> S_INTEGRATION [ input=I_NODE_JOIN cause=C_FSA_INTERNAL
origin=peer_update_callback ]
Jul 15 08:51:30 vsanqa7 cib[12367]: warning: cib_process_diff: Diff
0.2760.0 -> 0.2760.1 from vsanqa8 not applied to 0.2760.307: current
"num_updates" is greater than required
Jul 15 08:51:31 vsanqa7 kernel: send_and_wait_for_client_info failed with
-110 uuid=0x74
Jul 15 08:51:32 vsanqa7 attrd[12370]: notice: attrd_local_callback:
Sending full refresh (origin=crmd)
Jul 15 08:51:32 vsanqa7 attrd[12370]: notice: attrd_trigger_update:
Sending flush op to all hosts for:
master-vha-7413ed6d-2a3b-4ffc-9cd0-b80778d7a839 (8)
Jul 15 08:51:32 vsanqa7 attrd[12370]: notice: attrd_trigger_update:
Sending flush op to all hosts for: probe_complete (true)
Jul 15 08:51:32 vsanqa7 pengine[12371]: notice: unpack_config: On loss of
CCM Quorum: Ignore
Why is vsanqa8 scheduled for shutdown? <<<<<
Jul 15 08:51:32 vsanqa7 pengine[12371]: notice: stage6: Scheduling Node
vsanqa8 for shutdown
Jul 15 08:51:32 vsanqa7 pengine[12371]: notice: process_pe_message:
Calculated Transition 1: /var/lib/pacemaker/pengine/pe-input-3530.bz2
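
Side note: the transition file named in the last line can be replayed offline
to see how the policy engine arrived at this decision. A minimal sketch,
assuming the pe-input file is still present on the DC and the pacemaker CLI
tools are installed:

    # replay the saved transition and show the resulting cluster state
    crm_simulate --simulate --xml-file /var/lib/pacemaker/pengine/pe-input-3530.bz2
    # additionally print the placement scores behind the decision
    crm_simulate --show-scores --xml-file /var/lib/pacemaker/pengine/pe-input-3530.bz2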




Node vsanqa8

Jul 15 08:51:18 vsanqa8 corosync[21392]: [MAIN ] Corosync Cluster Engine
('1.4.1'): started and ready to provide service.
Jul 15 08:51:18 vsanqa8 corosync[21392]: [MAIN ] Corosync built-in
features: nss dbus rdma snmp
Jul 15 08:51:18 vsanqa8 corosync[21392]: [MAIN ] Successfully read
config from /etc/cluster/cluster.conf
Jul 15 08:51:18 vsanqa8 corosync[21392]: [MAIN ] Successfully parsed
cman config
Jul 15 08:51:18 vsanqa8 corosync[21392]: [TOTEM ] Initializing transport
(UDP/IP Multicast).
Jul 15 08:51:18 vsanqa8 corosync[21392]: [TOTEM ] Initializing
transmit/receive security: libtomcrypt SOBER128/SHA1HMAC (mode 0).
Jul 15 08:51:18 vsanqa8 corosync[21392]: [TOTEM ] The network interface
[172.16.68.126] is now up.
Jul 15 08:51:18 vsanqa8 corosync[21392]: [QUORUM] Using quorum provider
quorum_cman
Jul 15 08:51:18 vsanqa8 corosync[21392]: [SERV ] Service engine loaded:
corosync cluster quorum service v0.1
Jul 15 08:51:18 vsanqa8 corosync[21392]: [CMAN ] CMAN 3.0.12.1 (built
Feb 22 2013 07:20:27) started
Jul 15 08:51:18 vsanqa8 corosync[21392]: [SERV ] Service engine loaded:
corosync CMAN membership service 2.90
Jul 15 08:51:18 vsanqa8 corosync[21392]: [SERV ] Service engine loaded:
openais checkpoint service B.01.01
Jul 15 08:51:18 vsanqa8 corosync[21392]: [SERV ] Service engine loaded:
corosync extended virtual synchrony service
Jul 15 08:51:18 vsanqa8 corosync[21392]: [SERV ] Service engine loaded:
corosync configuration service
Jul 15 08:51:18 vsanqa8 corosync[21392]: [SERV ] Service engine loaded:
corosync cluster closed process group service v1.01
Jul 15 08:51:18 vsanqa8 corosync[21392]: [SERV ] Service engine loaded:
corosync cluster config database access v1.01
Jul 15 08:51:18 vsanqa8 corosync[21392]: [SERV ] Service engine loaded:
corosync profile loading service
Jul 15 08:51:18 vsanqa8 corosync[21392]: [QUORUM] Using quorum provider
quorum_cman
Jul 15 08:51:18 vsanqa8 corosync[21392]: [SERV ] Service engine loaded:
corosync cluster quorum service v0.1
Jul 15 08:51:18 vsanqa8 corosync[21392]: [MAIN ] Compatibility mode set
to whitetank. Using V1 and V2 of the synchronization engine.
Jul 15 08:51:18 vsanqa8 corosync[21392]: [TOTEM ] A processor joined or
left the membership and a new membership was formed.
Jul 15 08:51:18 vsanqa8 corosync[21392]: [QUORUM] Members[1]: 2
Jul 15 08:51:18 vsanqa8 corosync[21392]: [QUORUM] Members[1]: 2
Jul 15 08:51:18 vsanqa8 corosync[21392]: [CPG ] chosen downlist: sender
r(0) ip(172.16.68.126) ; members(old:0 left:0)
Jul 15 08:51:18 vsanqa8 corosync[21392]: [MAIN ] Completed service
synchronization, ready to provide service.
Jul 15 08:51:18 vsanqa8 corosync[21392]: [TOTEM ] A processor joined or
left the membership and a new membership was formed.
Jul 15 08:51:18 vsanqa8 corosync[21392]: [CMAN ] quorum regained,
resuming activity
Jul 15 08:51:18 vsanqa8 corosync[21392]: [QUORUM] This node is within the
primary component and will provide service.
Jul 15 08:51:18 vsanqa8 corosync[21392]: [QUORUM] Members[2]: 1 2
Jul 15 08:51:18 vsanqa8 corosync[21392]: [QUORUM] Members[2]: 1 2
Jul 15 08:51:18 vsanqa8 corosync[21392]: [CPG ] chosen downlist: sender
r(0) ip(172.16.68.120) ; members(old:1 left:0)
Jul 15 08:51:18 vsanqa8 corosync[21392]: [MAIN ] Completed service
synchronization, ready to provide service.
Jul 15 08:51:22 vsanqa8 fenced[21447]: fenced 3.0.12.1 started
Jul 15 08:51:22 vsanqa8 dlm_controld[21467]: dlm_controld 3.0.12.1 started
Jul 15 08:51:23 vsanqa8 gfs_controld[21522]: gfs_controld 3.0.12.1 started
Jul 15 08:51:29 vsanqa8 pacemakerd[21673]: notice: crm_add_logfile:
Additional logging available in /var/log/cluster/corosync.log
Jul 15 08:51:29 vsanqa8 pacemakerd[21673]: notice: main: Starting
Pacemaker 1.1.8-7.el6 (Build: 394e906): generated-manpages agent-manpages
ascii-docs publican-docs ncurses libqb-logging libqb-ipc corosync-plugin
cman
Jul 15 08:51:29 vsanqa8 pacemakerd[21673]: notice: update_node_processes:
0x13c1f80 Node 2 now known as vsanqa8, was:
Jul 15 08:51:29 vsanqa8 pacemakerd[21673]: notice: update_node_processes:
0x13be960 Node 1 now known as vsanqa7, was:
Jul 15 08:51:29 vsanqa8 lrmd[21681]: notice: crm_add_logfile: Additional
logging available in /var/log/cluster/corosync.log
Jul 15 08:51:29 vsanqa8 stonith-ng[21680]: notice: crm_add_logfile:
Additional logging available in /var/log/cluster/corosync.log
Jul 15 08:51:29 vsanqa8 cib[21679]: notice: crm_add_logfile: Additional
logging available in /var/log/cluster/corosync.log
Jul 15 08:51:29 vsanqa8 attrd[21682]: notice: crm_add_logfile: Additional
logging available in /var/log/cluster/corosync.log
Jul 15 08:51:29 vsanqa8 stonith-ng[21680]: notice: crm_cluster_connect:
Connecting to cluster infrastructure: cman
Jul 15 08:51:29 vsanqa8 cib[21679]: notice: main: Using legacy config
location: /var/lib/heartbeat/crm
Jul 15 08:51:29 vsanqa8 attrd[21682]: notice: crm_cluster_connect:
Connecting to cluster infrastructure: cman
Jul 15 08:51:29 vsanqa8 crmd[21684]: notice: crm_add_logfile: Additional
logging available in /var/log/cluster/corosync.log
Jul 15 08:51:29 vsanqa8 crmd[21684]: notice: main: CRM Git Version:
394e906
Jul 15 08:51:29 vsanqa8 pengine[21683]: notice: crm_add_logfile:
Additional logging available in /var/log/cluster/corosync.log
Jul 15 08:51:29 vsanqa8 attrd[21682]: notice: main: Starting mainloop...
Jul 15 08:51:29 vsanqa8 cib[21679]: notice: crm_cluster_connect:
Connecting to cluster infrastructure: cman
Jul 15 08:51:30 vsanqa8 crmd[21684]: notice: crm_cluster_connect:
Connecting to cluster infrastructure: cman
Jul 15 08:51:30 vsanqa8 stonith-ng[21680]: notice: setup_cib: Watching
for stonith topology changes
Jul 15 08:51:30 vsanqa8 crmd[21684]: notice: cman_event_callback:
Membership 4035520: quorum acquired
Jul 15 08:51:30 vsanqa8 crmd[21684]: notice: crm_update_peer_state:
cman_event_callback: Node vsanqa7[1] - state is now member
Jul 15 08:51:30 vsanqa8 crmd[21684]: notice: crm_update_peer_state:
cman_event_callback: Node vsanqa8[2] - state is now member
Jul 15 08:51:30 vsanqa8 crmd[21684]: notice: do_started: The local CRM is
operational
Jul 15 08:51:32 vsanqa8 crmd[21684]: notice: do_state_transition: State
transition S_PENDING -> S_NOT_DC [ input=I_NOT_DC cause=C_HA_MESSAGE
origin=do_cl_join_finalize_respond ]
Jul 15 08:51:32 vsanqa8 attrd[21682]: notice: attrd_local_callback:
Sending full refresh (origin=crmd)
vsanqa8 is not expecting a shutdown request from vsanqa7 <<<<<<<
Jul 15 08:51:32 vsanqa8 crmd[21684]: error: handle_request: We didn't
ask to be shut down, yet our DC is telling us too.
Jul 15 08:51:32 vsanqa8 crmd[21684]: notice: do_state_transition: State
transition S_NOT_DC -> S_STOPPING [ input=I_STOP cause=C_HA_MESSAGE
origin=route_message ]
Jul 15 08:51:33 vsanqa8 crmd[21684]: notice: process_lrm_event: LRM
operation vha-7413ed6d-2a3b-4ffc-9cd0-b80778d7a839_monitor_0 (call=6, rc=7,
cib-update=9, confirmed=true) not running
Jul 15 08:51:33 vsanqa8 crmd[21684]: notice: terminate_cs_connection:
Disconnecting from Corosync
Jul 15 08:51:33 vsanqa8 cib[21679]: warning: qb_ipcs_event_sendv:
new_event_notification (21679-21684-12): Broken pipe (32)
Jul 15 08:51:33 vsanqa8 crmd[21684]: warning: do_exit: Inhibiting respawn
by Heartbeat
Jul 15 08:51:33 vsanqa8 cib[21679]: warning: do_local_notify: A-Sync reply
to crmd failed: No message of desired type
Jul 15 08:51:33 vsanqa8 pacemakerd[21673]: error: pcmk_child_exit: Child
process crmd exited (pid=21684, rc=100)
Jul 15 08:51:33 vsanqa8 pacemakerd[21673]: warning: pcmk_child_exit:
Pacemaker child process crmd no longer wishes to be respawned. Shutting
ourselves down.
Jul 15 08:51:33 vsanqa8 pacemakerd[21673]: notice: pcmk_shutdown_worker:
Shuting down Pacemaker
Jul 15 08:51:33 vsanqa8 pacemakerd[21673]: notice: stop_child: Stopping
pengine: Sent -15 to process 21683
Jul 15 08:51:33 vsanqa8 pacemakerd[21673]: notice: stop_child: Stopping
attrd: Sent -15 to process 21682
Jul 15 08:51:33 vsanqa8 attrd[21682]: notice: main: Exiting...
Jul 15 08:51:33 vsanqa8 pacemakerd[21673]: notice: stop_child: Stopping
lrmd: Sent -15 to process 21681
Jul 15 08:51:33 vsanqa8 pacemakerd[21673]: notice: stop_child: Stopping
stonith-ng: Sent -15 to process 21680







=========================corosync.log===============================================


vsanqa7

Jul 15 08:51:29 [12367] vsanqa7 cib: info: pcmk_cpg_membership:
Member[3.1] cib.2
Jul 15 08:51:29 [12367] vsanqa7 cib: info: crm_update_peer_proc:
pcmk_cpg_membership: Node vsanqa8[2] - corosync-cpg is now online
Jul 15 08:51:30 [12372] vsanqa7 crmd: info: pcmk_cpg_membership:
Joined[3.0] crmd.2
Jul 15 08:51:30 [12372] vsanqa7 crmd: info: pcmk_cpg_membership:
Member[3.0] crmd.1
Jul 15 08:51:30 [12372] vsanqa7 crmd: info: pcmk_cpg_membership:
Member[3.1] crmd.2
Jul 15 08:51:30 [12372] vsanqa7 crmd: info: crm_update_peer_proc:
pcmk_cpg_membership: Node vsanqa8[2] - corosync-cpg is now online
Jul 15 08:51:30 [12372] vsanqa7 crmd: info: peer_update_callback:
Client vsanqa8/peer now has status [online] (DC=true)
Jul 15 08:51:30 [12372] vsanqa7 crmd: warning: match_down_event:
No match for shutdown action on vsanqa8
Jul 15 08:51:30 [12372] vsanqa7 crmd: notice: do_state_transition:
State transition S_IDLE -> S_INTEGRATION [ input=I_NODE_JOIN
cause=C_FSA_INTERNAL origin=peer_update_callback ]
Jul 15 08:51:30 [12372] vsanqa7 crmd: info:
abort_transition_graph: do_te_invoke:163 - Triggered transition abort
(complete=1) : Peer Halt
Jul 15 08:51:30 [12372] vsanqa7 crmd: info: join_make_offer:
Making join offers based on membership 4035520
Jul 15 08:51:30 [12372] vsanqa7 crmd: info: do_dc_join_offer_all:
join-2: Waiting on 2 outstanding join acks
Jul 15 08:51:30 [12372] vsanqa7 crmd: info: update_dc: Set
DC to vsanqa7 (3.0.7)
Jul 15 08:51:30 [12367] vsanqa7 cib: warning: cib_process_diff:
Diff 0.2760.0 -> 0.2760.1 from vsanqa8 not applied to 0.2760.307:
current "num_updates" is greater than required
Jul 15 08:51:30 [12367] vsanqa7 cib: info: cib_process_request:
Operation complete: op cib_sync_one for section 'all'
(origin=vsanqa8/vsanqa8/(null), version=0.2760.307): OK (rc=0)
Jul 15 08:51:31 [12372] vsanqa7 crmd: info: do_dc_join_offer_all:
A new node joined the cluster
Jul 15 08:51:31 [12372] vsanqa7 crmd: info: do_dc_join_offer_all:
join-3: Waiting on 2 outstanding join acks
Jul 15 08:51:31 [12372] vsanqa7 crmd: info: update_dc: Set
DC to vsanqa7 (3.0.7)
Jul 15 08:51:32 [12372] vsanqa7 crmd: info:
crm_update_peer_expected: do_dc_join_filter_offer: Node vsanqa8[2]
- expected state is now member
Jul 15 08:51:32 [12372] vsanqa7 crmd: info: do_state_transition:
State transition S_INTEGRATION -> S_FINALIZE_JOIN [ input=I_INTEGRATED
cause=C_FSA_INTERNAL origin=check_join_state ]
Jul 15 08:51:32 [12372] vsanqa7 crmd: info: do_dc_join_finalize:
join-3: Syncing the CIB from vsanqa7 to the rest of the cluster
Jul 15 08:51:32 [12367] vsanqa7 cib: info: cib_process_request:
Operation complete: op cib_sync for section 'all'
(origin=local/crmd/43, version=0.2760.307): OK (rc=0)
Jul 15 08:51:32 [12367] vsanqa7 cib: info: cib_process_request:
Operation complete: op cib_modify for section nodes
(origin=local/crmd/44, version=0.2760.308): OK (rc=0)
Jul 15 08:51:32 [12372] vsanqa7 crmd: info: do_dc_join_ack:
join-3: Updating node state to member for vsanqa7
Jul 15 08:51:32 [12372] vsanqa7 crmd: info: erase_status_tag:
Deleting xpath: //node_state[@uname='vsanqa7']/lrm
Jul 15 08:51:32 [12372] vsanqa7 crmd: info: do_dc_join_ack:
join-3: Updating node state to member for vsanqa8
Jul 15 08:51:32 [12372] vsanqa7 crmd: info: erase_status_tag:
Deleting xpath: //node_state[@uname='vsanqa8']/lrm
Jul 15 08:51:32 [12367] vsanqa7 cib: info: cib_process_request:
Operation complete: op cib_modify for section nodes
(origin=local/crmd/45, version=0.2760.309): OK (rc=0)
Jul 15 08:51:32 [12367] vsanqa7 cib: info: cib_process_request:
Operation complete: op cib_delete for section
//node_state[@uname='vsanqa7']/lrm (origin=local/crmd/46,
version=0.2760.310): OK (rc=0)
Jul 15 08:51:32 [12367] vsanqa7 cib: info: cib_process_request:
Operation complete: op cib_delete for section
//node_state[@uname='vsanqa8']/lrm (origin=local/crmd/48,
version=0.2760.312): OK (rc=0)
Jul 15 08:51:32 [12372] vsanqa7 crmd: info: do_state_transition:
State transition S_FINALIZE_JOIN -> S_POLICY_ENGINE [
input=I_FINALIZED cause=C_FSA_INTERNAL origin=check_join_state ]
Jul 15 08:51:32 [12372] vsanqa7 crmd: info:
abort_transition_graph: do_te_invoke:156 - Triggered transition abort
(complete=1) : Peer Cancelled
Jul 15 08:51:32 [12370] vsanqa7 attrd: notice: attrd_local_callback:
Sending full refresh (origin=crmd)







vsanqa8

Jul 15 08:51:30 [21679] vsanqa8 cib: info: cib_process_diff:
Diff 0.2760.306 -> 0.2760.307 from vsanqa7 not applied to 0.2760.1:
current "num_updates" is less than required
Jul 15 08:51:30 [21679] vsanqa8 cib: info:
cib_server_process_diff: Requesting re-sync from peer
Jul 15 08:51:30 [21684] vsanqa8 crmd: info: do_started:
Delaying start, Config not read (0000000000000040)
Jul 15 08:51:30 [21684] vsanqa8 crmd: info: do_started:
Delaying start, Config not read (0000000000000040)
Jul 15 08:51:30 [21684] vsanqa8 crmd: info: qb_ipcs_us_publish:
server name: crmd
Jul 15 08:51:30 [21684] vsanqa8 crmd: notice: do_started: The
local CRM is operational
Jul 15 08:51:30 [21684] vsanqa8 crmd: info: do_state_transition:
State transition S_STARTING -> S_PENDING [ input=I_PENDING
cause=C_FSA_INTERNAL origin=do_started ]
Jul 15 08:51:30 [21679] vsanqa8 cib: info: cib_process_replace:
Digest matched on replace from vsanqa7:
0936115228a36c943f181954830c9b2b
Jul 15 08:51:30 [21679] vsanqa8 cib: info: cib_process_replace:
Replaced 0.2760.1 with 0.2760.307 from vsanqa7
Jul 15 08:51:30 [21679] vsanqa8 cib: info: cib_replace_notify:
Replaced: 0.2760.1 -> 0.2760.307 from vsanqa7
Jul 15 08:51:31 [21684] vsanqa8 crmd: info: pcmk_cpg_membership:
Joined[0.0] crmd.2
Jul 15 08:51:31 [21684] vsanqa8 crmd: info: pcmk_cpg_membership:
Member[0.0] crmd.1
Jul 15 08:51:31 [21684] vsanqa8 crmd: info: crm_update_peer_proc:
pcmk_cpg_membership: Node vsanqa7[1] - corosync-cpg is now online
Jul 15 08:51:31 [21684] vsanqa8 crmd: info: peer_update_callback:
Client vsanqa7/peer now has status [online] (DC=<null>)
Jul 15 08:51:31 [21684] vsanqa8 crmd: info: pcmk_cpg_membership:
Member[0.1] crmd.2
Jul 15 08:51:31 [21684] vsanqa8 crmd: info: update_dc: Set
DC to vsanqa7 (3.0.7)
Jul 15 08:51:32 [21680] vsanqa8 stonith-ng: info: stonith_command:
Processed register from crmd.21684: OK (0)
Jul 15 08:51:32 [21680] vsanqa8 stonith-ng: info: stonith_command:
Processed st_notify from crmd.21684: OK (0)
Jul 15 08:51:32 [21680] vsanqa8 stonith-ng: info: stonith_command:
Processed st_notify from crmd.21684: OK (0)
Jul 15 08:51:32 [21684] vsanqa8 crmd: info: erase_status_tag:
Deleting xpath: //node_state[@uname='vsanqa8']/transient_attributes
Jul 15 08:51:32 [21684] vsanqa8 crmd: info: update_attrd:
Connecting to attrd... 5 retries remaining
Jul 15 08:51:32 [21684] vsanqa8 crmd: notice: do_state_transition:
State transition S_PENDING -> S_NOT_DC [ input=I_NOT_DC
cause=C_HA_MESSAGE origin=do_cl_join_finalize_respond ]
Jul 15 08:51:32 [21679] vsanqa8 cib: info: cib_process_replace:
Digest matched on replace from vsanqa7:
0936115228a36c943f181954830c9b2b
Jul 15 08:51:32 [21679] vsanqa8 cib: info: cib_process_replace:
Replaced 0.2760.307 with 0.2760.307 from vsanqa7
Jul 15 08:51:32 [21682] vsanqa8 attrd: notice: attrd_local_callback:
Sending full refresh (origin=crmd)
Jul 15 08:51:32 [21681] vsanqa8 lrmd: info:
process_lrmd_get_rsc_info: Resource
'vha-7413ed6d-2a3b-4ffc-9cd0-b80778d7a839' not found (0 active resources)
Jul 15 08:51:32 [21681] vsanqa8 lrmd: info:
process_lrmd_get_rsc_info: Resource
'vha-7413ed6d-2a3b-4ffc-9cd0-b80778d7a839:0' not found (0 active resources)
Jul 15 08:51:32 [21681] vsanqa8 lrmd: info:
process_lrmd_rsc_register: Added
'vha-7413ed6d-2a3b-4ffc-9cd0-b80778d7a839' to the rsc list (1 active
resources)
Jul 15 08:51:32 [21684] vsanqa8 crmd: error: handle_request: We
didn't ask to be shut down, yet our DC is telling us too.
Jul 15 08:51:32 [21684] vsanqa8 crmd: notice: do_state_transition:
State transition S_NOT_DC -> S_STOPPING [ input=I_STOP
cause=C_HA_MESSAGE origin=route_message ]
Jul 15 08:51:32 [21684] vsanqa8 crmd: info: do_shutdown:
Disconnecting STONITH...
Jul 15 08:51:32 [21684] vsanqa8 crmd: info:
tengine_stonith_connection_destroy: Fencing daemon disconnected
Jul 15 08:51:32 [21684] vsanqa8 crmd: info: verify_stopped: 1
pending LRM operations at shutdown... waiting
Jul 15 08:51:32 [21684] vsanqa8 crmd: info: ghash_print_pending:
Pending action: vha-7413ed6d-2a3b-4ffc-9cd0-b80778d7a839:6
(vha-7413ed6d-2a3b-4ffc-9cd0-b80778d7a839_monitor_0)


Regards,
Kiran
Andrew Beekhof
2013-07-16 23:01:26 UTC
I have a two-node test cluster running with the CMAN plugin. Fencing is not configured.
That's problem 1.
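
For illustration only: with CMAN plus Pacemaker, fencing is normally defined
as a stonith resource inside Pacemaker. A minimal sketch using fence_ipmilan
with placeholder addresses and credentials, assuming the pcs CLI is installed
(adjust the agent and parameters to whatever power management your nodes
actually have):

    # placeholder BMC addresses and credentials; pick the fence agent that
    # matches your hardware
    pcs stonith create fence-vsanqa7 fence_ipmilan \
        pcmk_host_list="vsanqa7" ipaddr="192.0.2.7" login="admin" passwd="secret"
    pcs stonith create fence-vsanqa8 fence_ipmilan \
        pcmk_host_list="vsanqa8" ipaddr="192.0.2.8" login="admin" passwd="secret"
    pcs property set stonith-enabled=true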
I see that vsanqa7 sends a message to vsanqa8 to shut down.
However, it is not clear why vsanqa7 makes this decision.
It looks like you're hitting the problem addressed by:

https://github.com/beekhof/pacemaker/commit/913a850

Is this on RHEL, or one of the rebuilds?
K Mehta
2013-07-17 08:11:30 UTC
Andrew,
Thanks for your reply. Here are the details:

[root@vsanqa8 ~]# rpm -qa | grep pacemaker
pacemaker-cluster-libs-1.1.8-7.el6.x86_64
pacemaker-1.1.8-7.el6.x86_64
pacemaker-libs-1.1.8-7.el6.x86_64
pacemaker-cli-1.1.8-7.el6.x86_64

[root@vsanqa8 ~]# rpm -qa | grep coro
corosynclib-1.4.1-15.el6_4.1.x86_64
corosync-1.4.1-15.el6_4.1.x86_64

[root@vsanqa8 ~]# rpm -qa | grep cman
cman-3.0.12.1-49.el6.x86_64

[root@vsanqa8 ~]# uname -a
Linux vsanqa8 2.6.32-220.el6.x86_64 #1 SMP Tue Dec 6 19:48:22 GMT 2011
x86_64 x86_64 x86_64 GNU/Linux

[root@vsanqa8 ~]# cat /etc/redhat-release
CentOS release 6.2 (Final)
[root@vsanqa8 ~]#

So do I need to upgrade to Pacemaker 1.1.9 for the fix?
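
As a sanity check, one way to see whether a particular fix was backported
into the installed build is to search the package changelog, assuming the
packager records individual fixes there:

    # list the packaged changelog and look for the fix referenced above
    rpm -q --changelog pacemaker | less
    # or grep for a keyword from the upstream commit message
    rpm -q --changelog pacemaker | grep -i shutdown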

Regards,
Kiran
Andrew Beekhof
2013-07-18 05:54:22 UTC
So do I need to upgrade to Pacemaker 1.1.9 for the fix?
Yes, sorry. There are packages for 1.1.9 on clusterlabs.org for EL6.
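
Roughly, the upgrade on each node would look like the sketch below; grab the
EL6 .repo file from clusterlabs.org first (the exact file name is left out
here on purpose), and treat the package list as an assumption based on what
you showed as installed:

    # stop the cluster stack on the node being upgraded
    service pacemaker stop
    service cman stop
    # with the ClusterLabs EL6 repository enabled, pull in the newer packages
    yum clean all
    yum update pacemaker pacemaker-libs pacemaker-cluster-libs pacemaker-cli
    # restart the stack
    service cman start
    service pacemaker start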
Jul 15 08:51:29 vsanqa8 pacemakerd[21673]: notice: main: Starting Pacemaker 1.1.8-7.el6 (Build: 394e906): generated-manpages agent-manpages ascii-docs publican-docs ncurses libqb-logging libqb-ipc corosync-plugin cman
Jul 15 08:51:29 vsanqa8 lrmd[21681]: notice: crm_add_logfile: Additional logging available in /var/log/cluster/corosync.log
Jul 15 08:51:29 vsanqa8 stonith-ng[21680]: notice: crm_add_logfile: Additional logging available in /var/log/cluster/corosync.log
Jul 15 08:51:29 vsanqa8 cib[21679]: notice: crm_add_logfile: Additional logging available in /var/log/cluster/corosync.log
Jul 15 08:51:29 vsanqa8 attrd[21682]: notice: crm_add_logfile: Additional logging available in /var/log/cluster/corosync.log
Jul 15 08:51:29 vsanqa8 stonith-ng[21680]: notice: crm_cluster_connect: Connecting to cluster infrastructure: cman
Jul 15 08:51:29 vsanqa8 cib[21679]: notice: main: Using legacy config location: /var/lib/heartbeat/crm
Jul 15 08:51:29 vsanqa8 attrd[21682]: notice: crm_cluster_connect: Connecting to cluster infrastructure: cman
Jul 15 08:51:29 vsanqa8 crmd[21684]: notice: crm_add_logfile: Additional logging available in /var/log/cluster/corosync.log
Jul 15 08:51:29 vsanqa8 crmd[21684]: notice: main: CRM Git Version: 394e906
Jul 15 08:51:29 vsanqa8 pengine[21683]: notice: crm_add_logfile: Additional logging available in /var/log/cluster/corosync.log
Jul 15 08:51:29 vsanqa8 attrd[21682]: notice: main: Starting mainloop...
Jul 15 08:51:29 vsanqa8 cib[21679]: notice: crm_cluster_connect: Connecting to cluster infrastructure: cman
Jul 15 08:51:30 vsanqa8 crmd[21684]: notice: crm_cluster_connect: Connecting to cluster infrastructure: cman
Jul 15 08:51:30 vsanqa8 stonith-ng[21680]: notice: setup_cib: Watching for stonith topology changes
Jul 15 08:51:30 vsanqa8 crmd[21684]: notice: cman_event_callback: Membership 4035520: quorum acquired
Jul 15 08:51:30 vsanqa8 crmd[21684]: notice: crm_update_peer_state: cman_event_callback: Node vsanqa7[1] - state is now member
Jul 15 08:51:30 vsanqa8 crmd[21684]: notice: crm_update_peer_state: cman_event_callback: Node vsanqa8[2] - state is now member
Jul 15 08:51:30 vsanqa8 crmd[21684]: notice: do_started: The local CRM is operational
Jul 15 08:51:32 vsanqa8 crmd[21684]: notice: do_state_transition: State transition S_PENDING -> S_NOT_DC [ input=I_NOT_DC cause=C_HA_MESSAGE origin=do_cl_join_finalize_respond ]
Jul 15 08:51:32 vsanqa8 attrd[21682]: notice: attrd_local_callback: Sending full refresh (origin=crmd)
vsanqa8 is not expecting a shutdown request from vsanqa7 <<<<<<<
Jul 15 08:51:32 vsanqa8 crmd[21684]: error: handle_request: We didn't ask to be shut down, yet our DC is telling us too.
Jul 15 08:51:32 vsanqa8 crmd[21684]: notice: do_state_transition: State transition S_NOT_DC -> S_STOPPING [ input=I_STOP cause=C_HA_MESSAGE origin=route_message ]
Jul 15 08:51:33 vsanqa8 crmd[21684]: notice: process_lrm_event: LRM operation vha-7413ed6d-2a3b-4ffc-9cd0-b80778d7a839_monitor_0 (call=6, rc=7, cib-update=9, confirmed=true) not running
Jul 15 08:51:33 vsanqa8 crmd[21684]: notice: terminate_cs_connection: Disconnecting from Corosync
Jul 15 08:51:33 vsanqa8 cib[21679]: warning: qb_ipcs_event_sendv: new_event_notification (21679-21684-12): Broken pipe (32)
Jul 15 08:51:33 vsanqa8 crmd[21684]: warning: do_exit: Inhibiting respawn by Heartbeat
Jul 15 08:51:33 vsanqa8 cib[21679]: warning: do_local_notify: A-Sync reply to crmd failed: No message of desired type
Jul 15 08:51:33 vsanqa8 pacemakerd[21673]: error: pcmk_child_exit: Child process crmd exited (pid=21684, rc=100)
Jul 15 08:51:33 vsanqa8 pacemakerd[21673]: warning: pcmk_child_exit: Pacemaker child process crmd no longer wishes to be respawned. Shutting ourselves down.
Jul 15 08:51:33 vsanqa8 pacemakerd[21673]: notice: pcmk_shutdown_worker: Shuting down Pacemaker
Jul 15 08:51:33 vsanqa8 pacemakerd[21673]: notice: stop_child: Stopping pengine: Sent -15 to process 21683
Jul 15 08:51:33 vsanqa8 pacemakerd[21673]: notice: stop_child: Stopping attrd: Sent -15 to process 21682
Jul 15 08:51:33 vsanqa8 attrd[21682]: notice: main: Exiting...
Jul 15 08:51:33 vsanqa8 pacemakerd[21673]: notice: stop_child: Stopping lrmd: Sent -15 to process 21681
Jul 15 08:51:33 vsanqa8 pacemakerd[21673]: notice: stop_child: Stopping stonith-ng: Sent -15 to process 21680
=========================corosync.log===============================================
vsanqa7
Jul 15 08:51:29 [12367] vsanqa7 cib: info: pcmk_cpg_membership: Member[3.1] cib.2
Jul 15 08:51:29 [12367] vsanqa7 cib: info: crm_update_peer_proc: pcmk_cpg_membership: Node vsanqa8[2] - corosync-cpg is now online
Jul 15 08:51:30 [12372] vsanqa7 crmd: info: pcmk_cpg_membership: Joined[3.0] crmd.2
Jul 15 08:51:30 [12372] vsanqa7 crmd: info: pcmk_cpg_membership: Member[3.0] crmd.1
Jul 15 08:51:30 [12372] vsanqa7 crmd: info: pcmk_cpg_membership: Member[3.1] crmd.2
Jul 15 08:51:30 [12372] vsanqa7 crmd: info: crm_update_peer_proc: pcmk_cpg_membership: Node vsanqa8[2] - corosync-cpg is now online
Jul 15 08:51:30 [12372] vsanqa7 crmd: info: peer_update_callback: Client vsanqa8/peer now has status [online] (DC=true)
Jul 15 08:51:30 [12372] vsanqa7 crmd: warning: match_down_event: No match for shutdown action on vsanqa8
Jul 15 08:51:30 [12372] vsanqa7 crmd: notice: do_state_transition: State transition S_IDLE -> S_INTEGRATION [ input=I_NODE_JOIN cause=C_FSA_INTERNAL origin=peer_update_callback ]
Jul 15 08:51:30 [12372] vsanqa7 crmd: info: abort_transition_graph: do_te_invoke:163 - Triggered transition abort (complete=1) : Peer Halt
Jul 15 08:51:30 [12372] vsanqa7 crmd: info: join_make_offer: Making join offers based on membership 4035520
Jul 15 08:51:30 [12372] vsanqa7 crmd: info: do_dc_join_offer_all: join-2: Waiting on 2 outstanding join acks
Jul 15 08:51:30 [12372] vsanqa7 crmd: info: update_dc: Set DC to vsanqa7 (3.0.7)
Jul 15 08:51:30 [12367] vsanqa7 cib: warning: cib_process_diff: Diff 0.2760.0 -> 0.2760.1 from vsanqa8 not applied to 0.2760.307: current "num_updates" is greater than required
Jul 15 08:51:30 [12367] vsanqa7 cib: info: cib_process_request: Operation complete: op cib_sync_one for section 'all' (origin=vsanqa8/vsanqa8/(null), version=0.2760.307): OK (rc=0)
Jul 15 08:51:31 [12372] vsanqa7 crmd: info: do_dc_join_offer_all: A new node joined the cluster
Jul 15 08:51:31 [12372] vsanqa7 crmd: info: do_dc_join_offer_all: join-3: Waiting on 2 outstanding join acks
Jul 15 08:51:31 [12372] vsanqa7 crmd: info: update_dc: Set DC to vsanqa7 (3.0.7)
Jul 15 08:51:32 [12372] vsanqa7 crmd: info: crm_update_peer_expected: do_dc_join_filter_offer: Node vsanqa8[2] - expected state is now member
Jul 15 08:51:32 [12372] vsanqa7 crmd: info: do_state_transition: State transition S_INTEGRATION -> S_FINALIZE_JOIN [ input=I_INTEGRATED cause=C_FSA_INTERNAL origin=check_join_state ]
Jul 15 08:51:32 [12372] vsanqa7 crmd: info: do_dc_join_finalize: join-3: Syncing the CIB from vsanqa7 to the rest of the cluster
Jul 15 08:51:32 [12367] vsanqa7 cib: info: cib_process_request: Operation complete: op cib_sync for section 'all' (origin=local/crmd/43, version=0.2760.307): OK (rc=0)
Jul 15 08:51:32 [12367] vsanqa7 cib: info: cib_process_request: Operation complete: op cib_modify for section nodes (origin=local/crmd/44, version=0.2760.308): OK (rc=0)
Jul 15 08:51:32 [12372] vsanqa7 crmd: info: do_dc_join_ack: join-3: Updating node state to member for vsanqa7
Jul 15 08:51:32 [12372] vsanqa7 crmd: info: do_dc_join_ack: join-3: Updating node state to member for vsanqa8
Jul 15 08:51:32 [12367] vsanqa7 cib: info: cib_process_request: Operation complete: op cib_modify for section nodes (origin=local/crmd/45, version=0.2760.309): OK (rc=0)
Jul 15 08:51:32 [12372] vsanqa7 crmd: info: do_state_transition: State transition S_FINALIZE_JOIN -> S_POLICY_ENGINE [ input=I_FINALIZED cause=C_FSA_INTERNAL origin=check_join_state ]
Jul 15 08:51:32 [12372] vsanqa7 crmd: info: abort_transition_graph: do_te_invoke:156 - Triggered transition abort (complete=1) : Peer Cancelled
Jul 15 08:51:32 [12370] vsanqa7 attrd: notice: attrd_local_callback: Sending full refresh (origin=crmd)
vsanqa8
Jul 15 08:51:30 [21679] vsanqa8 cib: info: cib_process_diff: Diff 0.2760.306 -> 0.2760.307 from vsanqa7 not applied to 0.2760.1: current "num_updates" is less than required
Jul 15 08:51:30 [21679] vsanqa8 cib: info: cib_server_process_diff: Requesting re-sync from peer
Jul 15 08:51:30 [21684] vsanqa8 crmd: info: do_started: Delaying start, Config not read (0000000000000040)
Jul 15 08:51:30 [21684] vsanqa8 crmd: info: do_started: Delaying start, Config not read (0000000000000040)
Jul 15 08:51:30 [21684] vsanqa8 crmd: info: qb_ipcs_us_publish: server name: crmd
Jul 15 08:51:30 [21684] vsanqa8 crmd: notice: do_started: The local CRM is operational
Jul 15 08:51:30 [21684] vsanqa8 crmd: info: do_state_transition: State transition S_STARTING -> S_PENDING [ input=I_PENDING cause=C_FSA_INTERNAL origin=do_started ]
Jul 15 08:51:30 [21679] vsanqa8 cib: info: cib_process_replace: Digest matched on replace from vsanqa7: 0936115228a36c943f181954830c9b2b
Jul 15 08:51:30 [21679] vsanqa8 cib: info: cib_process_replace: Replaced 0.2760.1 with 0.2760.307 from vsanqa7
Jul 15 08:51:30 [21679] vsanqa8 cib: info: cib_replace_notify: Replaced: 0.2760.1 -> 0.2760.307 from vsanqa7
Jul 15 08:51:31 [21684] vsanqa8 crmd: info: pcmk_cpg_membership: Joined[0.0] crmd.2
Jul 15 08:51:31 [21684] vsanqa8 crmd: info: pcmk_cpg_membership: Member[0.0] crmd.1
Jul 15 08:51:31 [21684] vsanqa8 crmd: info: crm_update_peer_proc: pcmk_cpg_membership: Node vsanqa7[1] - corosync-cpg is now online
Jul 15 08:51:31 [21684] vsanqa8 crmd: info: peer_update_callback: Client vsanqa7/peer now has status [online] (DC=<null>)
Jul 15 08:51:31 [21684] vsanqa8 crmd: info: pcmk_cpg_membership: Member[0.1] crmd.2
Jul 15 08:51:31 [21684] vsanqa8 crmd: info: update_dc: Set DC to vsanqa7 (3.0.7)
Jul 15 08:51:32 [21680] vsanqa8 stonith-ng: info: stonith_command: Processed register from crmd.21684: OK (0)
Jul 15 08:51:32 [21680] vsanqa8 stonith-ng: info: stonith_command: Processed st_notify from crmd.21684: OK (0)
Jul 15 08:51:32 [21680] vsanqa8 stonith-ng: info: stonith_command: Processed st_notify from crmd.21684: OK (0)
Jul 15 08:51:32 [21684] vsanqa8 crmd: info: update_attrd: Connecting to attrd... 5 retries remaining
Jul 15 08:51:32 [21684] vsanqa8 crmd: notice: do_state_transition: State transition S_PENDING -> S_NOT_DC [ input=I_NOT_DC cause=C_HA_MESSAGE origin=do_cl_join_finalize_respond ]
Jul 15 08:51:32 [21679] vsanqa8 cib: info: cib_process_replace: Digest matched on replace from vsanqa7: 0936115228a36c943f181954830c9b2b
Jul 15 08:51:32 [21679] vsanqa8 cib: info: cib_process_replace: Replaced 0.2760.307 with 0.2760.307 from vsanqa7
Jul 15 08:51:32 [21682] vsanqa8 attrd: notice: attrd_local_callback: Sending full refresh (origin=crmd)
Jul 15 08:51:32 [21681] vsanqa8 lrmd: info: process_lrmd_get_rsc_info: Resource 'vha-7413ed6d-2a3b-4ffc-9cd0-b80778d7a839' not found (0 active resources)
Jul 15 08:51:32 [21681] vsanqa8 lrmd: info: process_lrmd_get_rsc_info: Resource 'vha-7413ed6d-2a3b-4ffc-9cd0-b80778d7a839:0' not found (0 active resources)
Jul 15 08:51:32 [21681] vsanqa8 lrmd: info: process_lrmd_rsc_register: Added 'vha-7413ed6d-2a3b-4ffc-9cd0-b80778d7a839' to the rsc list (1 active resources)
Jul 15 08:51:32 [21684] vsanqa8 crmd: error: handle_request: We didn't ask to be shut down, yet our DC is telling us too.
Jul 15 08:51:32 [21684] vsanqa8 crmd: notice: do_state_transition: State transition S_NOT_DC -> S_STOPPING [ input=I_STOP cause=C_HA_MESSAGE origin=route_message ]
Jul 15 08:51:32 [21684] vsanqa8 crmd: info: do_shutdown: Disconnecting STONITH...
Jul 15 08:51:32 [21684] vsanqa8 crmd: info: tengine_stonith_connection_destroy: Fencing daemon disconnected
Jul 15 08:51:32 [21684] vsanqa8 crmd: info: verify_stopped: 1 pending LRM operations at shutdown... waiting
Jul 15 08:51:32 [21684] vsanqa8 crmd: info: ghash_print_pending: Pending action: vha-7413ed6d-2a3b-4ffc-9cd0-b80778d7a839:6 (vha-7413ed6d-2a3b-4ffc-9cd0-b80778d7a839_monitor_0)
Regards,
Kiran
_______________________________________________
Pacemaker mailing list: Pacemaker at oss.clusterlabs.org
http://oss.clusterlabs.org/mailman/listinfo/pacemaker
Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://bugs.clusterlabs.org
Digimer
2013-07-17 03:45:32 UTC
Permalink
Post by K Mehta
I have a two node test cluster running with CMAN plugin. Fencing is not
configured.
Andrew spoke to the pacemaker stuff. I want to reiterate what he said on
fencing though. Fencing *must* be used with cman. If fenced calls a
fence, nothing else will happen until the fence succeeds. This is by
design to avoid split-brains.

If you need help configuring stonith in pacemaker or using the
fence_pcmk fence agent hook in cluster.conf, please ask.
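For reference, the cluster.conf side of that hook usually looks something like the sketch below; fence_pcmk just redirects cman's fenced requests to pacemaker, which then needs real stonith devices of its own. Node names are taken from this thread; treat this as an outline, not a tested config:

   <clusternode name="vsanqa7" nodeid="1">
     <fence>
       <method name="pcmk-redirect">
         <device name="pcmk" port="vsanqa7"/>
       </method>
     </fence>
   </clusternode>
   <clusternode name="vsanqa8" nodeid="2">
     <fence>
       <method name="pcmk-redirect">
         <device name="pcmk" port="vsanqa8"/>
       </method>
     </fence>
   </clusternode>
   <fencedevices>
     <fencedevice name="pcmk" agent="fence_pcmk"/>
   </fencedevices>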
--
Digimer
Papers and Projects: https://alteeve.ca/w/
What if the cure for cancer is trapped in the mind of a person without
access to education?
K Mehta
2013-07-17 08:24:29 UTC
Permalink
Thanks Digimer. I will make sure that I configure fencing along with CMAN.
I have multiple independent master slave resources in the cluster.
A node could be master for a few and slave for other resources.
I need to make sure that fencing occurs at resource level
(no killing of nodes) rather than at node level. I hope it is ok to embed
the logic of selective blocking of resources in the fencing agent.
Post by Digimer
Post by K Mehta
I have a two node test cluster running with CMAN plugin. Fencing is not
configured.
Andrew spoke to the pacemaker stuff. I want to reiterate what he said on
fencing though. Fencing *must* be used with cman. If fenced calls a fence,
nothing else will happen until the fence succeeds. This is by design to
avoid split-brains.
If you need help configuring stonith in pacemaker or using the fence_pcmk
fence agent hook in cluster.conf, please ask.
--
Digimer
Papers and Projects: https://alteeve.ca/w/
What if the cure for cancer is trapped in the mind of a person without
access to education?
Digimer
2013-07-17 15:10:41 UTC
Permalink
This isn't how fencing works. If a node enters an unknown state (stops
responding to the other node(s)), it must be put into a known state.
This mechanism must work without any input from the target node. If you
try to do "resource level" fencing, you are assuming that the node can
respond.

So the only two options you have are "power fencing" and "fabric
fencing". The first forcibly powers off the node and the latter severs
the node's connection to the network. In either case, the node is taken
offline.

Anything that leaves a node up and accessible, to any degree, is not a
useful fence mechanism.
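If it helps, a power-fencing setup on the pacemaker side could look roughly like this. IPMI is just one example of a power-fence agent, the addresses and credentials below are placeholders, and the same can be done with crm/cibadmin if pcs is not installed:

[root@vsanqa7 ~]# pcs stonith create fence-vsanqa7 fence_ipmilan \
        pcmk_host_list="vsanqa7" ipaddr="192.0.2.7" login="admin" passwd="<ipmi password>" \
        op monitor interval=60s
[root@vsanqa7 ~]# pcs stonith create fence-vsanqa8 fence_ipmilan \
        pcmk_host_list="vsanqa8" ipaddr="192.0.2.8" login="admin" passwd="<ipmi password>" \
        op monitor interval=60s
[root@vsanqa7 ~]# pcs property set stonith-enabled=true

Combined with the fence_pcmk redirect in cluster.conf, a misbehaving node is then actually powered off rather than left half-alive.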
digimer
Post by K Mehta
Thanks Digimer. I will make sure that I configure fencing along with CMAN.
I have multiple independent master slave resources in the cluster.
A node could be master for a few and slave for other resources.
I need to make sure that fencing occurs at resource level
(no killing of nodes) rather than at node level. I hope it is ok to embed
the logic of selective blocking of resources in the fencing agent.
On Wed, Jul 17, 2013 at 9:15 AM, Digimer <lists@alteeve.ca> wrote:
I have a two node test cluster running with CMAN plugin. Fencing is not
configured.
Andrew spoke to the pacemaker stuff. I want to reiterate what he
said on fencing though. Fencing *must* be used with cman. If fenced
calls a fence, nothing else will happen until the fence succeeds.
This is by design to avoid split-brains.
If you need help configuring stonith in pacemaker or using the
fence_pcmk fence agent hook in cluster.conf, please ask.
--
Digimer
Papers and Projects: https://alteeve.ca/w/
What if the cure for cancer is trapped in the mind of a person
without access to education?
--
Digimer
Papers and Projects: https://alteeve.ca/w/
What if the cure for cancer is trapped in the mind of a person without
access to education?