Discussion:
[Pacemaker] Trouble with drbd/pacemaker: switch to secondary/secondary
Anne Nicolas
2016-10-14 15:54:45 UTC
Hi!

I'm having trouble with a two-node cluster used for DRBD / Apache / Samba
and some other services.

Whatever I do, it always goes to the following state:

Last updated: Fri Oct 14 17:41:38 2016
Last change: Thu Oct 13 10:42:29 2016 via cibadmin on bzvairsvr
Stack: corosync
Current DC: bzvairsvr (168430081) - partition with quorum
Version: 1.1.8-9.mga5-394e906
2 Nodes configured, unknown expected votes
13 Resources configured.


Online: [ bzvairsvr bzvairsvr2 ]

Master/Slave Set: drbdservClone [drbdserv]
Slaves: [ bzvairsvr bzvairsvr2 ]
Clone Set: fencing [st-ssh]
Started: [ bzvairsvr bzvairsvr2 ]

When I reboot bzvairsvr2, it becomes primary again, but after a while it
goes back to secondary as well.
I use a very basic fencing system based on ssh. It's not optimal but
enough for the current tests.

Here is the configuration:

node 168430081: bzvairsvr
node 168430082: bzvairsvr2
primitive apache apache \
    params configfile="/etc/httpd/conf/httpd.conf" \
    op start interval=0 timeout=120s \
    op stop interval=0 timeout=120s
primitive clusterip IPaddr2 \
    params ip=192.168.100.1 cidr_netmask=24 nic=eno1 \
    meta target-role=Started
primitive clusterroute Route \
    params destination="0.0.0.0/0" gateway=192.168.100.254
primitive drbdserv ocf:linbit:drbd \
    params drbd_resource=server \
    op monitor interval=30s role=Slave \
    op monitor interval=29s role=Master start-delay=30s
primitive fsserv Filesystem \
    params device="/dev/drbd/by-res/server" directory="/Server" fstype=ext4 \
    op start interval=0 timeout=60s \
    op stop interval=0 timeout=60s \
    meta target-role=Started
primitive libvirt-guests systemd:libvirt-guests
primitive libvirtd systemd:libvirtd
primitive mysql systemd:mysqld
primitive named systemd:named
primitive samba systemd:smb
primitive st-ssh stonith:external/ssh \
    params hostlist="bzvairsvr bzvairsvr2"
group iphd clusterip clusterroute \
    meta target-role=Started
group services libvirtd libvirt-guests apache named mysql samba \
    meta target-role=Started
ms drbdservClone drbdserv \
    meta master-max=1 master-node-max=1 clone-max=2 clone-node-max=1 notify=true target-role=Started
clone fencing st-ssh
colocation fs_on_drbd inf: fsserv drbdservClone:Master
colocation iphd_on_services inf: iphd services
colocation services_on_fsserv inf: services fsserv
order fsserv-after-drbdserv inf: drbdservClone:promote fsserv:start
order services_after_fsserv inf: fsserv services
property cib-bootstrap-options: \
    dc-version=1.1.8-9.mga5-394e906 \
    cluster-infrastructure=corosync \
    no-quorum-policy=ignore \
    stonith-enabled=true \

The cluster logs are flooded with:
Oct 14 17:42:28 [3445] bzvairsvr attrd: notice: attrd_trigger_update: Sending flush op to all hosts for: master-drbdserv (10000)
Oct 14 17:42:28 [3445] bzvairsvr attrd: notice: attrd_perform_update: Sent update master-drbdserv=10000 failed: Transport endpoint is not connected
Oct 14 17:42:28 [3445] bzvairsvr attrd: notice: attrd_perform_update: Sent update -107: master-drbdserv=10000
Oct 14 17:42:28 [3445] bzvairsvr attrd: warning: attrd_cib_callback: Update master-drbdserv=10000 failed: Transport endpoint is not connected
Oct 14 17:42:59 [3445] bzvairsvr attrd: notice: attrd_trigger_update: Sending flush op to all hosts for: master-drbdserv (10000)
Oct 14 17:42:59 [3445] bzvairsvr attrd: notice: attrd_perform_update: Sent update master-drbdserv=10000 failed: Transport endpoint is not connected
Oct 14 17:42:59 [3445] bzvairsvr attrd: notice: attrd_perform_update: Sent update -107: master-drbdserv=10000
Oct 14 17:42:59 [3445] bzvairsvr attrd: warning: attrd_cib_callback: Update master-drbdserv=10000 failed: Transport endpoint is not connected
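
For readers wondering what the "-107" in the attrd lines means: it appears to be a negated errno value, since the same log entries spell out "Transport endpoint is not connected". A minimal Python sketch for decoding such a code is shown here. The helper name `describe_rc` is hypothetical, and the numeric value 107 for ENOTCONN is a Linux assumption (other platforms number it differently).

```python
import errno
import os


def describe_rc(rc: int) -> tuple:
    """Map an errno-style return code (negative or positive) to
    (symbolic name, human-readable message) on the current platform."""
    code = abs(rc)
    name = errno.errorcode.get(code, "UNKNOWN")
    return name, os.strerror(code)


# Assumption: run on Linux, where ENOTCONN is numbered 107 as in the log.
name, message = describe_rc(-107)
print(name, "-", message)
```

On a Linux host this should report ENOTCONN, which matches the message text in the log and suggests that attrd could not reach the CIB at the time of the update.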


And here is the dmesg output:

[34067.547147] block drbd0: peer( Secondary -> Primary )
[34091.023206] block drbd0: peer( Primary -> Secondary )
[34096.616319] drbd server: peer( Secondary -> Unknown ) conn( Connected -> TearDown ) pdsk( UpToDate -> DUnknown )
[34096.616353] drbd server: asender terminated
[34096.616358] drbd server: Terminating drbd_a_server
[34096.682874] drbd server: Connection closed
[34096.682894] drbd server: conn( TearDown -> Unconnected )
[34096.682897] drbd server: receiver terminated
[34096.682900] drbd server: Restarting receiver thread
[34096.682902] drbd server: receiver (re)started
[34096.682915] drbd server: conn( Unconnected -> WFConnection )
[34103.311898] drbd server: Handshake successful: Agreed network protocol version 101
[34103.311903] drbd server: Agreed to support TRIM on protocol level
[34103.311997] drbd server: Peer authenticated using 20 bytes HMAC
[34103.312046] drbd server: conn( WFConnection -> WFReportParams )
[34103.312062] drbd server: Starting asender thread (from drbd_r_server [4344])
[34103.380311] block drbd0: drbd_sync_handshake:
[34103.380318] block drbd0: self 8B500BD87A5D76D4:0000000000000000:A1860E99AC8107A0:A1850E99AC8107A0 bits:0 flags:0
[34103.380323] block drbd0: peer 8B500BD87A5D76D4:0000000000000000:A1860E99AC8107A0:A1850E99AC8107A0 bits:0 flags:0
[34103.380327] block drbd0: uuid_compare()=0 by rule 40
[34103.380335] block drbd0: peer( Unknown -> Secondary ) conn( WFReportParams -> Connected ) pdsk( DUnknown -> UpToDate )
[34114.046443] bnx2x 0000:05:00.0 enp5s0f0: NIC Link is Down
[34123.802580] drbd server: PingAck did not arrive in time.
[34123.802617] drbd server: peer( Secondary -> Unknown ) conn( Connected -> NetworkFailure ) pdsk( UpToDate -> DUnknown )
[34123.802773] drbd server: asender terminated
[34123.802777] drbd server: Terminating drbd_a_server
[34123.932565] drbd server: Connection closed
[34123.932585] drbd server: conn( NetworkFailure -> Unconnected )
[34123.932588] drbd server: receiver terminated
[34123.932590] drbd server: Restarting receiver thread
[34123.932592] drbd server: receiver (re)started
[34123.932605] drbd server: conn( Unconnected -> WFConnection )
[34185.719207] bnx2x 0000:05:00.0 enp5s0f0: NIC Link is Up, 10000 Mbps full duplex, Flow control: ON - receive & transmit
[34232.241599] bnx2x 0000:05:00.0 enp5s0f0: NIC Link is Down
[34268.637861] bnx2x 0000:05:00.0 enp5s0f0: NIC Link is Up, 10000 Mbps full duplex, Flow control: ON - receive & transmit
[34318.675122] drbd server: Handshake successful: Agreed network protocol version 101
[34318.675128] drbd server: Agreed to support TRIM on protocol level
[34318.675218] drbd server: Peer authenticated using 20 bytes HMAC
[34318.675258] drbd server: conn( WFConnection -> WFReportParams )
[34318.675276] drbd server: Starting asender thread (from drbd_r_server [4344])
[34318.738909] block drbd0: drbd_sync_handshake:
[34318.738916] block drbd0: self 8B500BD87A5D76D4:0000000000000000:A1860E99AC8107A0:A1850E99AC8107A0 bits:0 flags:0
[34318.738921] block drbd0: peer 8B500BD87A5D76D4:0000000000000000:A1860E99AC8107A0:A1850E99AC8107A0 bits:0 flags:0
[34318.738924] block drbd0: uuid_compare()=0 by rule 40
[34318.738933] block drbd0: peer( Unknown -> Secondary ) conn( WFReportParams -> Connected ) pdsk( DUnknown -> UpToDate )
[34328.812317] block drbd0: peer( Secondary -> Primary )
[37316.065793] usb 3-11: USB disconnect, device number 3
[52246.642265] block drbd0: peer( Primary -> Secondary )

Any help would be appreciated.

Cheers
--
Anne Nicolas
http://mageia.org

_______________________________________________
Pacemaker mailing list: ***@oss.clusterlabs.org
http://oss.clusterlabs.org/mailman/listinfo/pacemaker

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://bugs.clusterlabs.org
Anne Nicolas
2016-10-17 14:29:18 UTC
Post by Anne Nicolas
attrd_cib_callback: Update master-drbdserv=10000 failed: Transport
endpoint is not connected
Hi Anne,
Wild guess: one or more ports are being blocked on at least one of the
nodes, probably by a firewall.
The relevant ports are TCP 2224, 3121, and 21064, and UDP 5405.
Well, to make things easier, this test platform does not have any active
firewall :/
Cheers,
Kristoffer
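
Even without a firewall, the TCP ports listed in the reply above can be probed from the peer node to rule out a connectivity problem. A minimal Python sketch follows, with a few assumptions: the function names `tcp_port_open` and `probe_tcp_ports` are hypothetical, and only the TCP ports are covered, because a UDP port such as 5405 has no connection handshake to test against.

```python
import socket

# TCP ports named in the reply; UDP 5405 is deliberately left out.
TCP_PORTS = (2224, 3121, 21064)


def tcp_port_open(host: str, port: int, timeout: float = 2.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts and name-resolution failures.
        return False


def probe_tcp_ports(host: str, ports=TCP_PORTS, timeout: float = 2.0) -> dict:
    """Return a {port: reachable} mapping for the given host."""
    return {port: tcp_port_open(host, port, timeout) for port in ports}
```

Calling `probe_tcp_ports("bzvairsvr2")` from bzvairsvr (and the reverse from the other node) would show which of these ports accept connections. A closed port only means nothing is listening or something is filtering it, so a False result here does not by itself prove a firewall is involved.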
--
Anne Nicolas
http://mageia.org

Vlad
2016-10-18 07:56:47 UTC
Is something wrong with the network interface?

[34114.046443] bnx2x 0000:05:00.0 enp5s0f0: NIC Link is Down
[34185.719207] bnx2x 0000:05:00.0 enp5s0f0: NIC Link is Up, 10000 Mbps full duplex, Flow control: ON - receive & transmit
[34232.241599] bnx2x 0000:05:00.0 enp5s0f0: NIC Link is Down
[34268.637861] bnx2x 0000:05:00.0 enp5s0f0: NIC Link is Up, 10000 Mbps full duplex, Flow control: ON - receive & transmit
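
To put numbers on the link flaps quoted above, the Down/Up pairs can be extracted from dmesg and turned into outage durations. The following Python sketch is illustrative only: the regular expression and the name `link_outages` are assumptions, and the sample input is just the four bnx2x lines from the dmesg excerpt.

```python
import re

# Matches e.g. "[34114.046443] bnx2x 0000:05:00.0 enp5s0f0: NIC Link is Down"
LINK_RE = re.compile(
    r"^\[\s*(?P<ts>\d+\.\d+)\]\s+\S+\s+\S+\s+(?P<nic>\S+): "
    r"NIC Link is (?P<state>Up|Down)"
)


def link_outages(dmesg_text: str) -> list:
    """Return (nic, down_at, up_at, seconds_down) for each Down/Up pair."""
    down_since = {}
    outages = []
    for line in dmesg_text.splitlines():
        match = LINK_RE.match(line.strip())
        if not match:
            continue
        ts = float(match.group("ts"))
        nic = match.group("nic")
        if match.group("state") == "Down":
            # Remember only the first Down until the link comes back.
            down_since.setdefault(nic, ts)
        elif nic in down_since:
            start = down_since.pop(nic)
            outages.append((nic, start, ts, round(ts - start, 6)))
    return outages


SAMPLE = """\
[34114.046443] bnx2x 0000:05:00.0 enp5s0f0: NIC Link is Down
[34185.719207] bnx2x 0000:05:00.0 enp5s0f0: NIC Link is Up, 10000 Mbps full duplex, Flow control: ON - receive & transmit
[34232.241599] bnx2x 0000:05:00.0 enp5s0f0: NIC Link is Down
[34268.637861] bnx2x 0000:05:00.0 enp5s0f0: NIC Link is Up, 10000 Mbps full duplex, Flow control: ON - receive & transmit
"""

for nic, down_at, up_at, seconds in link_outages(SAMPLE):
    print(f"{nic}: down at {down_at}, up at {up_at}, {seconds:.1f} s")
```

For the sample this gives two outages on enp5s0f0 of roughly 71.7 s and 36.4 s. In the full dmesg excerpt the first Down event at 34114.05 comes about ten seconds before DRBD's "PingAck did not arrive in time" at 34123.80, which is why the interface is worth a closer look.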
Anne Nicolas
2016-10-18 08:05:25 UTC
Post by Vlad
Is something wrong with the network interface?
[34114.046443] bnx2x 0000:05:00.0 enp5s0f0: NIC Link is Down
[34185.719207] bnx2x 0000:05:00.0 enp5s0f0: NIC Link is Up, 10000 Mbps full duplex, Flow control: ON - receive & transmit
[34232.241599] bnx2x 0000:05:00.0 enp5s0f0: NIC Link is Down
[34268.637861] bnx2x 0000:05:00.0 enp5s0f0: NIC Link is Up, 10000 Mbps full duplex, Flow control: ON - receive & transmit
I don't think so. This interface is part of a cluster resource and is
up on the master only, so this looks more like a side effect of a
resource restart.
--
Anne
http://www.mageia.org

Vlad
2016-10-21 16:20:28 UTC
In your post I didn't see any cluster configuration related to the bnx2x
interface, only to the IP address.
Post by Anne Nicolas
Post by Vlad
Is something wrong with the network interface?
[34114.046443] bnx2x 0000:05:00.0 enp5s0f0: NIC Link is Down
[34185.719207] bnx2x 0000:05:00.0 enp5s0f0: NIC Link is Up, 10000 Mbps full duplex, Flow control: ON - receive & transmit
[34232.241599] bnx2x 0000:05:00.0 enp5s0f0: NIC Link is Down
[34268.637861] bnx2x 0000:05:00.0 enp5s0f0: NIC Link is Up, 10000 Mbps full duplex, Flow control: ON - receive & transmit
I don't think so. This interface is part of a cluster resource and is
up on the master only, so this looks more like a side effect of a
resource restart.