Discussion:
[Pacemaker] Reinstall Pacemaker/Corosync.
Cayab, Jefrey E.
2015-11-24 10:18:26 UTC
Hi all,

I searched online but couldn't find a detailed answer. OS is RHEL 6.5.

Problem:
I have 2 servers that were set up fine (a MySQL cluster runs on them, with
DRBD for the data disk on local storage), and these 2 servers needed to be
migrated to another location. During the migration, DRBD had to be moved
from local disk to a SAN LUN, which went OK, but the cluster then began
behaving strangely. When the 2 nodes are shut down and booted together,
each server sees the other as online via "crm_mon -1", but when one node's
pacemaker process is restarted, that node's status as seen from the other
node stays offline/stopped. Even if I reboot that node, it doesn't rejoin
the cluster.
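
For reference, this is how I've been checking status on each node (the
second command is my assumption, for the case that the stock RHEL 6 CMAN
stack is in use):

    crm_mon -1        # one-shot cluster status from Pacemaker's point of view
    cman_tool nodes   # node membership as CMAN/corosync sees it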

Another observation: if the 2 servers boot up together, both show as online
as above, and when I stop the pacemaker process on the active node, the
other node takes over the resources, which is good. But even after I start
the pacemaker process back up on the first node, it's not able to take the
resources back. It's as if only one failover can happen, with no failback.
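
To be precise, the sequence I'm testing is roughly this (init-script names
assumed for RHEL 6):

    service pacemaker stop    # on the active node; the other node takes over
    service pacemaker start   # later, on the same node
    crm_mon -1                # check status; the node never takes the resources back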


What I did:
I removed Pacemaker and Corosync via YUM
Rebooted the OS
Verified no more Pacemaker/Corosync packages were installed
Reinstalled Pacemaker and Corosync via YUM
When I then ran "crm_mon -1", I was surprised to see that the configuration
was still there.
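
My guess is that removing the packages leaves the cluster configuration on
disk, which would explain this. The paths below are my assumption for
RHEL 6, so worth double-checking:

    ls /var/lib/pacemaker/cib/    # CIB location on recent 1.1.x packages
    ls /var/lib/heartbeat/crm/    # older CIB location, if present
    ls /etc/cluster/cluster.conf  # CMAN membership config, if the CMAN stack is used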

After the reinstallation, I'm still seeing the same behavior, and I've
noticed that DRBD is reporting a Failed disk; only a reboot of the node
brings it back to UpToDate.
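
Short of a full reboot, I would expect something like the following to
reattach the backing device, though I haven't verified it here ("r0" is a
placeholder for the actual resource name):

    drbdadm detach r0   # drop the failed backing device
    drbdadm attach r0   # reattach it; resync should bring it back to UpToDate
    cat /proc/drbd      # watch the disk state change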

Please advise on the correct procedure to wipe out the configuration and
reinstall.

I will share the logs shortly.

Thanks,
Jef
emmanuel segura
2015-11-24 10:28:40 UTC
I don't remember exactly, but I think on Red Hat 6.5 you need to use
cman+pacemaker. Please post your config, and make sure you have fencing
configured.
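
Something like this should show whether fencing is configured (pcs syntax
from memory, so check it against your version):

    pcs stonith show    # lists configured fence devices, if any
    pcs property list   # look for stonith-enabled=true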

Ken Gaillot
2015-11-30 22:30:47 UTC
Post by emmanuel segura
I don't remember exactly, but I think on Red Hat 6.5 you need to use
cman+pacemaker. Please post your config, and make sure you have fencing
configured.
Yes, the versions in 6.5 are quite old; 6.7 has recent versions, so if
you can upgrade, that would help. Even 6.6 is significantly newer and
has important bugfixes.
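
Roughly, once the system is pointed at 6.6+ repositories (the exact package
set here is an assumption and may vary with your channels):

    yum update pacemaker corosync cman pcs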

RHEL 6 does use corosync 1, but via CMAN rather than directly.

You can use the pcs command to configure and deconfigure the cluster
(pcs cluster node add/remove for one node, or pcs cluster setup/destroy
for the entire cluster).
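
For example, to wipe a two-node cluster and rebuild it from scratch (node
names are placeholders; run the destroy on every node):

    pcs cluster destroy                             # removes this node's cluster config
    pcs cluster setup --name mycluster node1 node2  # recreate the cluster
    pcs cluster start --all                         # start the cluster on all nodes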
