Cayab, Jefrey E.
2015-11-24 10:18:26 UTC
Hi all,
I searched online but couldn't find a detailed answer. OS is RHEL 6.5.
Problem:
I have 2 servers that were set up and working fine (a MySQL cluster runs on
them, with DRBD for the data disk on local disk). These 2 servers needed to
be migrated to another location. During the migration, DRBD had to change
from local disk to a SAN LUN. That change went OK, but the cluster began
behaving strangely. When the 2 nodes are shut down and booted together, each
server sees the other as online via "crm_mon -1". But when the pacemaker
process on one node is restarted, the other node keeps showing that node as
offline/stopped. Even if I reboot that node, it doesn't rejoin the cluster.
Another observation: if the 2 servers boot up together, both show as online
as above. When I stop the pacemaker process on the active node, the other
node takes over the resources, which is good. But even after I start the
pacemaker process again on the node where I stopped it, that node is not
able to take the resources back. It is as if only one failover can happen,
with no failback.
What I did:
Removed Pacemaker and Corosync via YUM
Rebooted the OS
Verified that no Pacemaker/Corosync packages remained
Reinstalled Pacemaker and Corosync via YUM
When I ran "crm_mon -1", I was surprised to see that the configuration was
still there.
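
For reference, the steps above correspond roughly to the commands in the
sketch below. The exact package names and yum options are assumptions, not
a copy of what was typed. The last line looks at /var/lib/pacemaker/cib,
which is assumed here to be where Pacemaker 1.1 keeps its configuration on
disk; if that directory survives package removal, it might explain why the
old configuration reappeared. The script only prints the commands unless
DRY_RUN=0 is set.

```shell
#!/bin/sh
# Sketch of the remove/reinstall sequence described above.
# DRY_RUN defaults to 1, so commands are printed rather than executed.
# Set DRY_RUN=0 (as root) to actually run them.
DRY_RUN="${DRY_RUN:-1}"

# Print the command in dry-run mode, otherwise execute it.
run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

reinstall_steps() {
    # Remove the packages (package names are an assumption).
    run yum -y remove pacemaker corosync
    # The OS was rebooted at this point in the original sequence.
    # List any remaining packages matching the two names.
    run rpm -qa 'pacemaker*' 'corosync*'
    # Install the packages again.
    run yum -y install pacemaker corosync
    # One-shot cluster status.
    run crm_mon -1
    # Assumed location of the on-disk cluster configuration.
    run ls -l /var/lib/pacemaker/cib
}

reinstall_steps
```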
After the reinstallation, I am still experiencing the same behavior, and I
noticed that DRBD is reporting a Failed disk. Only a reboot of the node
brings it back to UpToDate.
Please advise on the correct procedure to wipe the configuration and
reinstall.
I will share the logs shortly.
Thanks,
Jef