Discussion:
[Pacemaker] Announcing the Heartbeat 3.0.6 Release
Lars Ellenberg
2015-02-10 21:24:30 UTC
TL;DR:

If you intend to set up a new High Availability cluster
using the Pacemaker cluster manager,
you typically should not care for Heartbeat,
but use recent releases (2.3.x) of Corosync.

If you don't care for Heartbeat, don't read further.

Unless you are beekhof... there's a question below ;-)

------------------------------------------------------------------------

After 3½ years since the last "officially tagged" release of Heartbeat,
I have seen the need to do a new "maintenance release".

The Heartbeat 3.0.6 release tag: 3d59540cf28d
and the change set it points to: cceeb47a7d8f

The main reason for this was that pacemaker more recent than
somewhere between 1.1.6 and 1.1.7 would no longer work properly
on the Heartbeat cluster stack.

Because some of the daemons have moved from "glue" to "pacemaker" proper,
and changed their paths. This has been fixed in Heartbeat.

And because during that time, stonith-ng was refactored, and would still
reliably fence, but not understand its own confirmation message, so it
was effectively broken. This I fixed in pacemaker.

------------------------------------------------------------------------

If you choose to run new Pacemaker with the Heartbeat communication stack,
it should be at least 1.1.12 with a few patches,
see my December 2014 commits at the top of
https://github.com/lge/pacemaker/commits/linbit-cluster-stack-pcmk-1.1.12
I'm not sure if they got into pacemaker upstream yet.

beekhof?
Do I need to rebase?
Or did I miss you merging these?
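
For readers who want to script the version requirement stated above, here
is a minimal sketch in Python. The names parse_version, meets_minimum and
MIN_PACEMAKER are hypothetical and not part of Pacemaker or Heartbeat. The
sketch only compares version numbers; it cannot tell whether the additional
patches mentioned above are present in a given build.

```python
import re

# Minimum Pacemaker version named in the post for the Heartbeat stack.
MIN_PACEMAKER = (1, 1, 12)


def parse_version(text):
    """Extract the first dotted numeric version in text as a tuple of ints."""
    match = re.search(r"\d+(?:\.\d+)+", text)
    if match is None:
        raise ValueError(f"no version number found in {text!r}")
    return tuple(int(part) for part in match.group(0).split("."))


def meets_minimum(text, minimum=MIN_PACEMAKER):
    """Return True if the version found in text is at least `minimum`."""
    return parse_version(text) >= minimum


if __name__ == "__main__":
    for candidate in ("1.1.7", "1.1.12", "1.1.13"):
        print(candidate, meets_minimum(candidate))
```

Comparing tuples of integers avoids the string-comparison trap where
"1.1.7" would sort after "1.1.12".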

---

If you have those patches,
consider setting this new ha.cf configuration parameter:

# If pacemaker crmd spawns the pengine itself,
# it sometimes "forgets" to kill the pengine on shutdown,
# which later may confuse the system after cluster restart.
# Tell the system that Heartbeat is supposed to
# control the pengine directly.
crmd_spawns_pengine off
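
To check whether an existing ha.cf already carries this setting, a small
script along the following lines may help. It is only a sketch: it assumes
ha.cf is line-based, with one "directive value" pair per line and "#"
starting a comment, and the helper name read_directive is hypothetical
rather than part of any Heartbeat tooling.

```python
def read_directive(ha_cf_text, name):
    """Return every value given for directive `name`, in file order.

    Assumes one "directive value" pair per line and '#' comments.
    """
    values = []
    for raw_line in ha_cf_text.splitlines():
        line = raw_line.split("#", 1)[0].strip()  # drop comments and padding
        if not line:
            continue
        parts = line.split(None, 1)  # directive name, then the rest
        if parts[0] == name:
            values.append(parts[1].strip() if len(parts) > 1 else "")
    return values


if __name__ == "__main__":
    sample = (
        "# Tell the system that Heartbeat is supposed to\n"
        "# control the pengine directly.\n"
        "crmd_spawns_pengine off\n"
    )
    print(read_directive(sample, "crmd_spawns_pengine"))
```

An empty list means the directive is not set in the text that was passed in.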

------------------------------------------------------------------------

Here is the shortened Heartbeat changelog;
the longer version is available in mercurial:
http://hg.linux-ha.org/heartbeat-STABLE_3_0/shortlog

- fix emergency shutdown due to broken update_ackseq
- fix node dead detection problems
- fix converging of membership (ccm)
- fix init script startup glitch (caused by changes in glue/resource-agents)
- heartbeat.service file for systemd platforms
- new ucast6 UDP IPv6 communication plugin
- package ha_api.py in standard package
- update some man pages, specifically the example ha.cf
- also report ccm membership status for cl_status hbstatus -v
- updated some log messages, or their log levels
- reduce max_delay in broadcast client_status query to one second
- apply various (mostly cosmetic) patches from Debian
- drop HBcompress compression plugins: they are part of cluster glue
- drop "openais" HBcomm plugin
- better support for current pacemaker versions
- try to not miss a SIGTERM (fix problem with very fast respawn/stop cycle)
- dopd: ignore dead ping nodes
- cl_status improvements
- api internals: reduce IPC round-trips to get at status information
- uid=root is sufficient to use heartbeat api (gid=haclient remains sufficient)
- fix /dev/null as log- or debugfile setting
- move daemon binaries into libexecdir
- document movement of compression plugins into cluster-glue
- fix usage of SO_REUSEPORT in ucast sockets
- fix compile issues with recent gcc and -Werror

Note that a number of the mentioned "fixes" were created two years
ago already, and may have been available in packages for a long time,
where vendors have chosen to package them.

------------------------------------------------------------------------

As to future plans for Heartbeat:

Heartbeat is still useful for non-pacemaker, "haresources"-mode clusters.

We (Linbit) will maintain Heartbeat for the foreseeable future.
That should not be too much of a burden, as it is "stable",
and due to long years of field exposure, "all bugs are known" ;-)

The most notable shortcoming when using Heartbeat with Pacemaker
clusters would be the limited message size.
There are currently no plans to remove that limitation.

With its wide choice of communications paths, even "exotic"
communication plugins, and the ability to run "arbitrarily many"
paths, some deployments may even favor it over Corosync still.

But typically, for new deployments involving Pacemaker,
you should choose Corosync 2.3.x
as your membership and communication layer.

For existing deployments using Heartbeat,
upgrading to this Heartbeat version is strongly recommended.

Thanks,

Lars Ellenberg
Nikita Michalko
2015-02-11 07:34:26 UTC
Post by Lars Ellenberg
After 3½ years since the last "officially tagged" release of Heartbeat,
I have seen the need to do a new "maintenance release".
The Heartbeat 3.0.6 release tag: 3d59540cf28d
and the change set it points to: cceeb47a7d8f
GREAT !!! Thank you very much, Lars! Heartbeat is still running on
some of our production clusters ...
_______________________________________________
Linux-HA mailing list
http://lists.linux-ha.org/mailman/listinfo/linux-ha
See also: http://linux-ha.org/ReportingProblems
Andrew Beekhof
2015-02-19 21:44:37 UTC
Post by Lars Ellenberg
If you choose to run new Pacemaker with the Heartbeat communication stack,
it should be at least 1.1.12 with a few patches,
see my December 2014 commits at the top of
https://github.com/lge/pacemaker/commits/linbit-cluster-stack-pcmk-1.1.12
I'm not sure if they got into pacemaker upstream yet.
beekhof?
Do I need to rebase?
Or did I miss you merging these?
Merged now :-)

We're about to start the 1.1.13 release cycle, so it won't be far away.
_______________________________________________
Pacemaker mailing list: ***@oss.clusterlabs.org
http://oss.clusterlabs.org/mailman/listinfo/pacemaker

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://bugs.clusterlabs.org
