Discussion:
[Pacemaker] can we update an attribute with cmpxchg "atomic compare and exchange" semantics?
Lars Ellenberg
2014-09-10 09:50:58 UTC
Hi Andrew (and others).

For a certain use case (yes, I'm talking about DRBD "peer-fencing" on
loss of replication link), it would be nice to be able to say:

update some_attribute=some_attribute+1 where some_attribute >= 0

delete some_attribute where some_attribute=0

OK, that's not the classic cmpxchg(), more of an atomic_add(),
but similar enough. Hopefully with just a single CIB round trip.


Let me rephrase:
Update attribute "this_is_pink" (for node-X with ID attr-ID):

fail if said attr-ID exists elsewhere (not as the intended attribute
at the intended place in the xml tree)
(this comes for free already, I think)

if it does not exist at all, assume it was present with current value 0

if the current (or assumed current) value is >= 0, add 1

if the current value is < 0, fail

(optionally: return new value? old value?)
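The rules above boil down to a small decision function. This is only a sketch of the proposed semantics, not an existing Pacemaker or cibadmin API: `atomic_increment` and `should_delete` are hypothetical names, a missing attribute is modelled as `None`, and the optional return value is assumed to be the pair (old, new).

```python
from typing import Optional, Tuple


class NegativeValueError(Exception):
    """Raised when the current value is < 0, so the update must fail."""


def atomic_increment(current: Optional[int]) -> Tuple[int, int]:
    """Decision rule for the proposed update (hypothetical helper).

    `current` is None when the attribute does not exist; it is then treated
    as if it were present with value 0. Returns (old_value, new_value).
    """
    old = 0 if current is None else current
    if old < 0:
        # "if the current value is < 0, fail"
        raise NegativeValueError(f"refusing to increment negative value {old}")
    # "if the current (or assumed current) value is >= 0, add 1"
    return old, old + 1


def should_delete(value: int) -> bool:
    """Companion rule: "delete some_attribute where some_attribute=0"."""
    return value == 0
```

For example, `atomic_increment(None)` returns `(0, 1)` and `atomic_increment(2)` returns `(2, 3)`, while `atomic_increment(-1)` raises. The atomicity itself would have to come from the CIB evaluating such a rule server-side within one request; the function only captures the decision logic, and the "attr-ID exists elsewhere" check is left out.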




My intended use case scenario is this:

Two DRBD nodes, several DRBD resources,
at least a few of them in "dual-primary".

Replication link breaks.

Fence-peer handlers are triggered individually for each resource on
both nodes, and try to concurrently modify the cib (place fencing
constraints).

With the current implementation of crm-fence-peer.sh, it is likely that
some DRBD resources "win" on one node, some "win" on the other node.
The respective losers will have their IO blocked.

Which means that, most likely, on both nodes some DRBD resource will stay blocked,
some monitor operation will soon fail, some stop operation (to recover
from the monitor failure) will soon fail, and the recovery of that will be
node-level fencing of the affected node.

In short: both nodes will be hard-reset
because of a replication link failure.
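To put a rough number on "likely", assume (purely for illustration, this is not stated above) that each of n dual-primary resources independently has a 50/50 chance of its fence-peer handler winning on either node. Each node ends up with at least one blocked resource unless all n races go the same way, so the probability is 1 - 2 * (1/2)^n:

```python
from fractions import Fraction


def p_both_nodes_blocked(n_resources: int) -> Fraction:
    """Probability that each node loses at least one fencing race.

    Assumes (hypothetically) independent, fair 50/50 races per resource.
    Only the outcomes "all resources win on node A" and "all resources
    win on node B" leave one node without a blocked resource.
    """
    if n_resources < 1:
        raise ValueError("need at least one resource")
    p_all_same_side = 2 * Fraction(1, 2) ** n_resources
    return 1 - p_all_same_side


for n in (1, 2, 4, 8):
    print(n, p_both_nodes_blocked(n))
```

Under that assumption, a single resource can never block both nodes, but two resources already give 1/2, and four give 7/8.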



If I instead used a single attribute (with a pre-determined ID) for all
instances of the fence-peer handler, the first to come would "choose" the
victim node, and all others would just add their count.
There will be only one loser, and more importantly: one survivor.

Once the replication link is re-established,
DRBD resynchronization will bring the former loser up-to-date,
and the respective after-resync handlers will decrease that "breakage
count". Once the breakage count hits zero, it can and should be deleted.

Presence of the "breakage count" attribute with value > 0 would mean
"this node must not be promoted", which would be a static constraint
to be added to all DRBD resources.
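A toy model of that scheme, assuming a single cluster-wide attribute with a fixed ID that can live on at most one node at a time (the "fail if said attr-ID exists elsewhere" rule). `BreakageCount`, `AttrElsewhereError` and the node names `alpha`/`beta` are hypothetical; a real implementation would keep this state in the CIB, not in a Python object:

```python
from typing import Optional


class AttrElsewhereError(Exception):
    """The fixed attribute ID already exists, but on a different node."""


class BreakageCount:
    """Toy stand-in for one attribute with a pre-determined ID.

    At most one (node, count) pair exists at any time.
    """

    def __init__(self) -> None:
        self.node: Optional[str] = None
        self.count = 0

    def increment(self, victim: str) -> int:
        # Called by a fence-peer handler that wants to fence `victim`.
        if self.node is not None and self.node != victim:
            raise AttrElsewhereError(f"already placed on {self.node}")
        self.node = victim
        self.count += 1
        return self.count

    def decrement(self, node: str) -> int:
        # Called by an after-resync handler once `node` is up to date again.
        if self.node != node or self.count <= 0:
            raise ValueError(f"no breakage count on {node}")
        self.count -= 1
        if self.count == 0:
            self.node = None  # "once it hits zero, it can and should be deleted"
        return self.count

    def may_promote(self, node: str) -> bool:
        # Static rule: a count > 0 on this node forbids promotion.
        return not (self.node == node and self.count > 0)


# Hypothetical run: handlers on "alpha" fence "beta" for two resources.
bc = BreakageCount()
bc.increment("beta")            # first handler chooses the victim
bc.increment("beta")            # later handlers just add their count
try:
    bc.increment("alpha")       # a handler on "beta" loses: ID exists elsewhere
except AttrElsewhereError:
    pass
print(bc.may_promote("beta"), bc.may_promote("alpha"))  # False True
```

Because the handlers on the would-be second victim fail instead of placing their own constraint, there is exactly one loser and one survivor; the after-resync handlers then drain the count back to zero and the attribute disappears.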

Does that make sense?

(I have more insane proposals, in case we have multiple (more than 2)
Primaries during normal operation, but I'm not yet able to write them
down without being seriously confused by myself...)


I could open-code it with shell and cibadmin, btw.
I did a proof-of-concept once that does
a. cibadmin -Q
b. some calculations,
   then prepare the update statement XML based on the CIB content seen,
   *including* the CIB generation counters
c. cibadmin -R (or -C, -M, -D, as appropriate);
   this will fail if the CIB was modified in a relevant way since step a,
   because of the included generation counters
d. repeat as necessary
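The proof-of-concept loop is plain optimistic concurrency control. The sketch below uses a hypothetical in-memory `ToyCib` whose single `generation` integer stands in for the CIB generation counters; it is not cibadmin, and a real version would shell out to `cibadmin -Q` and `cibadmin -R`. It also shows the weakness mentioned below: an unrelated write bumps the counter and forces a retry even though the attribute itself did not change.

```python
import threading
from typing import Dict, Tuple


class StaleGenerationError(Exception):
    """The store changed since it was read; the caller must retry."""


class ToyCib:
    """Hypothetical in-memory stand-in for the CIB.

    Every write, relevant or not, bumps `generation`.
    """

    def __init__(self) -> None:
        self._lock = threading.Lock()
        self.generation = 0
        self.attrs: Dict[str, int] = {}

    def query(self) -> Tuple[int, Dict[str, int]]:
        with self._lock:
            return self.generation, dict(self.attrs)

    def replace(self, expected_gen: int, attrs: Dict[str, int]) -> None:
        with self._lock:
            if self.generation != expected_gen:
                raise StaleGenerationError()
            self.attrs = attrs
            self.generation += 1

    def unrelated_update(self) -> None:
        # e.g. a status change: bumps the counter, touches no attribute
        with self._lock:
            self.generation += 1


def increment_with_retry(cib: ToyCib, name: str, max_tries: int = 10) -> int:
    """Steps a-d: read, compute, conditional write, repeat on conflict."""
    for _ in range(max_tries):
        gen, attrs = cib.query()             # a. read current state
        value = attrs.get(name, 0)           # b. compute new state
        if value < 0:
            raise ValueError(f"{name} is negative")
        attrs[name] = value + 1
        try:
            cib.replace(gen, attrs)          # c. write only if unchanged
            return value + 1
        except StaleGenerationError:
            continue                         # d. repeat as necessary
    raise RuntimeError("gave up after too many conflicts")
```

Every conflict costs a full extra round trip, which is exactly what a server-side atomic update would avoid.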


But that is beyond ugly.
And probably fragile.
And would often fail for all the wrong reasons, just because some status
code has changed and bumped the cib generation counters.

What would be needed to add such functionality?
Where would it go?
cibadmin? cib? crm_attribute? possibly also attrd?

Thanks,
Lars
Lars Ellenberg
2014-09-29 20:22:06 UTC
Did anyone read this?
_______________________________________________
Pacemaker mailing list: Pacemaker at oss.clusterlabs.org
http://oss.clusterlabs.org/mailman/listinfo/pacemaker
Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://bugs.clusterlabs.org
--
: Lars Ellenberg
: LINBIT | Your Way to High Availability
: DRBD/HA support and consulting http://www.linbit.com

DRBD® and LINBIT® are registered trademarks of LINBIT, Austria.
Andrew Beekhof
2014-09-30 03:51:21 UTC
Post by Lars Ellenberg
Did anyone read this?
Yep, but it requires a non-trivial answer so it got deferred :)

It's a reasonable request. We've spoken about something similar in the past, and it's clear that at some point attrd needs to grow some extra capabilities.
Exactly when it will bubble up to the top of the todo list is less certain, though I would happily coach someone with the necessary motivation.

The other thing to mention is that currently the only part that won't work is "if the current value is < 0, fail".
Setting value="value++" will do the rest.

So my question would be... how important is the 'lt 0' case?

Actually, come to think of it, it's not a bad default behaviour.
Certainly failing value++ if value=-INFINITY would be logically consistent with the existing code.
Would that be sufficient?
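A hedged model of the `value="value++"` behaviour described above, combined with the proposed "fail on -INFINITY" default. It assumes Pacemaker's documented score rules (INFINITY is 1,000,000, and -INFINITY dominates any sum, so -INFINITY + 1 is still -INFINITY) and treats a missing value as 0; the function names are hypothetical and the actual C implementation may differ in detail:

```python
from typing import Optional

INFINITY = 1_000_000  # Pacemaker's score INFINITY


def parse_score(text: str) -> int:
    """Parse an integer or (+/-)INFINITY, clamping to the score range."""
    t = text.strip()
    if t in ("INFINITY", "+INFINITY"):
        return INFINITY
    if t == "-INFINITY":
        return -INFINITY
    return max(-INFINITY, min(INFINITY, int(t)))


def add_scores(a: int, b: int) -> int:
    """Score addition: -INFINITY dominates, then +INFINITY, else clamp."""
    if a <= -INFINITY or b <= -INFINITY:
        return -INFINITY
    if a >= INFINITY or b >= INFINITY:
        return INFINITY
    return max(-INFINITY, min(INFINITY, a + b))


def apply_update(expr: str, current: Optional[str],
                 refuse_minus_infinity: bool = True) -> int:
    """Hypothetical model of value="value++" / value="value+=N" handling."""
    expr = expr.strip()
    if expr == "value++":
        delta = 1
    elif expr.startswith("value+="):
        delta = parse_score(expr[len("value+="):])
    else:
        return parse_score(expr)  # plain assignment, no arithmetic
    old = 0 if current is None else parse_score(current)  # missing -> 0
    if refuse_minus_infinity and old <= -INFINITY:
        # Proposed default: make the no-op increment an explicit failure.
        raise ValueError("refusing to increment a -INFINITY value")
    return add_scores(old, delta)
```

Under these rules an increment of -INFINITY is a silent no-op anyway, which is why refusing it outright is consistent: it just makes the no-op visible to the caller. The stricter variant from the original request (fail on any value < 0) would change the check to `old < 0`.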
Lars Ellenberg
2014-09-30 13:31:11 UTC
Post by Andrew Beekhof
The other thing to mention is that currently the only part that wont work is "if the current value is < 0, fail".
Setting value="value++" will do the rest.
Nice.
Post by Andrew Beekhof
So my question would be... how important is the 'lt 0' case?
Actually, come to think of it, it's not a bad default behaviour.
Certainly failing value++ if value=-INFINITY would be logically consistent with the existing code.
Would that be sufficient?
I need to think about that some more.
I may need to actually try this out and try to implement my scenario.
Riccardo Bicelli
2014-09-30 13:36:37 UTC
Hello,
I've just updated my cluster nodes and now I see a lot of these errors in
syslog:

Sep 30 15:32:43 localhost cib: [2870]: ERROR: crm_abort: crm_glib_handler: Forked child 28573 to record non-fatal assert at utils.c:449 : Source ID 128394 was not found when attempting to remove it
Sep 30 15:32:55 localhost cib: [2870]: ERROR: crm_abort: crm_glib_handler: Forked child 28753 to record non-fatal assert at utils.c:449 : Source ID 128395 was not found when attempting to remove it
Sep 30 15:32:55 localhost attrd: [2872]: ERROR: crm_abort: crm_glib_handler: Forked child 28756 to record non-fatal assert at utils.c:449 : Source ID 58434 was not found when attempting to remove it
Sep 30 15:32:55 localhost cib: [2870]: ERROR: crm_abort: crm_glib_handler: Forked child 28757 to record non-fatal assert at utils.c:449 : Source ID 128396 was not found when attempting to remove it
Sep 30 15:33:04 localhost cib: [2870]: ERROR: crm_abort: crm_glib_handler: Forked child 28876 to record non-fatal assert at utils.c:449 : Source ID 128397 was not found when attempting to remove it
Sep 30 15:33:04 localhost attrd: [2872]: ERROR: crm_abort: crm_glib_handler: Forked child 28877 to record non-fatal assert at utils.c:449 : Source ID 58435 was not found when attempting to remove it
Sep 30 15:33:04 localhost cib: [2870]: ERROR: crm_abort: crm_glib_handler: Forked child 28878 to record non-fatal assert at utils.c:449 : Source ID 128398 was not found when attempting to remove it
Sep 30 15:33:11 localhost cib: [2870]: 29010 to record non-fatal assert at utils.c:449 : Source ID 128399 was not found when attempting to remove it
Sep 30 15:33:11 localhost attrd: [2872]: ERROR: crm_abort: crm_glib_handler: Forked child 29011 to record non-fatal assert at utils.c:449 : Source ID 58436 was not found when attempting to remove it
Sep 30 15:33:11 localhost cib: [2870]: ERROR: crm_abort: crm_glib_handler: Forked child 29012 to record non-fatal assert at utils.c:449 : Source ID 128400 was not found when attempting to remove it
Sep 30 15:33:14 localhost cib: [2870]: ERROR: crm_abort: crm_glib_handler: Forked child 29060 to record non-fatal assert at utils.c:449 : Source ID 128401 was not found when attempting to remove it
Sep 30 15:33:14 localhost attrd: [2872]: ERROR: crm_abort: crm_glib_handler: Forked child 29061 to record non-fatal assert at utils.c:449 : Source ID 58437 was not found when attempting to remove it

I don't understand what it means.
Andrew Beekhof
2014-09-30 21:23:13 UTC
Post by Riccardo Bicelli
I don't understand what does it mean.
It means glib is bitching about something it didn't use to.

What version of pacemaker did you update to? I'm reasonably confident they're fixed in 1.1.12.
Riccardo Bicelli
2014-10-02 14:10:39 UTC
I'm running pacemaker-1.0.10 and glib-2.40.0-r1:2 on gentoo
Andrew Beekhof
2014-10-02 23:06:50 UTC
Post by Riccardo Bicelli
I'm running pacemaker-1.0.10
well and truly time to get off the 1.0.x series
Post by Riccardo Bicelli
and glib-2.40.0-r1:2 on gentoo
renayama19661014
2014-10-03 01:18:43 UTC
Hi Andrew,

We have confirmed a similar problem with Pacemaker 1.1.12.
It occurs with glib 2.40.0 on Ubuntu 14.04.

lrmd[1632]:    error: crm_abort: crm_glib_handler: Forked child 1840 to record non-fatal assert at logging.c:73 : Source ID 51 was not found when attempting to remove it
lrmd[1632]:     crit: crm_glib_handler: GLib: Source ID 51 was not found when attempting to remove it

The problem does not happen on RHEL6.

The difference seems to come from the glib version.

In glib 2.40.0 (and probably in later versions too), calling g_source_remove() on a timer source whose callback has already returned FALSE is reported as an error.

It seems Pacemaker needs to be changed to fix this.
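The failure mode can be illustrated with a toy model. This is not PyGObject or Pacemaker code: `ToyMainLoop` and `Timer` are hypothetical classes that mimic two glib behaviours, namely that a timeout callback returning FALSE destroys its own source, and that newer glib reports removal of an unknown source ID as a critical. The fix shown is the common C idiom: clear the stored source ID inside the callback before returning FALSE, and only call g_source_remove() on a non-zero ID.

```python
from typing import Callable, Dict, List


class ToyMainLoop:
    """Hypothetical stand-in for a GLib main context's source table."""

    def __init__(self) -> None:
        self._next_id = 1
        self.sources: Dict[int, Callable[[], bool]] = {}
        self.criticals: List[str] = []

    def timeout_add(self, callback: Callable[[], bool]) -> int:
        source_id = self._next_id
        self._next_id += 1
        self.sources[source_id] = callback
        return source_id

    def fire(self, source_id: int) -> None:
        # Dispatch once; a False return destroys the source (G_SOURCE_REMOVE).
        if not self.sources[source_id]():
            del self.sources[source_id]

    def source_remove(self, source_id: int) -> bool:
        if source_id not in self.sources:
            # Newer glib emits a critical here instead of staying silent.
            self.criticals.append(
                f"Source ID {source_id} was not found when attempting to remove it")
            return False
        del self.sources[source_id]
        return True


class Timer:
    """Owner object that keeps its timer ID, as C code keeps a guint."""

    def __init__(self, loop: ToyMainLoop, clear_id_on_expiry: bool) -> None:
        self.loop = loop
        self.clear_id_on_expiry = clear_id_on_expiry
        self.source_id = loop.timeout_add(self._on_timeout)

    def _on_timeout(self) -> bool:
        if self.clear_id_on_expiry:
            self.source_id = 0  # the fix: the loop is about to drop this ID
        return False            # one-shot timer

    def cancel(self) -> None:
        if self.source_id:      # only remove sources we still own
            self.loop.source_remove(self.source_id)
            self.source_id = 0
```

With `clear_id_on_expiry=False`, firing the timer and then cancelling it records exactly the "Source ID ... was not found" message seen in the logs; with `True`, the later cancel is a no-op.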

Best Regards,
Hideo Yamauchi.


----- Original Message -----
From: Andrew Beekhof <andrew at beekhof.net>
To: The Pacemaker cluster resource manager <pacemaker at oss.clusterlabs.org>
Date: 2014/10/3, Fri 08:06
Subject: Re: [Pacemaker] Lot of errors after update
I'm running pacemaker-1.0.10
well and truly time to get off the 1.0.x series
and glib-2.40.0-r1:2 on gentoo
Andrew Beekhof
2014-10-06 01:40:03 UTC
Post by renayama19661014
Hi Andrew,
About a similar problem, we confirmed it in Pacemaker1.1.12.
The problem occurs in (glib2.40.0) in Ubuntu14.04.
lrmd[1632]: error: crm_abort: crm_glib_handler: Forked child 1840 to record non-fatal assert at logging.c:73 : Source ID 51 was not found when attempting to remove it
lrmd[1632]: crit: crm_glib_handler: GLib: Source ID 51 was not found when attempting to remove it
stack trace of child 1840?
Post by renayama19661014
This problem does not happen in RHEL6.
The cause of the version of glib seem to be different.
When g_source_remove does timer processing to return FALSE, it becomes the error in glib2.40.0.(Probably as for the subsequent version too)
It seems to be necessary to revise Pacemaker to solve a problem.
Best Regards,
Hideo Yamauchi.
----- Original Message -----
From: Andrew Beekhof <andrew at beekhof.net>
To: The Pacemaker cluster resource manager <pacemaker at oss.clusterlabs.org>
Date: 2014/10/3, Fri 08:06
Subject: Re: [Pacemaker] Lot of errors after update
Post by Riccardo Bicelli
I'm running pacemaker-1.0.10
well and truly time to get off the 1.0.x series
Post by Riccardo Bicelli
and glib-2.40.0-r1:2 on gentoo
On 30 Sep 2014, at 11:36 pm, Riccardo Bicelli
<r.bicelli at gmail.com>
Post by Riccardo Bicelli
Post by Riccardo Bicelli
Hello,
I've just updated my cluster nodes and now I see lot of these
Source ID 128394 was not found when attempting to remove it
Source ID 128395 was not found when attempting to remove it
Source ID 58434 was not found when attempting to remove it
Source ID 128396 was not found when attempting to remove it
Source ID 128397 was not found when attempting to remove it
Source ID 58435 was not found when attempting to remove it
Source ID 128398 was not found when attempting to remove it
Post by Riccardo Bicelli
Post by Riccardo Bicelli
Sep 30 15:33:11 localhost cib: [2870]: 29010 to record non-fatal
assert at utils.c:449 : Source ID 128399 was not found when attempting to remove
it
Source ID 58436 was not found when attempting to remove it
Source ID 128400 was not found when attempting to remove it
Source ID 128401 was not found when attempting to remove it
Source ID 58437 was not found when attempting to remove it
Post by Riccardo Bicelli
Post by Riccardo Bicelli
I don't understand what does it mean.
It means glib is bitching about something it didn't used to.
What version of pacemaker did you update to? I'm reasonably
confident they're fixed in 1.1.12
Post by Riccardo Bicelli
_______________________________________________
Pacemaker at oss.clusterlabs.org
http://oss.clusterlabs.org/mailman/listinfo/pacemaker
http://www.clusterlabs.org
http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
http://bugs.clusterlabs.org
_______________________________________________
Pacemaker mailing list: Pacemaker at oss.clusterlabs.org
http://oss.clusterlabs.org/mailman/listinfo/pacemaker
Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://bugs.clusterlabs.org
_______________________________________________
Pacemaker mailing list: Pacemaker at oss.clusterlabs.org
http://oss.clusterlabs.org/mailman/listinfo/pacemaker
Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://bugs.clusterlabs.org
_______________________________________________
Pacemaker mailing list: Pacemaker at oss.clusterlabs.org
http://oss.clusterlabs.org/mailman/listinfo/pacemaker
Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://bugs.clusterlabs.org
renayama19661014
2014-10-06 02:23:42 UTC
Hi Andrew,
lrmd[1632]: error: crm_abort: crm_glib_handler: Forked child 1840 to record non-fatal assert at logging.c:73 : Source ID 51 was not found when attempting to remove it
lrmd[1632]: crit: crm_glib_handler: GLib: Source ID 51 was not found when attempting to remove it
stack trace of child 1840?
No, I didn't get one.

However, I have a simple way to reproduce the glib problem.
I will file it in Bugzilla by the end of today and let you know.


Best Regards,
Hideo Yamauchi.



----- Original Message -----
From: Andrew Beekhof <andrew at beekhof.net>
To: renayama19661014 at ybb.ne.jp; The Pacemaker cluster resource manager <pacemaker at oss.clusterlabs.org>
Date: 2014/10/6, Mon 10:40
Subject: Re: [Pacemaker] Lot of errors after update
Hi Andrew,
We have confirmed a similar problem with Pacemaker 1.1.12.
It occurs with glib 2.40.0 on Ubuntu 14.04.
lrmd[1632]: error: crm_abort: crm_glib_handler: Forked child 1840 to record non-fatal assert at logging.c:73 : Source ID 51 was not found when attempting to remove it
lrmd[1632]: crit: crm_glib_handler: GLib: Source ID 51 was not found when attempting to remove it
stack trace of child 1840?
This problem does not occur on RHEL6.
The difference seems to be caused by the glib version.
In glib 2.40.0, calling g_source_remove() on a timer source whose callback has already returned FALSE produces this error (probably in later versions too).
It seems Pacemaker needs to be revised to solve the problem.
Best Regards,
Hideo Yamauchi.
----- Original Message -----
From: Andrew Beekhof <andrew at beekhof.net>
To: The Pacemaker cluster resource manager <pacemaker at oss.clusterlabs.org>
Date: 2014/10/3, Fri 08:06
Subject: Re: [Pacemaker] Lot of errors after update
On 3 Oct 2014, at 12:10 am, Riccardo Bicelli
I'm running pacemaker-1.0.10
well and truly time to get off the 1.0.x series
and glib-2.40.0-r1:2 on gentoo
On 30 Sep 2014, at 11:36 pm, Riccardo Bicelli <r.bicelli at gmail.com>
Post by Riccardo Bicelli
Hello,
I've just updated my cluster nodes and now I see a lot of these:
crm_glib_handler: Forked child 28573 to record non-fatal assert at
Source ID 128394 was not found when attempting to remove it
crm_glib_handler: Forked child 28753 to record non-fatal assert at
Source ID 128395 was not found when attempting to remove it
crm_glib_handler: Forked child 28756 to record non-fatal assert at
Source ID 58434 was not found when attempting to remove it
crm_glib_handler: Forked child 28757 to record non-fatal assert at
Source ID 128396 was not found when attempting to remove it
crm_glib_handler: Forked child 28876 to record non-fatal assert at
Source ID 128397 was not found when attempting to remove it
crm_glib_handler: Forked child 28877 to record non-fatal assert at
Source ID 58435 was not found when attempting to remove it
crm_glib_handler: Forked child 28878 to record non-fatal assert at
Source ID 128398 was not found when attempting to remove it
Post by Riccardo Bicelli
Sep 30 15:33:11 localhost cib: [2870]: 29010 to record non-fatal assert at utils.c:449 : Source ID 128399 was not found when attempting to remove it
crm_glib_handler: Forked child 29011 to record non-fatal assert at
Source ID 58436 was not found when attempting to remove it
crm_glib_handler: Forked child 29012 to record non-fatal assert at
Source ID 128400 was not found when attempting to remove it
crm_glib_handler: Forked child 29060 to record non-fatal assert at
Source ID 128401 was not found when attempting to remove it
crm_glib_handler: Forked child 29061 to record non-fatal assert at
Source ID 58437 was not found when attempting to remove it
Post by Riccardo Bicelli
I don't understand what it means.
It means glib is bitching about something it didn't use to.
What version of pacemaker did you update to? I'm reasonably confident they're fixed in 1.1.12
Vladimir
2015-02-19 10:13:24 UTC
Hello,

Can somebody tell me the status of this issue?

Regards,
Vlad.
Post by renayama19661014
Hi Andrew,
Post by renayama19661014
Post by renayama19661014
lrmd[1632]: error: crm_abort: crm_glib_handler: Forked child 1840 to record non-fatal assert at logging.c:73 : Source ID 51 was not found when attempting to remove it
Post by renayama19661014
lrmd[1632]: crit: crm_glib_handler: GLib: Source ID 51 was not found when attempting to remove it
stack trace of child 1840?
No, I didn't get one.
However, I have a simple way to reproduce the glib problem.
I will file it in Bugzilla by the end of today and let you know.
Best Regards,
Hideo Yamauchi.
----- Original Message -----
Post by renayama19661014
Date: 2014/10/6, Mon 10:40
Subject: Re: [Pacemaker] Lot of errors after update
Post by renayama19661014
Hi Andrew,
We have confirmed a similar problem with Pacemaker 1.1.12.
It occurs with glib 2.40.0 on Ubuntu 14.04.
lrmd[1632]: error: crm_abort: crm_glib_handler: Forked child 1840 to record non-fatal assert at logging.c:73 : Source ID 51 was not found when attempting to remove it
Post by renayama19661014
lrmd[1632]: crit: crm_glib_handler: GLib: Source ID 51 was not found when attempting to remove it
stack trace of child 1840?
Post by renayama19661014
This problem does not occur on RHEL6.
The difference seems to be caused by the glib version.
In glib 2.40.0, calling g_source_remove() on a timer source whose callback has already returned FALSE produces this error (probably in later versions too).
It seems Pacemaker needs to be revised to solve the problem.
Best Regards,
Hideo Yamauchi.
----- Original Message -----
To: The Pacemaker cluster resource manager
Date: 2014/10/3, Fri 08:06
Subject: Re: [Pacemaker] Lot of errors after update
On 3 Oct 2014, at 12:10 am, Riccardo Bicelli
Post by Riccardo Bicelli
I'm running pacemaker-1.0.10
well and truly time to get off the 1.0.x series
Post by Riccardo Bicelli
and glib-2.40.0-r1:2 on gentoo
On 30 Sep 2014, at 11:36 pm, Riccardo Bicelli
Post by Riccardo Bicelli
Hello,
I've just updated my cluster nodes and now I see a lot of these:
crm_glib_handler: Forked child 28573 to record non-fatal assert at
Source ID 128394 was not found when attempting to remove it
crm_glib_handler: Forked child 28753 to record non-fatal assert at
Source ID 128395 was not found when attempting to remove it
crm_glib_handler: Forked child 28756 to record non-fatal assert at
Source ID 58434 was not found when attempting to remove it
crm_glib_handler: Forked child 28757 to record non-fatal assert at
Source ID 128396 was not found when attempting to remove it
crm_glib_handler: Forked child 28876 to record non-fatal assert at
Source ID 128397 was not found when attempting to remove it
crm_glib_handler: Forked child 28877 to record non-fatal assert at
Source ID 58435 was not found when attempting to remove it
crm_glib_handler: Forked child 28878 to record non-fatal assert at
Source ID 128398 was not found when attempting to remove it
Post by Riccardo Bicelli
Post by Riccardo Bicelli
Sep 30 15:33:11 localhost cib: [2870]: 29010 to record non-fatal assert at utils.c:449 : Source ID 128399 was not found when attempting to remove it
crm_glib_handler: Forked child 29011 to record non-fatal assert at
Source ID 58436 was not found when attempting to remove it
crm_glib_handler: Forked child 29012 to record non-fatal assert at
Source ID 128400 was not found when attempting to remove it
crm_glib_handler: Forked child 29060 to record non-fatal assert at
Source ID 128401 was not found when attempting to remove it
crm_glib_handler: Forked child 29061 to record non-fatal assert at
Source ID 58437 was not found when attempting to remove it
Post by Riccardo Bicelli
Post by Riccardo Bicelli
I don't understand what it means.
It means glib is bitching about something it didn't use to.
What version of pacemaker did you update to? I'm reasonably confident they're fixed in 1.1.12