Discussion:
[Pacemaker] concurrent uses of cibadmin: Signon to CIB failed: connection failed
Brian J. Murrell
2011-09-29 19:45:32 UTC
So, in another thread there was a discussion of using cibadmin to
mitigate possible concurrency issues with the crm shell. I have written a
test program to test that theory, and unfortunately cibadmin also falls
down in the face of heavy concurrency, with errors such as:

Signon to CIB failed: connection failed
Init failed, could not perform requested operations
Signon to CIB failed: connection failed
Init failed, could not perform requested operations
Signon to CIB failed: connection failed
Init failed, could not perform requested operations

Effectively my test runs:

for x in $(seq 1 50); do
    cibadmin -o resources -C -x resource-$x.xml &
done

My complete test program is attached for review/experimentation if you wish.
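
To see which of the concurrent invocations fail, and with what exit code, the loop above could also record each background job's status. The following is only a rough sketch: run_concurrently and create_resource are hypothetical helper names made up for illustration, and this is not the attached test program.

```shell
#!/bin/sh
# run_concurrently is a hypothetical helper, not part of Pacemaker.
# Usage: run_concurrently N command [args...]
# Starts N background copies of the command, passing the job index as the
# last argument, then prints one "index exit_code" line per job.
run_concurrently() {
    count=$1
    shift
    pids=""
    i=1
    while [ "$i" -le "$count" ]; do
        "$@" "$i" &
        pids="$pids $!"
        i=$((i + 1))
    done
    # Collect the exit status of every job, in start order.
    i=1
    for pid in $pids; do
        rc=0
        wait "$pid" || rc=$?
        echo "$i $rc"
        i=$((i + 1))
    done
}
```

For the test above, one might wrap the cibadmin call in a small function such as create_resource() { cibadmin -o resources -C -x "resource-$1.xml"; } and then run "run_concurrently 50 create_resource" to get one status line per resource.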

Am I doing something wrong or is this a bug? I'm using pacemaker
1.0.10-1.4.el5 for what it's worth.

Cheers,
b.
-------------- next part --------------
An embedded and charset-unspecified text was scrubbed...
Name: cibadmin-test
URL: <http://oss.clusterlabs.org/pipermail/pacemaker/attachments/20110929/a92a56a9/attachment-0001.ksh>
Lars Ellenberg
2011-10-03 19:29:46 UTC
Post by Brian J. Murrell
So, in another thread there was a discussion of using cibadmin to
mitigate possible concurrency issue of crm shell. I have written a test
program to test that theory and unfortunately cibadmin falls down in the
Signon to CIB failed: connection failed
Init failed, could not perform requested operations
Signon to CIB failed: connection failed
Init failed, could not perform requested operations
Signon to CIB failed: connection failed
Init failed, could not perform requested operations
The cib does a "listen(sock_fd, 10)",
implicitly, via glue (clplumbing ipcsocket.c, socket_wait_conn_new()).

You get a connection request backlog of 10. Usually that is enough to
give a server enough time to accept them "in time".
If you concurrently create many new client sessions,
some client connect() calls may fail.

Those would then need to be retried.

My feeling is, any retry logic for concurrency issues should go in some
shell wrapper, though. If you really expect to run into too many
connect attempts to cib at the same time regularly,
"You are doing it wrong" ;-)

cibadmin seems to have consistent error codes;
this particular problem should fall into exit code 10.
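
A shell wrapper along those lines might look like the sketch below. It retries a command only while it exits with one specific code and passes any other result straight through. retry_on_code is a hypothetical helper name, the attempt count and delay are arbitrary choices, and exit code 10 for the failed signon is taken from the observation above rather than verified against every cibadmin version.

```shell
#!/bin/sh
# retry_on_code is a hypothetical helper, not part of Pacemaker.
# Usage: retry_on_code CODE MAX_TRIES DELAY_SECONDS command [args...]
# Re-runs the command while it exits with CODE, up to MAX_TRIES attempts,
# and returns the exit code of the last attempt.
retry_on_code() {
    retry_code=$1
    max_tries=$2
    delay=$3
    shift 3

    attempt=1
    while :; do
        rc=0
        "$@" || rc=$?
        # Stop on success, on any other error, or when out of attempts.
        if [ "$rc" -ne "$retry_code" ] || [ "$attempt" -ge "$max_tries" ]; then
            return "$rc"
        fi
        attempt=$((attempt + 1))
        sleep "$delay"
    done
}
```

Assuming exit code 10 really is the connection failure, a call could then look like: retry_on_code 10 5 1 cibadmin -o resources -C -x resource-1.xml. Retrying only on that one code keeps genuine errors, such as a malformed XML file, from being repeated pointlessly.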
Post by Brian J. Murrell
for x in $(seq 1 50); do
cibadmin -o resources -C -x resource-$x.xml &
done
My complete test program is attached for review/experimentation if you wish.
Am I doing something wrong or is this a bug? I'm using pacemaker
1.0.10-1.4.el5 for what it's worth.
Cheers,
b.
--
: Lars Ellenberg
: LINBIT | Your Way to High Availability
: DRBD/HA support and consulting http://www.linbit.com