Discussion:
[Pacemaker] Booth ticket renewal timeout
Jorge Lopes
2015-02-08 19:06:13 UTC
Hi all,

I'm performing a lab test where I have a geo cluster and an arbitrator,
in a configuration for disaster recovery with failover. There are two
main sites (primary and disaster recovery) and a third site for the
arbitrator.

I have defined a ticket named "primary", which determines which is the
primary site and which is the recovery site.
In my first configuration I had a value of 60 in booth.conf for the
ticket renewal. After I assigned the ticket to the primary site, when
the renewal time was reached, the ticket was not renewed and it ended
up not assigned to any of the sites.
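For reference, I assigned the ticket with a command along these lines
(from memory, so the exact client syntax may differ from what the
booth help prints):

booth client grant -t primary -s 192.168.180.211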

So, I increased the value to 120 and now the ticket gets correctly renewed.

I am interested to know if there are any constraints on the minimum
value for the ticket renewal. Is there any design aspect that would
recommend higher values? And in a production environment, where time
lags might be larger, could such a situation occur? What would be a
typical set of timeout values (please note the CIB timeout values as
well)?

My configurations are as follows.

Thanks in advance,
Jorge


/etc/booth/booth.conf:

transport="UDP"
port="6666"
site="192.168.180.211"
site="192.168.190.211"
arbitrator="192.168.200.211"
ticket="primary;120"


crm configure show:
node $id="1084798152" cluster1-node1
primitive booth ocf:pacemaker:booth-site \
meta resource-stickiness="INFINITY" \
op monitor interval="10s" timeout="20s"
primitive booth-ip ocf:heartbeat:IPaddr2 \
params ip="192.168.180.211"
primitive dummy-pgsql ocf:pacemaker:Stateful \
op monitor interval="15" role="Slave" timeout="60s" \
op monitor interval="30" role="Master" timeout="60s"
primitive oversee-ip ocf:heartbeat:IPaddr2 \
params ip="192.168.180.210"
group g-booth booth-ip booth
ms ms_dummy_pgsql dummy-pgsql \
meta target-role="Master" clone-max="1"
order order-booth-oversee-ip inf: g-booth oversee-ip
rsc_ticket ms_dummy_pgsql_primary primary: ms_dummy_pgsql:Master loss-policy=demote
rsc_ticket oversee-ip-req-primary primary: oversee-ip loss-policy=stop
property $id="cib-bootstrap-options" \
dc-version="1.1.10-42f2063" \
cluster-infrastructure="corosync" \
stonith-enabled="false"
Dejan Muhamedagic
2015-02-09 09:56:22 UTC
Hi,
Post by Jorge Lopes
[...]
My configurations are as follows.
It seems like you're running the older version of booth, which
has been deprecated and is effectively unmaintained. The newer
version is available at
https://github.com/ClusterLabs/booth/releases/tag/v0.2.0

Thanks,

Dejan
Jorge Lopes
2015-02-09 10:53:00 UTC
Hi Dejan,
Thanks for the tip.

Concerning the timeout values, what would be typical ticket renewal
values for a production environment?

Thanks,
Jorge
Dejan Muhamedagic
2015-02-09 11:52:30 UTC
Post by Jorge Lopes
Hi Dejan,
Thanks for the tip.
Concerning the timeout values, what would be typical ticket renewal
values for a production environment?
We have two parameters: expire and renewal. The latter used to be
set to half of the former and, following a user request, it is now
configurable. If not configured, the expire time defaults to 10
minutes, which yields a renewal time of 5 minutes. Those are the
defaults; ultimately the values depend on your business needs and
site failover/disaster recovery procedures, as well as on
connection stability and packet loss rates. I doubt that an expiry
time of less than 1 minute is practical, though testing could be
done with times of less than 10 seconds. The README contains a
description of booth operation which you may find useful:

https://github.com/ClusterLabs/booth/blob/master/README
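
For instance, a ticket section in the newer booth.conf could look
like this (just a sketch; renewal-freq is the now-configurable
renewal interval, and the option names should be double-checked
against the README for your version):

ticket = "primary"
    expire = 600          # ticket expiry in seconds (the 10-minute default)
    renewal-freq = 300    # renewal interval; defaults to half of expire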

Thanks,

Dejan