Carlos Xavier
2015-05-14 17:55:42 UTC
Hi.
I am doing some tests with OCFS2 running on an AoE shared disk.
The tests are running on openSUSE 12.3 with the following packages:
ocfs2-tools-o2cb-1.8.2-4.8.1.x86_64
ocfs2-tools-1.8.2-4.8.1.x86_64
corosync-1.4.3-4.1.1.x86_64
libcorosync4-1.4.3-4.1.1.x86_64
libopenais3-1.1.4-15.1.1.x86_64
openais-1.1.4-15.1.1.x86_64
aoetools-35-3.1.x86_64
To account for the availability of the device, I created a ping resource so that the filesystem is mounted only after the disk provider is reachable.
This is my test configuration. It works fine if I start openais by hand after the system is up with all modules loaded.
node cluster-1
node cluster-2
primitive p_ping ocf:pacemaker:ping \
        params name="p_ping" host_list="172.31.0.199" multiplier="1000" debug="true" \
        op start interval="0" timeout="60" \
        op monitor interval="10s" timeout="60"
primitive resDLM ocf:pacemaker:controld \
        op monitor interval="120s"
primitive resFS_BACKUP ocf:heartbeat:Filesystem \
        params device="/dev/etherd/e4.1p1" directory="/backup" fstype="ocfs2" options="rw,noatime" \
        op monitor interval="120s"
primitive resO2CB ocf:ocfs2:o2cb \
        op monitor interval="120s"
clone cl_ping p_ping \
        meta target-role="Started"
clone cloneDLM resDLM \
        meta globally-unique="false" interleave="true" target-role="Started"
clone cloneFS_BACKUP resFS_BACKUP \
        meta interleave="true" ordered="true" target-role="Started"
clone cloneO2CB resO2CB \
        meta globally-unique="false" interleave="true" target-role="Started"
colocation colFS_BACKUP-PING inf: cloneFS_BACKUP cl_ping
colocation colO2CBDLM inf: cloneO2CB cloneDLM
colocation colPING-O2CB inf: cl_ping cloneO2CB
order ordDLMO2CB 0: cloneDLM cloneO2CB
order ordO2CB-PING 0: cloneO2CB cl_ping
order ordPING-FS_BACKUP 0: cl_ping cloneFS_BACKUP
property $id="cib-bootstrap-options" \
        dc-version="1.1.7-61a079313275f3e9d0e85671f62c721d32ce3563" \
        cluster-infrastructure="openais" \
        expected-quorum-votes="2" \
        stonith-enabled="false" \
        no-quorum-policy="ignore" \
        last-lrm-refresh="1431606872"
rsc_defaults $id="rsc-options" \
        resource-stickiness="100"
#vim:set syntax=pcmk
However, if I let the machine reboot and openais starts during the boot process, it won't work: the AoE disk shows up on the system well after the Pacemaker cluster has started, and the ping resource does not hold back the start sequence:
2015-05-14T06:52:31.704941-03:00 cluster-2 lrmd: [1272]: info: rsc:resDLM:1 probe[2] (pid 1363)
2015-05-14T06:52:31.706465-03:00 cluster-2 lrmd: [1272]: info: rsc:resO2CB:1 probe[3] (pid 1364)
2015-05-14T06:52:31.708459-03:00 cluster-2 lrmd: [1272]: info: rsc:p_ping:1 probe[4] (pid 1365)
2015-05-14T06:52:31.710244-03:00 cluster-2 lrmd: [1272]: info: rsc:resFS_BACKUP:1 probe[5] (pid 1366)
2015-05-14T06:52:31.976245-03:00 cluster-2 lrmd: [1272]: info: operation monitor[4] on p_ping:1 for client 1275: pid 1365 exited with return code 7
2015-05-14T06:52:31.998091-03:00 cluster-2 lrmd: [1272]: info: operation monitor[2] on resDLM:1 for client 1275: pid 1363 exited with return code 7
2015-05-14T06:52:31.998122-03:00 cluster-2 crmd: [1275]: info: process_lrm_event: LRM operation p_ping:1_monitor_0 (call=4, rc=7, cib-update=8, confirmed=true) not running
2015-05-14T06:52:31.998129-03:00 cluster-2 lrmd: [1272]: info: RA output: (resDLM:1:monitor:stderr) dlm_controld.pcmk: no process found
2015-05-14T06:52:31.999380-03:00 cluster-2 o2cb(resO2CB:1)[1364]: [1400]: INFO: configfs not laoded
2015-05-14T06:52:32.000362-03:00 cluster-2 Filesystem(resFS_BACKUP:1)[1366]: [1401]: WARNING: Couldn't find device [/dev/etherd/e4.1p1]. Expected /dev/??? to exist
2015-05-14T06:52:32.020263-03:00 cluster-2 lrmd: [1272]: info: operation monitor[3] on resO2CB:1 for client 1275: pid 1364 exited with return code 7
2015-05-14T06:52:32.020297-03:00 cluster-2 crmd: [1275]: info: process_lrm_event: LRM operation resDLM:1_monitor_0 (call=2, rc=7, cib-update=9, confirmed=true) not running
2015-05-14T06:52:32.040191-03:00 cluster-2 lrmd: [1272]: info: operation monitor[5] on resFS_BACKUP:1 for client 1275: pid 1366 exited with return code 7
2015-05-14T06:52:32.041092-03:00 cluster-2 crmd: [1275]: info: process_lrm_event: LRM operation resO2CB:1_monitor_0 (call=3, rc=7, cib-update=10, confirmed=true) not running
2015-05-14T06:52:32.065341-03:00 cluster-2 crmd: [1275]: info: process_lrm_event: LRM operation resFS_BACKUP:1_monitor_0 (call=5, rc=7, cib-update=11, confirmed=true) not running
2015-05-14T06:52:32.069975-03:00 cluster-2 attrd: [1273]: notice: attrd_trigger_update: Sending flush op to all hosts for: probe_complete (true)
2015-05-14T06:52:32.071782-03:00 cluster-2 lrmd: [1272]: info: rsc:resDLM:1 start[6] (pid 1455)
2015-05-14T06:52:32.093820-03:00 cluster-2 lrmd: [1272]: info: RA output: (resDLM:1:start:stderr) dlm_controld.pcmk: no process found
2015-05-14T06:52:32.118738-03:00 cluster-2 systemd[1]: Mounting Configuration File System...
2015-05-14T06:52:32.124161-03:00 cluster-2 systemd[1]: Mounted Configuration File System.
2015-05-14T06:52:32.125353-03:00 cluster-2 mount[1468]: mount: configfs is already mounted or /sys/kernel/config busy
2015-05-14T06:52:32.126138-03:00 cluster-2 systemd[1]: sys-kernel-config.mount mount process exited, code=exited status=32
.
.
.
2015-05-14T08:54:26.319506-03:00 cluster-2 systemd-logind[509]: New session 1 of user root.
2015-05-14T08:54:28.494462-03:00 cluster-2 kernel: [ 65.504178] aoe: e4.1: setting 1024 byte data frames
2015-05-14T08:54:28.494489-03:00 cluster-2 kernel: [ 65.504359] aoe: 5cd998b17867 e4.1 v400f has 1953525168 sectors
2015-05-14T08:54:28.495400-03:00 cluster-2 kernel: [ 65.505138] etherd/e4.1: p1
So the question is: how can I make Pacemaker wait for a device to be ready before trying to use it?
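To illustrate what I mean by "wait", here is a minimal shell sketch of the kind of check I would like to happen before the Filesystem resource is started. This is not something I have in place and it is not a Pacemaker feature: the function name wait_for_device, the 60-second default timeout and the 1-second polling interval are all my own assumptions.

```shell
#!/bin/sh
# Hypothetical helper (name and defaults are assumptions, not part of
# Pacemaker or aoetools): poll until a path exists or a timeout expires.
# Usage: wait_for_device PATH [TIMEOUT_SECONDS]
wait_for_device() {
    dev=$1
    timeout=${2:-60}
    elapsed=0
    # Check for existence once per second until the timeout is reached.
    while [ ! -e "$dev" ]; do
        if [ "$elapsed" -ge "$timeout" ]; then
            echo "timed out waiting for $dev" >&2
            return 1
        fi
        sleep 1
        elapsed=$((elapsed + 1))
    done
    return 0
}
```

For example, `wait_for_device /dev/etherd/e4.1p1 120` would return 0 as soon as the partition node exists, or 1 after roughly 120 seconds. The sketch only tests that the path exists (`-e`); a stricter check for a block device would use `-b` instead.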
Regards,
Carlos.
_______________________________________________
Pacemaker mailing list: ***@oss.clusterlabs.org
http://oss.clusterlabs.org/mailman/listinfo/pacemaker
Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://bugs.clusterlabs.org