Discussion:
[Pacemaker] unmanaged resource - cluster influence - ocf:heartbeat:Filesystem
Bauer, Stefan (IZLBW Extern)
2014-06-18 06:11:22 UTC
Hello,

I'm using ocf:heartbeat:Filesystem to mount a cifs share. Additionally, I enabled OCF_CHECK_LEVEL 20 so that the monitor operation also reads from and writes to the cifs share:
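A minimal sketch of the relevant configuration (crm shell syntax; device and directory as in the logs below, the timeout values are illustrative):

        primitive p_cifs_pictures ocf:heartbeat:Filesystem \
                params device="//cifs/share/pictures" directory="/srv/cifs/pictures" \
                        fstype="cifs" \
                op monitor interval="20s" timeout="40s" OCF_CHECK_LEVEL="20" \
                op start interval="0" timeout="60s" \
                op stop interval="0" timeout="20s"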

If I block the connection to the cifs server with iptables, the monitor operation times out. After several retries, a restart of the resource is initiated. The resource then also fails to stop (another timeout), so its failcount is set to INFINITY and the resource ends up unmanaged.
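The rule I use to simulate the outage is along these lines (192.0.2.10 stands in for the cifs server; port 445 assumes SMB over TCP):

        # drop all outgoing traffic to the CIFS server
        iptables -A OUTPUT -d 192.0.2.10 -p tcp --dport 445 -j DROP

The resulting log sequence: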

Jun 17 13:49:21 node1 lrmd: [15029]: WARN: p_cifs_pictures:monitor process (PID 18444) timed out (try 1). Killing with signal SIGTERM (15).
Jun 17 13:49:21 node1 lrmd: [15029]: WARN: operation monitor[43] on p_cifs_pictures for client 15032: pid 18444 timed out

Jun 17 13:49:21 node1 Filesystem2[18750]: INFO: Running stop for //cifs/share/pictures on /srv/cifs/pictures
Jun 17 13:49:21 node1 Filesystem2[18750]: INFO: Trying to unmount /srv/cifs/pictures
Jun 17 13:49:41 node1 lrmd: [15029]: WARN: p_cifs_pictures:stop process (PID 18750) timed out (try 1). Killing with signal SIGTERM (15).

Jun 17 13:49:41 node1 crmd: [15032]: WARN: status_from_rc: Action 5 (p_cifs_pictures_stop_0) on node1 failed (target: 0 vs. rc: -2): Error
Jun 17 13:49:41 node1 crmd: [15032]: WARN: update_failcount: Updating failcount for p_cifs_pictures on node1 after failed stop: rc=-2 (update=INFINITY, time=1403005781)
Jun 17 13:49:41 node1 pengine: [15031]: WARN: common_apply_stickiness: Forcing p_cifs_pictures away from node1 after 1000000 failures (max=1000000)

So far so bad. How can I avoid a timeout during the recovery? And what is the read/write check good for if it ends up leaving the resource unmanaged?

I fully understand that if the resource cannot be shut down safely and stonith is not active, it should be left unmanaged.

Thank you.

Stefan
Dejan Muhamedagic
2014-06-18 19:27:22 UTC
Hi,
Post by Bauer, Stefan (IZLBW Extern)
Hello,
Jun 17 13:49:21 node1 lrmd: [15029]: WARN: p_cifs_pictures:monitor process (PID 18444) timed out (try 1). Killing with signal SIGTERM (15).
Jun 17 13:49:21 node1 lrmd: [15029]: WARN: operation monitor[43] on p_cifs_pictures for client 15032: pid 18444 timed out
Jun 17 13:49:21 node1 Filesystem2[18750]: INFO: Running stop for //cifs/share/pictures on /srv/cifs/pictures
Jun 17 13:49:21 node1 Filesystem2[18750]: INFO: Trying to unmount /srv/cifs/pictures
Jun 17 13:49:41 node1 lrmd: [15029]: WARN: p_cifs_pictures:stop process (PID 18750) timed out (try 1). Killing with signal SIGTERM (15).
Jun 17 13:49:41 node1 crmd: [15032]: WARN: status_from_rc: Action 5 (p_cifs_pictures_stop_0) on node1 failed (target: 0 vs. rc: -2): Error
Jun 17 13:49:41 node1 crmd: [15032]: WARN: update_failcount: Updating failcount for p_cifs_pictures on node1 after failed stop: rc=-2 (update=INFINITY, time=1403005781)
Jun 17 13:49:41 node1 pengine: [15031]: WARN: common_apply_stickiness: Forcing p_cifs_pictures away from node1 after 1000000 failures (max=1000000)
So far so bad. How can I avoid a timeout during the recovery? And what is the read/write check good for if it ends up leaving the resource unmanaged?
Why do you think that the one affects the other? Or is it that,
when you turn the level-20 monitor off, the stop doesn't time
out? The two should be unrelated.
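Something like this should do for a quick test (crm shell, an untested sketch; the op values are guesses based on your logs):

        # edit the resource and drop the depth-20 check from the monitor op
        crm configure edit p_cifs_pictures
        # i.e. change
        #   op monitor interval="20s" timeout="40s" OCF_CHECK_LEVEL="20"
        # to
        #   op monitor interval="20s" timeout="40s"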
Post by Bauer, Stefan (IZLBW Extern)
I fully understand that if the resource cannot be shut down safely and stonith is not active, it should be left unmanaged.
Right. Stop timeouts should really be avoided whenever possible.
However, what is the meaning of the timeout in this case? Is
there a possibility of corruption if the filesystem cannot be
unmounted in a regular way? I'm not an expert on CIFS, that's why
I ask.
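If the stop fails because umount hangs on the dead share, the Filesystem agent's force_unmount parameter may be worth a try; as far as I remember it kills processes using the mountpoint before unmounting (check the metadata of your resource-agents version, and note it may not help if the umount syscall itself hangs on the unreachable server):

        params device="//cifs/share/pictures" directory="/srv/cifs/pictures" \
                fstype="cifs" force_unmount="true"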

Thanks,

Dejan
Post by Bauer, Stefan (IZLBW Extern)
Thank you.
Stefan