Discussion:
[Pacemaker] Filesystem resource killing innocent processes on stop
Nikola Ciprich
2015-05-18 10:20:38 UTC
Permalink
Hi,

I noticed a very annoying bug (or so I think): in resource-agents-3.9.5
on RHEL / CentOS 6, the Filesystem OCF resource seems to kill completely
unrelated processes on stop, although they're not using anything on the mounted filesystem...

unfortunately, one of the processes that very often gets killed is sshd :-(

here's an example from the log:

Filesystem(virt-fs)[4803]: 2015/05/17_21:59:48 INFO: sending signal TERM to: root 3606 1 0 Feb12 ? S<s 0:01 /sbin/udevd -d
Filesystem(virt-fs)[4803]: 2015/05/17_21:59:48 INFO: sending signal TERM to: root 4249 1 0 Feb12 ttyS2 Ss+ 0:00 agetty ttyS2 115200 vt100
Filesystem(virt-fs)[4803]: 2015/05/17_21:59:48 INFO: sending signal TERM to: root 4271 4395 0 21:58 ? Ss 0:00 sshd: ***@pts/12
Filesystem(virt-fs)[4803]: 2015/05/17_21:59:48 INFO: sending signal TERM to: root 4273 1 0 21:58 ? Rs 0:00 [bash]
Filesystem(virt-fs)[4803]: 2015/05/17_21:59:48 INFO: sending signal TERM to: root 4395 1 0 Feb24 ? Ss 0:03 /usr/sbin/sshd
Filesystem(virt-fs)[4803]: 2015/05/17_21:59:48 INFO: sending signal TERM to: root 4677 1 0 Feb12 ? Ss 0:00 /sbin/portreserve
Filesystem(virt-fs)[4803]: 2015/05/17_21:59:48 INFO: sending signal TERM to: root 4690 1 0 Feb12 ? S 0:00 supervising syslog-ng
Filesystem(virt-fs)[4803]: 2015/05/17_21:59:48 INFO: sending signal TERM to: root 4691 1 0 Feb12 ? Ss 0:46 syslog-ng -p /var/run/syslog-ng.pid
Filesystem(virt-fs)[4803]: 2015/05/17_21:59:48 INFO: sending signal TERM to: rpc 4746 1 0 Feb12 ? Ss 0:05 rpcbind
Filesystem(virt-fs)[4803]: 2015/05/17_21:59:48 INFO: sending signal TERM to: rpcuser 4764 1 0 Feb12 ? Ss 0:00 rpc.statd
Filesystem(virt-fs)[4803]: 2015/05/17_21:59:48 INFO: sending signal TERM to: root 4797 1 0 Feb12 ? Ss 0:00 rpc.idmapd
Filesystem(virt-fs)[4803]: 2015/05/17_21:59:48 INFO: sending signal TERM to: root 4803 12028 0 21:59 ? S 0:00 /bin/sh /usr/lib/ocf/resource.d/heartbeat/Filesystem stop

while unmounting the /home/cluster/virt directory.. what is quite curious is that the last process killed seems to be
the Filesystem resource itself..

before I dig deeper into this, has anyone else noticed this problem? Is this a known
(and possibly already fixed) issue?

thanks a lot in advance

nik
--
-------------------------------------
Ing. Nikola CIPRICH
LinuxBox.cz, s.r.o.
28.rijna 168, 709 00 Ostrava

tel.: +420 591 166 214
fax: +420 596 621 273
mobil: +420 777 093 799
www.linuxbox.cz

mobil servis: +420 737 238 656
email servis: ***@linuxbox.cz
-------------------------------------
emmanuel segura
2015-05-18 12:26:22 UTC
Permalink
are you sure your processes don't have /home/cluster/virt as their working directory?

I'm using SUSE 11 SP2 and I don't know if the agent is the same on
Red Hat 6, but I think so. Anyway, to unmount the fs the script uses
the following call chain: Filesystem_stop -> fs_stop -> signal_processes

In the fs_stop function, the agent tries to stop the processes that are
using the fs, first with the TERM signal and then with KILL:

fs_stop() {
    local SUB=$1 timeout=$2 sig cnt
    for sig in TERM KILL; do
        cnt=$((timeout/2)) # try half time with TERM
        while [ $cnt -gt 0 ]; do
            try_umount $SUB &&
                return $OCF_SUCCESS
            ocf_log err "Couldn't unmount $SUB; trying cleanup with $sig"
            signal_processes $SUB $sig
            cnt=$((cnt-1))
            sleep 1
        done
    done
    return $OCF_ERR_GENERIC
}
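
The escalation in fs_stop can be tried out safely with stubs. The following is only a sketch: demo_fs_stop, try_umount_stub, signal_stub and DEMO_SLEEP are hypothetical names that are not part of the resource agent, and the stub unmount always fails so that both signals are exercised.

```sh
#!/bin/sh
# Sketch only: mirrors the TERM-then-KILL loop quoted above, with stubs
# in place of the real try_umount and signal_processes.

DEMO_SLEEP=${DEMO_SLEEP:-0}   # the real agent sleeps 1 second per attempt

# Stub: pretend the filesystem is always busy.
try_umount_stub() { return 1; }

# Stub: report what would be signalled instead of calling fuser -k.
signal_stub() { echo "would send $2 to users of $1"; }

demo_fs_stop() {
    sub=$1
    timeout=$2
    for sig in TERM KILL; do
        cnt=$((timeout / 2))      # half of the timeout for each signal
        while [ "$cnt" -gt 0 ]; do
            try_umount_stub "$sub" && return 0
            signal_stub "$sub" "$sig"
            cnt=$((cnt - 1))
            sleep "$DEMO_SLEEP"
        done
    done
    return 1
}
```

Calling demo_fs_stop /home/cluster/virt 4 prints two TERM lines followed by two KILL lines and returns 1, which mirrors how a timeout of 4 gives each signal two attempts.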

In the signal_processes function, the agent uses fuser to signal the processes:

signal_processes() {
    local dir=$1
    local sig=$2
    # fuser returns a non-zero return code if none of the
    # specified files is accessed or in case of a fatal
    # error.
    if [ "X${HOSTOS}" = "XOpenBSD" ];then
        PIDS=`fstat | grep $dir | awk '{print $3}'`
        for PID in ${PIDS};do
            kill -s $sig ${PID}
            ocf_log info "Sent signal $sig to ${PID}"
        done
    else
        if $FUSER -$sig -m -k $dir ; then
            ocf_log info "Some processes on $dir were signalled"
        else
            ocf_log info "No processes on $dir were signalled"
        fi
    fi
}
--
.~.
/V\
// \\
/( )\
^`~'^

_______________________________________________
Pacemaker mailing list: ***@oss.clusterlabs.org
http://oss.clusterlabs.org/mailman/listinfo/pacemaker

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://bugs.clusterlabs.org
Dejan Muhamedagic
2015-05-18 14:34:24 UTC
Permalink
Hi,
The list of signalled processes seems too extensive. Which version of
resource-agents do you run?

$ grep 'Build version:' /usr/lib/ocf/lib/heartbeat/ocf-shellfuncs
Post by Nikola Ciprich
while unmounting /home/cluster/virt directory.. what is quite curious, is, that last killed process seems to be
Filesystem resource itself..
Hmm, that's quite strange. That implies that the RA script itself
had /home/cluster/virt as its WD.
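
Whether a given process really has its working directory on the mount can be checked directly. A minimal sketch, assuming Linux with /proc available; path_under and cwd_under are hypothetical helper names:

```sh
#!/bin/sh
# Sketch only: check whether a process has its working directory on or
# below a mount point. Assumes Linux with /proc mounted.

# path_under PATH DIR: succeed if PATH equals DIR or lies below it.
path_under() {
    case "$1/" in
        "${2%/}"/*) return 0 ;;
        *)          return 1 ;;
    esac
}

# cwd_under PID DIR: succeed if the working directory of PID is under DIR.
# Returns 2 if the working directory cannot be read.
cwd_under() {
    cwd=$(readlink "/proc/$1/cwd") || return 2
    path_under "$cwd" "$2"
}
```

For example, cwd_under 4803 /home/cluster/virt would have told whether the agent process from the log was sitting on the mount point at that moment.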
Post by Nikola Ciprich
before I dig deeper into this, did anyone else noticed this problem? Is this some known
(and possibly already issue)?
Never heard of this.

Thanks,

Dejan
Nikola Ciprich
2015-05-18 15:14:14 UTC
Permalink
Hi Dejan,
Post by Dejan Muhamedagic
The list below seems too extensive. Which version of
resource-agents do you run?
$ grep 'Build version:' /usr/lib/ocf/lib/heartbeat/ocf-shellfuncs
yes, it's definitely wrong..

here's the info you've requested:

# Build version: 5434e9646462d2c3c8f7aad2609d0ef1875839c7

rpm version: resource-agents-3.9.5-12.el6_6.5.x86_64

I can already see the problem: this version simply uses
fuser -m $MOUNTPOINT, which seems to return quite wrong results:

[***@denovav1b ~]# fuser -m /home/cluster/virt/
/home/cluster/virt/: 1m 3295m 3314m 4817m 4846m 4847m 4890m 4891m 4916m 4944m 4952m 4999m 5007m 5037m 5069m 5137m 5162m 5164m 5166m 5168m 5170m 5172m 5575m 8055m 9604m 9605m 10984m 11186m 11370m 11813m 11871m 11887m 11946m 12020m 12026m 12027m 12028m 12029m 12030m 12031m 14218m 15294m 15374m 15396m 15399m 17479m 17693m 17694m 20705m 20718m 20948m 20982m 23902m 24572m 24580m 26300m 29790m 29792m 30785m

(notice that even PID 1 is listed!), while lsof returns:

lsof | grep "cluster.*virt"
qemu-syst 8055 root 21r REG 0,0 232783872 1099511634304 /home/cluster/virt/images/debian-7.8.0-amd64-netinst.iso

which seems much saner to me..
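
To see exactly which PIDs fuser reports that lsof does not confirm, the two lists can be compared. This is only a sketch: fuser_pids and pids_not_in are hypothetical helper names, and the parsing assumes fuser-style output like the line pasted above (a name, a colon, then PIDs with optional access-type letters).

```sh
#!/bin/sh
# Sketch only: compare the PIDs reported by "fuser -m" with those
# reported by lsof. Function names are made up for this example.

# fuser_pids: read fuser-style output ("name: 1m 3295m ...") on stdin
# and print one bare PID per line, dropping the access-type letters.
# Plain PID lists without a name or letters pass through unchanged.
fuser_pids() {
    sed 's/^[^:]*://' |
        tr -s ' \t' '\n\n' |
        sed -n 's/^\([0-9][0-9]*\)[a-zA-Z]*$/\1/p'
}

# pids_not_in LIST_A LIST_B: print the PIDs from LIST_A (whitespace
# separated) that do not appear in LIST_B.
pids_not_in() {
    known=" $(echo $2) "
    for p in $1; do
        case "$known" in
            *" $p "*) ;;
            *) echo "$p" ;;
        esac
    done
}
```

A possible invocation is pids_not_in "$(fuser -m /home/cluster/virt 2>/dev/null | fuser_pids)" "$(lsof -t /home/cluster/virt)", which prints the PIDs that only fuser considers users of the mount.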

BR

nik
Dejan Muhamedagic
2015-05-19 08:44:44 UTC
Permalink
Post by Nikola Ciprich
which seems much saner to me..
Indeed. Is fuser broken or is there some kernel side confusion?
Did you also try:

lsof /home/cluster/virt/

Anyway, it would be good to bring this up with the centos people.

Thanks,

Dejan
Vladislav Bogdanov
2015-05-19 09:33:05 UTC
Permalink
Post by Dejan Muhamedagic
Indeed. Is fuser broken or is there some kernel side confusion?
As far as I was able to investigate, this comes from the fact that fuser
uses the "device" field, which is the same for the source and the bind mount (yes,
this is on CentOS 6).
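
The device numbers in question can be compared by hand. A minimal sketch, assuming GNU coreutils stat; same_device is a hypothetical helper name, and the sketch only illustrates the device comparison described above, not what fuser does internally:

```sh
#!/bin/sh
# Sketch only: report whether two paths live on the same device number.
# Assumes GNU coreutils stat, where -c '%d' prints the device number.

# same_device PATH_A PATH_B: succeed if both paths report the same
# device; return 2 if either path cannot be examined.
same_device() {
    a=$(stat -c '%d' "$1") || return 2
    b=$(stat -c '%d' "$2") || return 2
    [ "$a" = "$b" ]
}
```

If same_device SOURCE_DIR BIND_DIR succeeds, a tool that matches on the device alone cannot tell the two apart.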
Vladislav Bogdanov
2015-05-18 15:19:12 UTC
Permalink
Post by Nikola Ciprich
Hi,
I noticed very annoying bug (or so I think), that resource-agents-3.9.5
in RHEL / centos 6 Filesystem OCF resource seems to be killing completely
unrelated processes on shutdown although they're not using anything on mounted filesystem...
Isn't that a bind-mount?
Nikola Ciprich
2015-05-18 15:57:23 UTC
Permalink
Hi Vladislav,
Post by Vladislav Bogdanov
Isn't that a bind-mount?
nope, but your question led me to a possible culprit..
it's a cephfs mount; when I try this on a local filesystem, I don't
see this weird fuser behaviour..

so maybe fuser does not work correctly on cephfs?

this is how the fs is mounted:

10.0.0.1,10.0.0.2,10.0.0.3:/ on /home/cluster/virt type ceph (name=admin,key=client.admin)

I should probably ask on the ceph mailing list..
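
When reporting this elsewhere it helps to state the filesystem type exactly. A small sketch that extracts it from a mount(8) line in the format shown above; mount_fstype is a hypothetical helper name:

```sh
#!/bin/sh
# Sketch only: pull the filesystem type out of lines printed by mount(8)
# in the "SRC on DIR type FSTYPE (OPTS)" format.

# mount_fstype: read mount lines on stdin, print the type of each.
mount_fstype() {
    sed -n 's/^.* on .* type \([^ ]*\) (.*$/\1/p'
}
```

For instance, mount | grep ' on /home/cluster/virt ' | mount_fstype prints ceph for the mount shown above.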

n.
Vladislav Bogdanov
2015-05-18 16:34:38 UTC
Permalink
Post by Nikola Ciprich
so maybe fuser does not work correctly on cephfs?
yep, for bind mounts fuser shows/kills both the processes that use the
bind-mounted tree and those that use the original filesystem.
Post by Nikola Ciprich
10.0.0.1,10.0.0.2,10.0.0.3:/ on /home/cluster/virt type ceph (name=admin,key=client.admin)
I should probably ask in ceph maillist..
There are alternative ways to determine mountpoint usage, btw.
One of them is lsof; I use it for bind mounts.
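
As a rough sketch of the lsof approach (not the resource agent's code): mount_users is a hypothetical helper, and it assumes that lsof -t DIR prints the PIDs of processes with files open on the filesystem mounted at DIR. The LSOF variable is only there so the command can be swapped out.

```sh
#!/bin/sh
# Sketch only: list the PIDs that lsof reports for a mount point, as an
# alternative to "fuser -m". LSOF can be overridden, e.g. for testing.

LSOF=${LSOF:-lsof}

# mount_users DIR: print the unique PIDs reported for DIR, sorted
# numerically, one per line. Prints nothing if lsof finds no users.
mount_users() {
    $LSOF -t "$1" 2>/dev/null | sort -un
}
```

This only lists PIDs; unlike fuser -k it sends no signals, so the list can be reviewed before anything is killed.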
Dejan Muhamedagic
2015-05-19 08:46:43 UTC
Permalink
Post by Vladislav Bogdanov
yep, for bind-mounts fuser shows/kills both processes which use
bounded tree and original filesystem.
There are alternative ways to determine mountpoint usage btw.
One of them is lsof, I use it for bind-mounts.
The Filesystem RA supports bind mounts. Is there a problem then
with it using fuser?

Thanks,

Dejan
Vladislav Bogdanov
2015-05-19 09:30:37 UTC
Permalink
Post by Dejan Muhamedagic
The Filesystem RA supports bind mounts. Is there a problem then
with it using fuser?
Definitely (but it may be specific to the kernel/fuser version).
Angie T. Muhammad
2015-05-19 18:59:01 UTC
Permalink
Hi Nikola,

I wish I could help, but I have not used Pacemaker for 3 years now, sorry. I
just wanted to thank you for the e-mail subject; it drew a big smile on my
face after a long, tiresome, not-so-good day. Really, thank you :)

Best regards,
Angie Tawfik