
Bug#292856: kernel-image-2.6.8-2-686: unkillable process



On Mon, Jan 31, 2005 at 08:46:35PM +0000, Greg Kochanski wrote:
> Unexpectedly, it is reproducible.
> 
> Here's the relevant bit of ps -e -F .   This was taken
> a minute or so after I started the strace find .
> 
> gpk      16650 16555  0   646 1480   0 20:35 pts/7    00:00:00 bash
> gpk      16659 16650  1   428  572   0 20:36 pts/7    00:00:01 strace find . -name #cvs
> gpk      16660 16659  0   382  444   0 20:36 pts/7    00:00:00 find . -name #cvs
> gpk      16681 16583  0   624  852   0 20:38 pts/4    00:00:00 ps -e -F
> 
> /var/log/dmesg and /var/log/syslog show no relevant entries
> (and no entries at all since I started the find .)
> 
> 
> The directory from which I launched find
> is on a local disk; no disks are configured for NFS.
> The directory was reached via a symbolic link, though that
> ought not to be relevant.
> 
> The disk it is on is the main system disk, and it seems
> to be functioning well.
> 
> 
> The tail end of the output of strace follows:
> 
> getdents64(4, /* 113 entries */, 4096)  = 4072
> getdents64(4, /* 50 entries */, 4096)   = 1760
> getdents64(4, /* 0 entries */, 4096)    = 0
> close(4)                                = 0
> chdir("22")                             = 0
> lstat64(".", {st_mode=S_IFDIR|0755, st_size=20480, ...}) = 0
> chdir("..")                             = 0
> lstat64(".", {st_mode=S_IFDIR|0755, st_size=4096, ...}) = 0
> lstat64("23", {st_mode=S_IFDIR|0755, st_size=20480, ...}) = 0
> open("23", O_RDONLY|O_NONBLOCK|O_LARGEFILE|O_DIRECTORY) = 4
> fstat64(4, {st_mode=S_IFDIR|0755, st_size=20480, ...}) = 0
> fcntl64(4, F_SETFD, FD_CLOEXEC)         = 0
> getdents64(4,
> 
> (The output stopped half-way through the last line.
> I had it going directly to a terminal, rather than a file
> to avoid any buffering.)

Ok, in a nutshell, what is going on here is that find has opened
a directory and is reading its entries (getdents64() is most likely the
result of a call to readdir()), and the kernel is not returning from that
call. This is most likely because some IO is blocking (permanently)
somewhere. Using the non-straced output of find you should be able to
work out approximately where in the filesystem this is occurring.
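
If find's own output is not enough to narrow it down, a rough sketch
along the following lines might help. It is only an illustration, not
what find does internally: walk_verbose is a made-up name, and the idea
is simply to print (and flush) each directory name before reading it, so
that if a read never returns, the last line printed names the directory
involved.

```python
import os
import sys


def walk_verbose(root, out=sys.stderr):
    """Walk the tree under root, announcing each directory before reading it.

    Hypothetical helper: if a directory read blocks forever, the last
    "reading ..." line shows where it happened.
    """
    visited = []
    stack = [root]
    while stack:
        path = stack.pop()
        # Flush before the call that may block, so the line is not lost
        # in a buffer.
        print("reading", path, file=out, flush=True)
        try:
            with os.scandir(path) as entries:
                for entry in entries:
                    # Do not follow symlinks, to avoid loops.
                    if entry.is_dir(follow_symlinks=False):
                        stack.append(entry.path)
        except OSError as exc:
            print("skipped", path, exc, file=out, flush=True)
            continue
        visited.append(path)
    return visited
```

Calling walk_verbose(".") from the directory where find was started
should stop at roughly the same place find does.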

You mentioned in another bug report that you are experiencing high load
average, yet the CPU seems idle. You also mentioned you have been
using a USB disk. 

I strongly suspect that this is in fact the same issue: you have a
large number of (find) processes blocked on IO somewhere in your
filesystem. My guess is that the culprit is the mountpoint where the
system thinks the USB disk is. The disk isn't there, so anything that
touches the mountpoint blocks, waiting to access it.
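
One way to test a suspect mountpoint without hanging your own shell is
to do the directory read in a child process and give up on it after a
timeout. The following is a minimal sketch under that assumption; probe
is a made-up name and the 5 second default is arbitrary.

```python
import subprocess
import sys


def probe(path, timeout=5.0):
    """Return "ok", "error" or "timeout" for a directory listing of path.

    The listing runs in a child process, so this process never blocks
    on the IO itself.
    """
    child = subprocess.Popen(
        [sys.executable, "-c", "import os, sys; os.listdir(sys.argv[1])", path],
        stdout=subprocess.DEVNULL,
        stderr=subprocess.DEVNULL,
    )
    try:
        code = child.wait(timeout=timeout)
    except subprocess.TimeoutExpired:
        # kill() may have no effect if the child is stuck in the kernel,
        # so deliberately do not wait for it again.
        child.kill()
        return "timeout"
    return "ok" if code == 0 else "error"
```

A "timeout" result for the USB mountpoint, with "ok" everywhere else,
would be consistent with the theory above. Note that each probe that
times out may leave one more stuck process behind.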

That this is reproducible is not surprising in the least.
Blocking IO is very commonly used, and blocking means just that:
it blocks until the result comes back. And while it is blocking,
the process is usually stuck in the kernel, and you can't kill processes
that are stuck in the kernel.
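
Processes stuck like this show up in state D (uninterruptible sleep) in
ps, and on Linux they count towards the load average even though they
use no CPU, which would fit the high load on an idle machine. As a rough
sketch, assuming the usual "pid (command) state ..." layout of
/proc/<pid>/stat, they can be listed like this (parse_stat and
uninterruptible are made-up names):

```python
import os


def parse_stat(line):
    """Split a /proc/<pid>/stat line into (pid, command, state).

    The command is wrapped in parentheses and may itself contain spaces
    or parentheses, so split on the last ")".
    """
    head, _, tail = line.rpartition(")")
    pid_text, _, comm = head.partition("(")
    return int(pid_text), comm, tail.split()[0]


def uninterruptible(proc="/proc"):
    """Return (pid, command) for every process currently in state D."""
    found = []
    for name in os.listdir(proc):
        if not name.isdigit():
            continue  # not a process directory
        try:
            with open(os.path.join(proc, name, "stat"), errors="replace") as fh:
                pid, comm, state = parse_stat(fh.read())
        except OSError:
            continue  # process exited while we were looking
        if state == "D":
            found.append((pid, comm))
    return found
```

If uninterruptible() keeps returning the same find processes minutes
apart, they are the ones blocked on IO.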

-- 
Horms


