Re: VM global patch / fixing blockdev deadlocks in Kernel 2.4.5?



On Thu, Jun 07, 2001 at 10:27:49PM +0200, Klaus Knopper wrote:
> Hello Andrea (hope I found the correct address),
> 
> I'm working on a Linux distribution running entirely from CD, using the
> (de)compressed loopback device for which I'm co-author, which is
> basically the Kernel 2.2 loop.c with a gzip-like decompressor built in. I
> successfully ported this to 2.4.5 without any problems, at first.
> 
> Back when using Kernel 2.2.18, I came across a strange effect when 
> doing some parallelized IO on the cloop-mounted device. The IO suddenly
> stops without any kernel panic or other error message. Furthermore,
> not only the cloop device hangs but all other mounted block devices as
> well. I never found out what exactly happened, but spent a lot of time
> tracing and rewriting plus speed-improving cloop.o without finding
> any obvious error. The location where it hung was definitely in
> ll_rw_blk(), but I lost the trace from the point where schedule() was
> called.
> 
> Applying your 2.2.15pre VM-global-patch apparently solved the
> problem (at least, it never occurred again since then).
> 
> However, it's back in 2.4.5! When randomly accessing files (such as
> doing a "tar cpPvf - /mounted_cloop_iso9660 | dd of=/dev/null"), IO on
> all mounted block devices sometimes suddenly stops after a while.
> This does not seem to be related to the amount of memory/cache/swap,
> read errors on CD-Rom or other obvious things. I suspected a deadlock on
> concurrently running device IO queues.
> 
> I looked into your patch and found that some of your changes have been
> incorporated into 2.4.5, some others not (especially your improved
> locking mechanism for cached buffers). I wondered if there is the same kind
> of patch available for 2.4.5 that you made for 2.2.18, or if it is really a
> healthy idea for me to successively try to apply the same kind of
> changes you did for 2.2.18 to 2.4.5.
> 
> So, maybe you already worked on this and have a hint for me?
> 
> Since I'm running out of time (the press date for our CD is next week),
> I will try to apply some of your blkdev patches from
> ftp://ftp.de.kernel.org/pub/linux/kernel/people/andrea/patches/v2.4/ and
> see what happens.
> 
> Of course I would appreciate any kind of hint or insight from you
> regarding this deadlock condition, which does not directly seem to be
> related to the loopback/cloop device but rather to the way the kernel
> handles buffered blocks in general, or maybe it's even a VM issue
> (though I would expect kernel panics if this was the case).

Some of the fixes in vm-global shouldn't be necessary anymore in 2.4.5
(like the fs_down/fs_up stuff), but some still are (like the per-process
page reservation during memory balancing; however, in 2.4 there are more
serious VM problems at the moment than that one).  Btw, can you try to
reproduce with 2.4.5aa3 too, just in case? I have some critical VM
fixes in my tree compared to 2.4.5 that you will find shortly in
2.4.6pre2.

Another idea: are you changing the end_io callback? In the last few days
I got another email from somebody at IBM who is having trouble with
end_io callback replacement and the end_buffer_io_async logic, but I
haven't had time to process their email yet.
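
To make the end_io question concrete, here is a minimal userspace sketch of
what "replacing the end_io callback" usually involves: save the original
handler, install a hook, and chain back to the original on completion. The
struct below is only a simplified stand-in for the kernel's buffer_head;
the completed and hooked fields, saved_end_io, hook_end_io, hooked_end_io
and original_end_io are all hypothetical names for illustration, not code
from cloop or from the kernel.

```c
#include <assert.h>
#include <stdlib.h>

/* Simplified stand-in for the kernel's struct buffer_head (assumption:
 * only the completion callback and its private pointer matter here). */
struct buffer_head {
    void (*b_end_io)(struct buffer_head *bh, int uptodate);
    void *b_private;
    int completed;  /* hypothetical: set by the original handler */
    int hooked;     /* hypothetical: counts hook invocations */
};

/* Hypothetical record holding the handler state that the hook displaces. */
struct saved_end_io {
    void (*end_io)(struct buffer_head *bh, int uptodate);
    void *private_data;
};

/* Stands in for whatever handler was installed before the hook. */
static void original_end_io(struct buffer_head *bh, int uptodate)
{
    bh->completed = uptodate ? 1 : -1;
}

/* Replacement handler: do the extra work, restore the saved state,
 * then call the original handler so its completion logic still runs. */
static void hooked_end_io(struct buffer_head *bh, int uptodate)
{
    struct saved_end_io *saved = bh->b_private;

    assert(saved != NULL);
    bh->hooked++;
    bh->b_end_io = saved->end_io;
    bh->b_private = saved->private_data;
    free(saved);
    bh->b_end_io(bh, uptodate);
}

/* Install the hook, remembering the previous handler. Returns 0 on
 * success, -1 if the bookkeeping allocation fails. */
static int hook_end_io(struct buffer_head *bh)
{
    struct saved_end_io *saved = malloc(sizeof(*saved));

    if (saved == NULL)
        return -1;
    saved->end_io = bh->b_end_io;
    saved->private_data = bh->b_private;
    bh->b_private = saved;
    bh->b_end_io = hooked_end_io;
    return 0;
}
```

If a replacement handler of this kind never calls the original one, the
completion work the original was responsible for does not happen, and
anything waiting on that buffer could plausibly wait forever. That is one
way a replaced end_io callback might show up as stalled IO rather than as
a panic.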

Andrea
