
Bug#604469: marked as done (linux-image-2.6.26-2-openvz-amd64: openvz - deadlock during RAID rebuild with container backing store on LVM+snapshot)



Your message dated Mon, 20 Jun 2011 13:47:31 +0100
with message-id <1308574051.3093.3.camel@localhost>
and subject line Re: Bug#604469: Update check
has caused the Debian Bug report #604469,
regarding linux-image-2.6.26-2-openvz-amd64: openvz - deadlock during RAID rebuild with container backing store on LVM+snapshot
to be marked as done.

This means that you claim that the problem has been dealt with.
If this is not the case it is now your responsibility to reopen the
Bug report if necessary, and/or fix the problem forthwith.

(NB: If you are a system administrator and have no idea what this
message is talking about, this may indicate a serious mail system
misconfiguration somewhere. Please contact owner@bugs.debian.org
immediately.)


-- 
604469: http://bugs.debian.org/cgi-bin/bugreport.cgi?bug=604469
Debian Bug Tracking System
Contact owner@bugs.debian.org with problems
--- Begin Message ---
Package: linux-image-2.6.26-2-openvz-amd64
Version: 2.6.26-25lenny1
Severity: normal

On Lenny, I have observed the following behaviour:

An I/O deadlock occurs under the following conditions:

. OpenVZ container data stored on an LVM logical volume whose PV is an md RAID1 array
. RAID1 md undergoing a rebuild or check
. An LVM snapshot is active for the logical volume (in order to take a
backup)
. I/O occurs within the container

The RAID resync then stops, as does all other I/O to the filesystem
mounted on that logical volume.

Adding various debugging output to the md raid1 driver shows that, once the
system reaches this state, a number of bio requests are still pending (or at
least their completion callbacks are never executed).

http://marc.info/?t=128473541100001&r=1&w=2

Adding the debug printks and atomic counters appears to make the deadlock
occur more readily (within seconds rather than minutes of starting the
openvz container).

My guess was that this is some sort of priority inversion deadlock: I/O
in the container triggers I/O outside the container (via the LVM
snapshot), which must complete first (because of the OpenVZ scheduling
rules or otherwise), while the md barrier code is possibly enforcing the
opposite ordering.

.. but when I tried:

echo 0 > /sys/block/sda/queue/iosched/virt_mode
echo 0 > /sys/block/sdb/queue/iosched/virt_mode

prior to starting the container, it didn't seem to change anything, so maybe
that's not the problem.

Simply using the container private directory as a chroot and placing a
reasonably heavy I/O load on it does not seem to trigger the same deadlock.

I haven't seen the deadlock occur on 2.6.32, but I haven't tried inserting
the same debugging statements there.




-- System Information:
Debian Release: squeeze/sid
  APT prefers testing
  APT policy: (500, 'testing')
Architecture: amd64 (x86_64)

Kernel: Linux 2.6.32-5-openvz-amd64 (SMP w/8 CPU cores)
Locale: LANG=C, LC_CTYPE=C (charmap=ANSI_X3.4-1968)
Shell: /bin/sh linked to /bin/dash

Versions of packages linux-image-2.6.26-2-openvz-amd64 depends on:
ii  debconf [debconf-2.0]         1.5.36     Debian configuration management sy
ii  initramfs-tools [linux-initra 0.98.5     tools for generating an initramfs
ii  module-init-tools             3.12-1     tools for managing Linux kernel mo
ii  vzctl                         3.0.24-10  server virtualization solution - c

linux-image-2.6.26-2-openvz-amd64 recommends no packages.

Versions of packages linux-image-2.6.26-2-openvz-amd64 suggests:
pn  grub | lilo                   <none>     (no description available)
pn  linux-doc-2.6.26              <none>     (no description available)



--- End Message ---
--- Begin Message ---
Version: 2.6.32-34squeeze1

On Mon, 2011-06-20 at 10:51 +0100, Tim Small wrote:
> On 20/06/11 06:19, Ola Lundqvist wrote:
> > I would like you to check if the issue you reported in 604469
> > is solved in the squeeze release.
> >   
> 
> Well, I can't say for certain, but I couldn't reproduce the issue using
> the squeeze kernel.

Closing, then.

Ben.

-- 
Ben Hutchings
When in doubt, use brute force. - Ken Thompson



--- End Message ---
