
Re: Bug#419209: lvm2: Hangs during snapshot creation



-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA512

On Thu, 5 Mar 2009 at 9:48, Bastian Blank wrote:
> severity 419209 important

It is critical as it breaks the whole system! I do not want to start a
severity war with you, but please do not set the severity to the wrong
level. Merely lowering the severity does not fix the bug.

> On Wed, Mar 04, 2009 at 10:44:28AM +0100, Klaus Ethgen wrote:
> > first of all, I raised the severity of the bug to critical as it makes
> > the whole system break.
> 
> lvm2 manages blockdevices via the device-mapper framework, so it manages
> the system. It is the purpose of this tool and it will not hold you from
> doing stupid things.

Using snapshots is not a stupid thing in my opinion. If you think it
is, then I think the only stupid thing here is your opinion.

So, enough bashing. Please stay on an objective level, as this is a
critical bug in a core component which is proven to exist for many
people! Please help fix it and do not just play with the severity!

> And for now I consider taking snapshots of / or /var as stupid, because
> it is impossible to recover if something goes wrong.

The breakage happens with several LVs. I had it sometimes with /usr,
sometimes with /home and only once with /var.

However, snapshotting /var wasn't a problem in the past and is a
desired use of snapshots too (even if you do not think so, but that is
your opinion!).

And I do not use snapshots for /, but for another reason: I do not have
/ in lvm at all.

Ah, yes, and why do you think /var would be broken when using
snapshots? Only the snapshot might be broken. But as it is necessary to
reboot the system anyway, this can be fixed easily after the boot, the
same way as fixing a broken snapshot of /usr or /home. And by your
reasoning it would be even more stupid to use snapshots with /home,
because the most vital data is in /home, not in /var. So, following
your opinion, using snapshots at all is stupid. Please refrain from
calling other people stupid.

> And without a working filesystem on this locations, the system will
> just block. It may work, but it also may break horrible as the kernel
> interface does not allow to do this change atomic.

That's a wrong assumption. It was working well with lvm1 and (as I know
now) also with lvm2 up to the version in etch.

> >                         Also I add debian-devel to Cc as the bug is very
> > problematic and I wonder how lvm2 was able to get into lenny with that
> > big problem!
> 
> We have many software who only works for most but not for all people.

So the software is not buggy if it just works for most people?

> > Also I am willing to help solve the bug. My next step will be to
> > import the whole version history to git and try to bisect the problem.
> 
> Why do you think this would be a problem of the userspace part?

Because just downgrading the userspace tools to 2.02.06-4etch1 fixes the
bug!

I also first thought of a kernel bug and, as I wrote in my mail, I
updated the kernel to the latest release to see if the bug still
persists before searching for other causes.

> > So this bug is a complete show stopper for lenny!!!!
> 
> If you want to help you can provide the following information when it
> goes wrong:
> - "uname -a"

I just did that, with the latest available kernel release:
Linux ikki 2.6.28.7 #1 Sun Mar 1 13:03:56 CET 2009 i686 GNU/Linux

> - "dmesg"

Unhelpful, as the system has been freshly rebooted now.

> - "dmsetup table"
 ~> dmsetup table
 sysvg-lv_usr: 0 14024704 linear 9:0 97714560
 sysvg-lv_usr: 14024704 2752512 linear 9:0 65920
 sysvg-lv_var: 0 4194304 linear 9:0 2818432
 sysvg-lv_mirror: 0 117440512 linear 9:0 126615936
 sysvg-lv_local: 0 16777216 linear 9:0 7012736
 sysvg-lv_home: 0 73924608 linear 9:0 23789952
 sysvg-lv_home: 73924608 14155776 linear 9:0 111739264
 sysvg-lv_misc: 0 167772160 linear 9:0 244056448
 sysvg-lv_sec: 0 585826304 linear 9:2 65920
 sysvg-lv_sec: 585826304 22347776 linear 9:0 411828608
 sysvg-lv_hathi: 0 720896 linear 9:0 125895040

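For readers unfamiliar with the format: each `dmsetup table` line reads
`name: start length target args`, with start and length counted in
512-byte sectors, and an LV made of several segments appears on several
lines. The following is a minimal Python sketch, not part of the
original report, that sums the segments per LV. The function name
`lv_sizes` is made up for illustration, and the sample is only the first
three lines of the output above.

```python
from collections import defaultdict

SECTOR_BYTES = 512  # device-mapper tables count in 512-byte sectors


def lv_sizes(table_text):
    """Return {device name: total length in sectors} for `dmsetup table` output."""
    sizes = defaultdict(int)
    for line in table_text.splitlines():
        line = line.strip()
        if ":" not in line:
            continue  # skip blank lines and messages without a device name
        name, rest = line.split(":", 1)  # split only at the first colon
        fields = rest.split()
        # fields: start, length, target type, then target-specific arguments
        sizes[name] += int(fields[1])
    return dict(sizes)


sample = """\
sysvg-lv_usr: 0 14024704 linear 9:0 97714560
sysvg-lv_usr: 14024704 2752512 linear 9:0 65920
sysvg-lv_var: 0 4194304 linear 9:0 2818432
"""

for name, sectors in lv_sizes(sample).items():
    print(name, sectors * SECTOR_BYTES // 2**20, "MiB")
```

With this sample, the two lv_usr segments add up to 16777216 sectors
(8192 MiB) and lv_var to 4194304 sectors (2048 MiB).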

> - "cat /proc/mounts"
 ~> cat /proc/mounts 
 rootfs / rootfs rw 0 0
 /dev/root / xfs rw,noatime,nodiratime,noquota 0 0
 tmpfs /lib/init/rw tmpfs rw,nosuid,mode=755 0 0
 proc /proc proc rw,nosuid,nodev,noexec 0 0
 sysfs /sys sysfs rw,nosuid,nodev,noexec 0 0
 tmpfs /dev tmpfs rw,size=10240k,mode=755 0 0
 tmpfs /dev/shm tmpfs rw,nosuid,nodev 0 0
 devpts /dev/pts devpts rw,nosuid,noexec,gid=5,mode=620 0 0
 fusectl /sys/fs/fuse/connections fusectl rw 0 0
 usbfs /proc/bus/usb usbfs rw 0 0
 tmpfs /tmp tmpfs rw,nosuid,nodev 0 0
 /dev/mapper/sysvg-lv_usr /usr xfs rw,noatime,nodiratime,nobarrier,noquota 0 0
 /dev/mapper/sysvg-lv_var /var reiserfs rw,noatime,nodiratime 0 0
 /dev/mapper/sysvg-lv_local /usr/local xfs rw,noatime,nodiratime,nobarrier,noquota 0 0
 /dev/mapper/sysvg-lv_home /home reiserfs rw,nosuid,nodev,noatime,nodiratime 0 0
 /dev/mapper/sysvg-lv_misc /misc xfs rw,nosuid,noatime,nodiratime,nobarrier,noquota 0 0
 /dev/mapper/sysvg-lv_sec /misc/.sec xfs rw,nosuid,nodev,noatime,nodiratime,nobarrier,noquota 0 0
 /dev/mapper/sysvg-lv_mirror /mirror xfs rw,nosuid,nodev,noatime,nodiratime,nobarrier,noquota 0 0
 tmpfs /media tmpfs rw,nosuid,nodev,noexec,mode=755 0 0
 /dev/mapper/sysvg-lv_hathi /hathi ext2 ro,errors=continue 0 0
 rpc_pipefs /var/lib/nfs/rpc_pipefs rpc_pipefs rw 0 0
 nfsd /proc/fs/nfsd nfsd rw 0 0
 binfmt_misc /proc/sys/fs/binfmt_misc binfmt_misc rw,nosuid,nodev,noexec 0 0

The rest is private information (fuse filesystems).

> - debug log of the "lvcreate -s" call, using -vvvv

See my other mail.

> - your snapshot creation script

I will add some documentation and put it online. For now, just
believe me that it goes over all devices and tries to make a snapshot
of each one.
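The actual script is not shown in this mail. As a rough illustration
only, a loop of that kind could look like the Python sketch below. It
only builds the `lvcreate -s` command lines and does not run them. The
function name `snapshot_commands`, the snapshot size `1G` and the
`_snap` suffix are assumptions made for this example, not details of the
real script.

```python
import shlex


def snapshot_commands(lvs, size="1G", suffix="_snap"):
    """Build one `lvcreate -s` command per (vg, lv) pair without running it."""
    commands = []
    for vg, lv in lvs:
        commands.append([
            "lvcreate", "-s",   # -s: create a snapshot
            "-L", size,         # space reserved for changed blocks (assumed value)
            "-n", lv + suffix,  # name of the snapshot LV (assumed naming scheme)
            f"{vg}/{lv}",       # origin volume
        ])
    return commands


# Hypothetical selection, using LV names from the mount table above.
for cmd in snapshot_commands([("sysvg", "lv_usr"), ("sysvg", "lv_var")]):
    print(shlex.join(cmd))
```

Keeping this as a dry run matters here, since running the real commands
is exactly what triggers the hang described in this bug.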

> > On Sat, 14 Apr 2007 at 12:53, Jean-Luc Coulon (f5ibh) wrote:
> > > New lvm2 version (2.02.24) hangs during snapshot creation.
> > > The lvcreate process is not killable at this point and the system needs to be
> > > rebooted.
> > That is the correct description.
> 
> There was a bug in older kernels which blocked on devmapper table
> reload, however I've only seen this with the mirror target during a
> pvmove call.

As you can see in my first mail and in this one, I use the most recent
kernel.

Also, the bug IS in userspace, as using the etch version fixes the bug.

> >                                  But more over the system will be
> > unbootable at all! I have to run /etc/init.d/reboot stop by hand to hard
> > reboot the system. A normal shutdown will end in a hanging system with
> > no remote access at all. The only solution at that point is to
> > powercycle the machine which is very problematic with remote system.
> 
> This is the normal behaviour if you lock out either a filesystem or have
> some parts of the kernel disfunctional after oopses.

Yes, I know. But I also call this a complete system breakage. (To show
you what the right severity of this bug is.)

> > On Fri, 2 Nov 2007 at 16:21, Stefan Pfetzing wrote:
> > > did you try to snapshot your /var? Because to me it seems like the
> > > current lvm2 configuration tries to use /var/lock/lvm for its locking
> > > files, and this leads to a deadlock.
> > This is not really a problem as it is irrelevant if that file is locked
> > or not in the snapshot.
> 
> It is. However I'm currently not sure if it ever tries to write/read
> this files while it have an operation going.

Sorry, but it is not and it never was a problem. But it doesn't matter
anyway, as most of the breakage I had was with /usr, which is
snapshotted first. But sometimes it works with /usr and then /var or
/home or /misc triggers the bug.

> > And I wonder why this should be a problem at all as the lvm1 was working
> > pretty stable for years now.
> 
> lvm2 and lvm1 does not have many in common.

I know. Well, no, they have the same structure. Just the metadata and
the way it works are completely different.

> > On Sun, 30 Mar 2008 at 10:52, Bastian Blank wrote:
> > > # Automatically generated email from bts, devscripts version 2.9.26
> > > severity 419209 important
> > The severity of this bug is absolute critical and not just important!
> 
> This is up to the maintainer. I use snapshots often and have not seen
> such problems recently.

Ah, that's a very loose interpretation of the Debian policy. From
reportbug:
 critical: makes unrelated software on the system (or the whole system)
 break, or causes serious data loss, or introduces a security hole on
 systems where you install the package.

And that is the case here. The whole system may break when using the
lenny version of lvm2. There is no threshold for how many people that
must apply to.

I will set the severity to critical once more, as I think I made it
clear why. If you want to start a severity war, just do it. I will
never ever change the severity of this bug again. But this would be
counterproductive for solving the bug.

And just to ask again: please stay on an objective level. The mail from
Steve Langasek was much more helpful than yours, which includes several
insults and unproven opinions.

Regards
   Klaus
- -- 
Klaus Ethgen                            http://www.ethgen.de/
pub  2048R/D1A4EDE5 2000-02-26 Klaus Ethgen <Klaus@Ethgen.de>
Fingerprint: D7 67 71 C4 99 A6 D4 FE  EA 40 30 57 3C 88 26 2B
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.9 (GNU/Linux)

iQEVAwUBSa+fNJ+OKpjRpO3lAQp8Lgf/RgLl3yfUFU4duLhWk5P1ilMVAZsrXEZB
yqFHZrNRoFyY6oTq8rdrk4zee+1cFHruuJKoWdkvBBGBcWVt2ysCKAU1kgsd2e/n
veI+li+xv2EEOpinpF04IwPuPuDfNR6PJg/leosgBprN1akMZnBKnie3R7+KKj6n
hz18/wYZc9iYeoGqKEx6qHhglmwe37Mturk/8TPB4G8lAFZaAiatJvHpwauv0vpz
bvHkXWTGge9qtBo64GacKKAoIBm9M+5T7N905k5BuhiFWTXhAWyGswa3egh1Jtct
LrT/oF8NQZlL6c7GziXarCv2mGJUIyS1j8cqBSolwOEnsFM2eLKLeA==
=Vf7M
-----END PGP SIGNATURE-----

