
Re: Bug#701936: btrfs can't fsck /run/rootdev on boot



reassign 701936 initscripts
severity 701936 serious
thanks

On Fri, Mar 01, 2013 at 07:50:37AM +0100, Daniel Baumann wrote:
> retitle 701936 btrfs can't fsck /run/rootdev on boot with sysvinit
> severity 701936 important
> clone 701936 -1
> reassign -1 sysvinit
> thanks
> 
> works with systemd, it's sysvinit specific.

This is largely init-system agnostic.  There's only a single issue
with initscripts, and even that's arguably a Btrfs issue.  Please do
not use "works with systemd" as an excuse for gratuitous brokenness.
It's broken everywhere.

There are a number of serious problems here.  I'll go through each
in turn.


1) checkroot.sh creates an invalid /run/rootdev

Btrfs, unlike every other filesystem I'm aware of, reports an
invalid device with stat(2).

    % findmnt / 
    TARGET SOURCE    FSTYPE OPTIONS
    /      /dev/sda4 btrfs  rw,nodev,relatime,ssd,discard,space_cache

    % stat / | grep ^Device
    Device: 10h/16d	Inode: 256         Links: 1

    % mountpoint -qx /dev/sda4
    8:4

    % mountpoint -d /         
    0:16

The mountpoint discrepancy triggers the creation of /run/rootdev, but
since that block device is actually *invalid*, fsck fails.  This is
the root cause of this bug.  checkroot.sh assumes that the device
reported for the filesystem is valid; this is not the case with
Btrfs.  Up until now, this assumption has held in all the
circumstances triggering this codepath.

We can teach checkroot to prefer the mount device, but this is not
always a good choice (there is a reason why we have this particular
fallback).  I have tried this, and it does allow fsck to run.  But for
non-Btrfs rootfses, this is the wrong thing to do.
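
As a rough sketch of what a Btrfs-only special case could look like
(this is not the actual checkroot.sh logic; the function names
same_device and choose_fsck_device are made up for illustration, and
the device numbers are passed in as strings rather than queried):

```shell
# Hypothetical helper: succeed when two "major:minor" strings agree.
same_device() {
    [ "$1" = "$2" ]
}

# Hypothetical helper: decide which device to hand to fsck.
#   $1 = source device from the mount table   (e.g. /dev/sda4)
#   $2 = major:minor of that device node      (e.g. from mountpoint -x)
#   $3 = major:minor reported for /           (e.g. from mountpoint -d /)
#   $4 = filesystem type
choose_fsck_device() {
    src=$1
    src_devno=$2
    root_devno=$3
    fstype=$4

    if same_device "$src_devno" "$root_devno"; then
        # No discrepancy: the mount source is the root device.
        echo "$src"
    elif [ "$fstype" = "btrfs" ]; then
        # The number reported for / is not a usable block device,
        # so fall back to the mount source for this one type only.
        echo "$src"
    else
        # Existing fallback for every other filesystem type.
        echo "/run/rootdev"
    fi
}
```

With the values shown above, "choose_fsck_device /dev/sda4 8:4 0:16 btrfs"
prints /dev/sda4, while any other filesystem type with the same
mismatch still gets /run/rootdev.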


2) fsck.btrfs fails to fsck a mounted filesystem

fsck.btrfs won't check a mounted filesystem, even if mounted
read-only.  We need to be able to do this, since we are running
fsck from the rootfs.  We do this for all other filesystem types.


3) fsck.btrfs does not support the standard fsck options

fsck.btrfs does not include support for most of the options in
fsck(8), even to ignore them.  Since we make use of these options,
fsck.btrfs breaks.
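
To illustrate what "support, even to ignore them" could mean, here is
a minimal sketch of an option filter.  The function name fsck_operand
and the particular set of option letters are assumptions chosen for
illustration; this is not taken from fsck.btrfs or from fsck(8).

```shell
# Hypothetical option filter: accept and ignore a set of option
# letters, then print the first operand (the device to check).
# Returns 16 (the fsck(8) "usage or syntax error" code) on bad input.
fsck_operand() {
    OPTIND=1
    # Leading ':' selects silent error reporting, so unknown options
    # arrive as '?' and missing arguments as ':'.
    while getopts ":aApyfnrvVC:" opt; do
        case $opt in
            \?) echo "unsupported option: -$OPTARG" >&2; return 16 ;;
            :)  echo "option -$OPTARG needs an argument" >&2; return 16 ;;
            *)  ;;  # recognised: accepted and ignored
        esac
    done
    shift $((OPTIND - 1))

    if [ $# -lt 1 ]; then
        echo "no device given" >&2
        return 16
    fi
    echo "$1"
}
```

For example, "fsck_operand -a -f -C0 /dev/sda4" prints /dev/sda4,
whereas an option letter outside the accepted set makes it fail.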


4) fsck.btrfs error codes

I haven't tested this due to point (2) above, but if you look at
checkroot.sh, you'll notice that the exit codes are quite
important for doing the right thing for fsck failures and in
some cases success.  fsck.btrfs *must* use the same codes.
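
For reference, the exit status documented in fsck(8) is a sum of bit
flags, so a caller has to test individual bits rather than compare
against one value.  A small sketch of decoding it (the function name
fsck_status_flags and the short labels are my own; only the numeric
values come from fsck(8)):

```shell
# Hypothetical helper: translate an fsck(8) exit status into labels.
# Bit values per fsck(8): 1 errors corrected, 2 system should be
# rebooted, 4 errors left uncorrected, 8 operational error,
# 16 usage or syntax error, 32 cancelled by user request,
# 128 shared-library error.
fsck_status_flags() {
    status=$1
    flags=""
    for entry in 1:corrected 2:reboot-needed 4:uncorrected \
                 8:operational-error 16:usage-error 32:cancelled \
                 128:library-error; do
        bit=${entry%%:*}     # numeric part before the colon
        name=${entry#*:}     # label after the colon
        if [ $((status & bit)) -ne 0 ]; then
            flags="$flags $name"
        fi
    done
    flags=${flags# }         # drop the leading space, if any
    echo "${flags:-clean}"
}
```

A status of 3, for instance, decodes to "corrected reboot-needed",
which is a success that still requires action from the boot scripts.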


So, to summarise the current situation:

• systems with a btrfs root filesystem are currently *unbootable*
  without using "fastboot" to skip fsck
• even if the checkroot script is "fixed", fsck.btrfs remains
  broken and all the unbootable systems remain unbootable
• at this late stage in the freeze, btrfs-tools should never
  have been uploaded to unstable
  · it's fundamentally broken
  · it's broken countless systems (including my own)
  · it's obviously not been tested properly; these tools are
    fundamental to basic system functioning, and I expect
    better quality work from a Debian developer
  · this is obviously unsuitable for wheezy
• I'm loath to make any changes to initscripts to work around this
  breakage, not only because it won't fix the root cause of the
  problems, but because to change the core scripts at this point
  would be to compromise the months of testing they have had, and
  that's simply unacceptable

I would recommend that this be immediately reverted in unstable.
If you want to put it into experimental, feel free, but please
add a big disclaimer to avoid further breakage.  Given the
brokenness, it is probably best not uploaded at all until it no
longer breaks booting.

If Btrfs needs special handling in initscripts due to unique
requirements, then I'm happy to look at adding such support.
However... you should have communicated this to me /before/
uploading this and breaking lots of systems, so that the
support would have been in place ahead of time.  And you should
have also done some testing to avoid breaking so many people's
computers; this is just not acceptable.


Regards,
Roger

-- 
  .''`.  Roger Leigh
 : :' :  Debian GNU/Linux    http://people.debian.org/~rleigh/
 `. `'   schroot and sbuild  http://alioth.debian.org/projects/buildd-tools
   `-    GPG Public Key      F33D 281D 470A B443 6756 147C 07B3 C8BC 4083 E800

