Bug#977647: 5.10.1 Debian kernel does not boot on amd64 with btrfs rootfs



Hi,

Ryutaroh Matsumoto <ryutaroh@ict.e.titech.ac.jp> writes:

> Hi Nicholas,
> Thank you very much for your attention!
>

You're welcome :-)

[snip]
> Boot failures themselves are unreliable...
>

This is the key issue.  To fix this bug we first need reliable steps to
reproduce it; without them, it's not clear what needs to be fixed (or
how to fix it!).

>> If that's not possible,
>> see if you can get a copy from /var/log after rebooting with a working
>> kernel or when using a network disk.
>
> /var/log/kern.log does not have information of 5.10.1...
>
>> If that's not possible, and you
>> have good reflexes, you could try ctr+s or scroll-lock to pause the
>> kernel output at just the right moment,
>
> I found that once in ten times, booting succeeds and I got dmesg and /proc/mounts
> of 5.10.1 as attached...
>

Thank you.  The backtrace comes from a crash in the TPM driver, not from
btrfs.  If you want to rule this out as a factor, the tpm, tpm_tis, and
tpm_tis_core drivers can be blacklisted.
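
One way to do the blacklisting is a modprobe.d file.  The following is
only a minimal sketch that renders the file contents: the helper name
and the destination file name are my own assumptions, and it assumes the
three drivers are built as modules rather than into the kernel.

```python
# Modules named in this email; everything else here is illustrative.
TPM_MODULES = ["tpm", "tpm_tis", "tpm_tis_core"]


def render_blacklist(modules):
    """Return modprobe.d text with one 'blacklist <module>' line per module."""
    return "".join(f"blacklist {name}\n" for name in modules)


if __name__ == "__main__":
    # Hypothetical destination: /etc/modprobe.d/tpm-blacklist.conf
    print(render_blacklist(TPM_MODULES), end="")
```

If the modules are also included in the initramfs, it would probably
need to be regenerated (update-initramfs -u on Debian) before the change
takes effect.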

It's encouraging to hear that booting succeeds only about once in ten
attempts, because that means you'll be able to reproduce the failure
without much trouble ;-) Please finish reading this email, then enable
network logging using the document linked in my previous email.  It
sounds like your system is hard locking before the logs are written to
disk, so sending them to another computer is the most reliable way to
capture them.  For your convenience, here is the link again:
  https://wiki.debian.org/Rsyslog
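
If setting up a second rsyslog instance on the receiving computer is
inconvenient, a small UDP listener may be enough to capture the
messages.  This is a rough sketch and not the setup the wiki page
describes: the port 5514, the log file name, and both function names are
assumptions chosen for illustration.

```python
import socket


def open_listener(host="0.0.0.0", port=5514):
    """Bind a UDP socket that forwarded syslog datagrams can be sent to."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    sock.bind((host, port))
    return sock


def receive_one(sock, log_path, bufsize=8192):
    """Wait for one datagram, append it to log_path, and return the line."""
    data, (addr, _port) = sock.recvfrom(bufsize)
    text = data.decode("utf-8", errors="replace").rstrip()
    line = f"{addr} {text}\n"
    # Reopen the file for every message so each line reaches the disk
    # promptly, even if the listener is interrupted.
    with open(log_path, "a", encoding="utf-8") as fh:
        fh.write(line)
    return line


if __name__ == "__main__":
    listener = open_listener()
    while True:
        print(receive_one(listener, "remote-kernel.log"), end="")
```

On the failing machine, an rsyslog forwarding rule such as
"kern.* @192.0.2.10:5514" (the address is a placeholder) would send
kernel messages to this listener over UDP.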

>> Finally, what btrfs features (profiles, compression, layers of storage,
>> etc) are being used?
>
> Please have a look at attached file. Less usual one is compress-force=lzo.
> If I remember it correctly, all files are lzo-compressed.
>

As an aside, I believe that btrfs compression is not yet ready for general use.
There's still at least one major bug per year that only occurs when it
is enabled.  If I remember correctly, Zygo on the linux-btrfs mailing
list is tracking its state, and he periodically posts "year in review"
and "current outstanding issues" reports.

At this point my hypotheses are:

1. It's hardware specific, and the TPM crash is more significant than it
appears to be.
2. It may be that your btrfs volume has errors which 5.10 detects and
halts for, but which 5.9 is not aware of.  If you'd like to explore this
possibility, run "btrfs check" against the unmounted volume.  If it
finds errors *DO NOT* run "btrfs check --repair"; instead, send a copy
of the output to linux-btrfs asking for advice about what to do next,
and request to be CCed on replies.
3. Something else.
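
For the second hypothesis, the check can be run read-only with its
output saved to a file that is easy to attach to a linux-btrfs post.
This is a sketch under assumptions: /dev/sdX2 and btrfs-check.txt are
placeholders, the function names are mine, and it must be run as root
against the unmounted volume.

```python
import subprocess


def build_check_command(device):
    """Build a read-only 'btrfs check' command line; never adds --repair."""
    return ["btrfs", "check", "--readonly", device]


def run_check(device, report_path):
    """Run the check and save stdout and stderr together in report_path."""
    result = subprocess.run(
        build_check_command(device),
        stdout=subprocess.PIPE,
        stderr=subprocess.STDOUT,
        text=True,
    )
    with open(report_path, "w", encoding="utf-8") as fh:
        fh.write(result.stdout)
    return result.returncode


if __name__ == "__main__":
    # /dev/sdX2 is a placeholder for the real, unmounted btrfs device.
    status = run_check("/dev/sdX2", "btrfs-check.txt")
    print("btrfs check exited with status", status)
```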

There's not really any substitute for logs, so I wish you success in
configuring network logging :-)

Regards,
Nicholas
