
Bug#778849: Is "wishlist" appropriate for this?



I have yet to investigate intrigeri's suggestions from 2017. However, I would suggest that this bug needs to be upgraded from wishlist in 2022, and the reason is simple enough:

root@aki:~# nvme smart-log /dev/nvme0
Smart Log for NVME device:nvme0 namespace-id:ffffffff
[..]
unsafe_shutdowns			: 106
[..]
num_err_log_entries			: 284
[..]
root@aki:~# nvme smart-log /dev/nvme1
Smart Log for NVME device:nvme1 namespace-id:ffffffff
[..]
unsafe_shutdowns			: 121
[..]
num_err_log_entries			: 291
[..]

Given that the frequency and number of SMART errors are deemed an indicator of drive health, that's bad. Also, improper shutdown on NVMe devices could be particularly problematic because they have caches and wear leveling and cleanup cycles that could happen any time the drive is "running" until a shutdown command is issued and responded to. There might actually be some risk of data corruption/loss. (I doubt it with commodity consumer SSDs, but Debian isn't just run on those.)

For a few weeks, we tried on #debian to sort out the cause of the above errors. Was it an NVMe drive quirk that Linux doesn't support? Maybe Linux is issuing the shutdown command and not waiting long enough? There's Google bait suggesting that's a problem, and there are some BS factoids in dpkg (which I should remove the next time I connect to OFTC) describing a "solution" that I've since discovered doesn't work. This was hard to test because obviously no logger is running at this point in the shutdown process.

The root cause of the problem isn't an unknown quirk; it's that I have LVM on LUKS. (See what I did there?) I connected a drive with an unencrypted Debian system on it that mounted my main installation's /boot and even the LUKS/LVM root somewhere, and I never got a single unsafe shutdown despite multiple reboots/shutdowns, because that temporary install's root was not on LVM-on-LUKS backing.
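For anyone who wants to repeat that comparison, here is a minimal sketch of how the counter can be read before and after a reboot. It assumes the smart-log output format pasted above (other nvme-cli versions may print the field differently), and unsafe_count is a hypothetical helper name, not part of nvme-cli:

```shell
# Hypothetical helper: read `nvme smart-log` text on stdin and print
# only the value of the unsafe_shutdowns field.
# Assumes lines of the form "unsafe_shutdowns<whitespace>: 106".
unsafe_count() {
    awk -F: '/^unsafe_shutdowns/ { gsub(/[[:space:]]/, "", $2); print $2 }'
}

# Usage (as root): record the value, reboot, then read it again.
#   nvme smart-log /dev/nvme0 | unsafe_count > /var/tmp/unsafe_before
#   ... reboot ...
#   nvme smart-log /dev/nvme0 | unsafe_count
```

If the second number is higher than the one saved before the reboot, that shutdown was counted as unsafe by the drive.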

Dracut is a suboptimal solution, in part because after three days of trying to get it to boot my system I've yet to see it do so, and in part because while there's lots of documentation for it, that documentation is for other distributions, wrong, obsolete, or misleading. That includes one rantthrough from 2017 that offers a profanity-laden survey of most of the others and why they don't work for Debian systems, or at all.

As far as I can tell, I either need to significantly modify grub, or switch to systemd-boot, or set up Dracut to generate an EFI executable blob using files that aren't available on a Debian system, or throw up my hands and go use Fedora until I understand Dracut well enough to try using it on Debian. Or something. Again: what sparse documentation exists is spotty, inconsistent, and at least five years out of date. Dracut is not how Debian does things, just like OpenRC and rEFInd are not how Debian does things. It's all there if you want to set it up, but you're not going to find many Debian resources on using it.

I think unsafe shutdowns of NVMe devices are actually a bug, and I think they could cause data loss or corruption on more advanced hardware than I'm using. There are a few options for addressing this, and most of them become problems beyond initramfs-tools' scope. But this seven-year-old bug might be the path of least resistance.

Joseph

