Re: btrfs subvolume naming scheme

To: Philipp Kern <pkern@debian.org>, Geert Stappers <stappers@stappers.nl>
Cc: debian-boot@lists.debian.org
Subject: Re: btrfs subvolume naming scheme
From: Nicholas D Steeves <nsteeves@gmail.com>
Date: Wed, 1 Jun 2016 08:23:58 -0400
Message-id: <CAD=QJKiRMNqFNoHPMa=2gDVoZmCVTNKQ05FqCeVZkYMfsGWEVQ@mail.gmail.com>
In-reply-to: <29c59185d4dbeccd858c9dec7e956f7e@hub.kern.lc>
References: <20160423113824.GG27439@gpm.stappers.nl> <CAD=QJKhnNfQTwDJ3GQWiZGqkiRSLxRSFzTGJ787NtJgRt4XXAA@mail.gmail.com> <29c59185d4dbeccd858c9dec7e956f7e@hub.kern.lc>
Thank you for the replies, and I'm sorry this email is so long.  I had
hoped that by taking time to think through the issues first I would
place less of a "this email is so long and taking so much of my time!"
burden on everyone reading this thread.

On 27 April 2016 at 07:43, Philipp Kern <pkern@debian.org> wrote:
> On 2016-04-23 23:51, Nicholas D Steeves wrote:
>>
>> Ubuntu avoids using the default subvolume (subvol ID 5).  For the
>> rootfs their installer creates a subvol called @, for /home it creates
>> @home, etc.  In fstab the device is specified and subvol=@ is added to
>> the mount option to specify which subvolume gets mounted.  When the
>> volume is mounted without a subvol option, it mounts the whole tree.
>> The tree would be /btrfs/@ and /btrfs/@home if mounted from a rescue
>> disk.
>
>
> I think avoiding the default subvolume is the sensible approach. It should
> be possible to reuse the volume by multiple root filesystems if needed.
> (Just like this is possible with LVM today.)

And in the future, multiple boot environment (BE) support!  I think
openSUSE might already have integrated it, but this would be an
early-to-mid fall project for me.  At a bare minimum, a boot
environment should someday be created before running a dist-upgrade.
I've read that BEs are quite popular with Arch Linux users, whose
systems break more often than most ;-)

> I'd personally prefer if we would only mount the btrfs filesystem once, but
> I don't know what the best guideline here is. If we mount it multiple times
> at different subvolumes, the output of mount is pretty confusing to the
> user. The user would of course still be free to mount additional arbitrary
> subvolumes later and end up with this state.

Hmm, ok, I guess the default should be like other distributions (1
subvolume -> 1 mountpoint).  The wiki can then address the two
different snapshot topologies with their pros and cons, and let the
sysadmin configure it how he/she likes.  (TODO) I definitely need to
address more explicitly the dangers of going snapshot crazy, or of
using a loose and easy snapper config, because performance crashes
somewhere between 250 and 300 snapshots per subvolume, and the volume
sometimes also gets wedged into an unmountable state.
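
As a rough illustration of the kind of guard the wiki could suggest,
here is a minimal Python sketch that picks which snapshots to prune so
a subvolume stays under a cap.  The function name, the cap of 200, and
the assumption that snapshot names sort chronologically (for example
timestamped names) are all hypothetical; this is not how snapper
works.

```python
def snapshots_to_prune(names, max_keep=200):
    """Return the oldest snapshot names in excess of max_keep.

    Assumption: names sort chronologically, e.g. 'root-2016-06-01'.
    max_keep=200 is a hypothetical cap, chosen only to stay below the
    250-300 snapshots per subvolume mentioned above.
    """
    if max_keep < 0:
        raise ValueError("max_keep must be non-negative")
    ordered = sorted(names)              # oldest first, by assumption
    excess = len(ordered) - max_keep     # how many are over the cap
    return ordered[:excess] if excess > 0 else []
```

The function only returns names; actually deleting the snapshots is
left to the caller, so the policy can be reviewed before anything
destructive happens.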

> I think /var/log would be very sensible as a separate subvolume by default.
> Usually if you want to snapshot your rootfs, you really don't want log files
> to take part in the snapshotting. I suppose the same argument can be made
> about /home and a non-tmpfs /tmp. There are also people who want to push for
> all OS content to be located in /usr, but we still have a lot of content in
> /var so that doesn't seem feasible in the near time.

Will omitting /var/log from backups trigger any systemd or journald
voodoo that causes errors on restore?  If /usr and /var should be on
the same subvolume, and /var and /etc need to be, then rootfs (for
future enabling of boot environments) should encompass everything
needed to boot.  User data should probably be separate, because
otherwise it would be invisible, or appear to be "lost", when
reverting to an older BE.  Likewise, if BEs are supported, the wiki
(note to myself) would need to recommend creating a subvolume for
/var/www, for the location of any major database, and/or for any
exported samba or nfs mounts.  The primary argument I've read against
doing this at installation (like openSUSE does) is that it increases
the chances of something going wrong or performing poorly, in the same
way as "too many snapshots."

> Kind regards and thanks for spawning these discussions
> Philipp Kern

You're welcome.  Sorry for the delay in following up.

On 24 April 2016 at 02:30, Geert Stappers <stappers@stappers.nl> wrote:
> On Sat, Apr 23, 2016 at 05:51:25PM -0400, Nicholas D Steeves wrote:
>> On 23 April 2016 at 07:38, New Thread old subject joining team
>> <stappers@stappers.nl> wrote:
>> > On Sat, Apr 23, 2016 at 01:28:43PM +0200, Philipp Kern wrote:
>> >> On Fri, Apr 22, 2016 at 08:30:35PM -0400, Nicholas D Steeves wrote:
>> >>
>> >> > I'd also like to discuss whether the default subvolume naming scheme
>> >> > should follow Ubuntu, Fedora, OpenSUSE, or something else.
>> >>
>> >> What scheme are they using?
>> >
>> > Or a proposal for default subvolume naming scheme?
>> >
>>
>> Ubuntu avoids using the default subvolume (subvol ID 5).  For the
>> rootfs their installer creates a subvol called @, for /home it creates
>> @home, etc.  In fstab the device is specified and subvol=@ is added to
>> the mount option to specify which subvolume gets mounted.  When the
>> volume is mounted without a subvol option, it mounts the whole tree.
>> The tree would be /btrfs/@ and /btrfs/@home if mounted from a rescue
>> disk.
>>
>> I think the symbol is visually striking, but it can be cumbersome
>> because @+<tab> sometimes autocompletes as ipv6 addresses for root.
>> I'm also not sure if @ ever needs to be escaped \@.
>
> FWIW I have no memory of @ needed to be escape.
> A websearch on "linux shell when needs @ be escaped \@"
> ( https://www.google.nl/search?q=linux+shell++when+does+%40+needed+to+be+escaped+\%40 )
> didn't show me that @ is special for bash.
>
> Doing @+<tab> for autocompletion is done interactive by a thinking user,
> so I see no danger, I trust the thinking user.
>
>> Oh!  I just booted OpenSUSE Leap (their LTS), and it looks like
>> they've now adopted the Ubuntu convention of using @.  OpenSUSE has
>> also, to my knowledge, always avoided using the default subvolume.
>> Furthermore, OpenSUSE creates subvolumes for just about everything
>> @opt, @srv, @tmp, @usr/local, @var/crash, etc.
>>
>> Fedora 23 Workstation: When btrfs-style partitioning is selected,
>> their installer creates two subvolumes, home, and root.  When the
>> volume is mounted to /btrfs without a subvol= option, the tree would
>> be /btrfs/home and /btrfs/root.  Like Ubuntu and OpenSUSE default
>> subvolume is also not used.  From what I gather this is a necessary
>> configuration to support btrfs send and receive.  Eg: Bug #764056 is a
>> result of our current policy.
>>
>> Unlike LVM or disk partitions, all free space is shared between
>> subvolumes.  In the future it will be possible to use qgroups (quota
>> groups) to prevent /var or /home from using up all available free
>> space in rootfs, but at this time I don't think we should support it
>> in the installer, because of the volume of associated bugs and code
>> churn on the linux-btrfs mailing list.  Also, in the future it will be
>> possible to mount subvolumes with different options, but at this time
>> the first subvolume mounted sets the mount options for all members of
>> the volume--I'm not sure how to address this the D-I.

Ok, fstab mount options will be KISS.
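
To make "KISS" concrete, here is a minimal Python sketch that
generates fstab entries for a flat layout in which every subvolume
gets exactly the same mount options plus its own subvol= option.  The
helper name, the placeholder UUID, and the choice of 0 for the dump
and pass fields are assumptions for illustration, not what any
installer currently writes.

```python
def fstab_lines(uuid, layout, options=("defaults",)):
    """Build fstab lines for (mountpoint, subvolume) pairs.

    One shared options tuple is used for every entry, since the first
    subvolume mounted sets the options for the whole volume anyway.
    """
    lines = []
    for mountpoint, subvol in layout:
        opts = ",".join(list(options) + ["subvol=" + subvol])
        # Fields: device, mountpoint, type, options, dump, pass
        lines.append("UUID={} {} btrfs {} 0 0".format(uuid, mountpoint, opts))
    return lines


# Hypothetical UUID and an Ubuntu-style layout, for illustration only.
for line in fstab_lines("FAKE-UUID", [("/", "@"), ("/home", "@home")]):
    print(line)
```

Sharing one options tuple means the generated entries cannot disagree
with each other, which sidesteps the per-subvolume option question
entirely.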

>> In consultation with
>> https://btrfs.wiki.kernel.org/index.php/SysadminGuide#When_To_Make_Subvolumes
>> , a subvolume for rootfs and for /home seems sane, and sysadmins can
>> be instructed to consider making a subvolume for /var/www in
>> documentation.
>>
>> This brings us to a concern I have for documentation.  How should
>> /var/www appear when it's mounted using a rescue disk?  Should it be
>> /btrfs/var_www?  If it was /btrfs/var/www, then the two possibilities
>> are:
>>
>> a)    /btrfs/var is its own subvolume
>> and /btrfs/var/www is a child subvolume of /btrfs/var
>>
>> note: strictly speaking, all subvolumes what seems to be the root
>> volume are actually children of the default subvolume...the semantics
>> get tricky very quickly!
>>
>> or
>>
>> b)    /btrfs/var is a normal directory
>> and in the case of /btrfs/var/www, www is actually a child of /btrfs.
>>
>> Finally, because subvolumes are partitions in POSIX namespace, it's
>> safe to mount a subvolume to two locations, and also to have a
>> /btrfs-admin directory where the whole volume is mounted, at the same
>> time as individual subvolumes are mounted.  eg: you have your rootfs
>> mounted at /, and also at /btrfs-admin/rootfs or /btrfs-admin/@.
>>
>> The primary reason to do this is because most of the btrfs tools
>> operate on mountpoints rather than on devices.  It also allows
>> centralisation of snapshots.  eg: /btrfs-admin/snapshots is a normal
>> directory that holds snapshots of /btrfs-admin/rootfs,
>> /btrfs-admin/home, etc.
>>
>> Résumé: Do we follow Ubuntu and OpenSUSE with the @ convention and
>> work through the issues in bash-completion, or we follow Fedora's
>> plain text/alphanumeric convention, or do we do our own thing?
>
> The one which follows BTRFS upstream philosophy.
> However, I don't know if such guide exists.

From what I've gathered reading and participating in
linux-btrfs@vger.kernel.org, there isn't any kind of codified
manifesto.  That said, at this point in time maximum flexibility is
highly valued, but not if the design looks like it's going to break.
I think it fits nicely with Debian values.  The sysadmin guide I
linked to is a good example of this, because it reveals how flexible
the FS/volume management is, and mentions possible pitfalls, eg:
"Beware: Care must be taken..."

The convention proposed in the upstream btrfs sysadmin guide is
identical to the way Fedora names its subvolumes, and I believe it was
written by a Fedora developer.  I think the primary rationale for
diverging from it is to signify that what seem to be directories are
not actually directories, but subvolumes.  As I mentioned before, the
flexibility this FS supports is astounding, and I fear it could be a
nightmare to support without near self-evident default conventions.  I
guess I'll just pick something, and if someone doesn't like it then it
can be changed?  I was hoping any strong opinions would come out in
this discussion! :-)
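
To compare the candidate conventions side by side, here is a minimal
Python sketch that maps a mountpoint to a flat (peer) subvolume name.
The function, the "_" delimiter, and the name "root" for an unprefixed
rootfs are assumptions for illustration; nothing here is decided.

```python
def subvol_name(mountpoint, prefix="", delimiter="_", root_name="root"):
    """Map a mountpoint to a flat subvolume name.

    prefix="@" gives the Ubuntu/openSUSE look, prefix="" the plain
    Fedora-like look.  Path separators become the delimiter so that
    all subvolumes can be peers.
    """
    parts = [p for p in mountpoint.split("/") if p]
    if not parts:
        # The rootfs itself: "@" with a prefix, else root_name.
        return prefix if prefix else root_name
    return prefix + delimiter.join(parts)


for mp in ("/", "/home", "/var/www"):
    print(mp, subvol_name(mp, prefix="@"), subvol_name(mp))
```

One caveat with any single-character delimiter: /var/www and a
directory literally named /var_www would map to the same name, so the
chosen delimiter needs to be documented.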

>> Secondly, Do we want to limit the difficulty of supporting complicated
>> configurations by establishing simple conventions and recommendation
>> early on?  eg: all subvolumes created in the installation are peers,
>> and a subvolume that will be mounted at /var/www is named var_www.
>> A default delimiter convention would also need to be chosen.
>
> Install a starting point.
> More complicated configuration can be done on the installed system.
> KISS

In the time since receiving replies to this thread, I encountered a
user who installed Debian with the defaults, but on btrfs.  He later
added disks and rebalanced the volume to the btrfs raid6 profile.
GRUB does not support booting from this configuration, and he did not
have a separate /boot.  D-I does not support the btrfs raid5/6
profiles, and *should* not support raid5/6 until some time
post-linux-4.10, unless a large company funds work to accelerate its
development...upstream's recommendation is don't use it except for
testing, don't trust it, it needs work (paraphrased).

[1] Fedora addresses this by putting /boot on a separate ext4
partition.  This means a backup made with btrfs-snapshot + btrfs-send
will not be bootable, but that can be worked around by using dedicated
backup software that hooks into btrfs-snapshot the way such software
hooks into LVM.

I guess the question is: Do we ship a default configuration that will
let a user who hasn't read the wiki (TODO: put this issue in the wiki)
make their system unbootable, in a way that can't be worked around,
while innocently experimenting?  This isn't as self-evident as
receiving a recommendation on IRC to defrag your hard drive with a
"sudo rm -rf /" ;-)

Workarounds being: 1) shrink the EFI partition to make room for /boot,
or 2) if the installer puts swap after the EFI partition and before
rootfs, and by default makes it big enough to be able to hibernate,
plus some extra space that can be stolen to create /boot, then the
user can use rescue media to shrink the swap and make room for /boot.
Or just do [1] and skirt this issue, even though it's not necessary,
and maybe less than ideal for single disk, RAID1, and also (possibly)
RAID10 configurations.

On the topic of default partitioning, would it be possible to always
leave something like 64MiB unallocated at the end of the disk?  The
use case being: user installs to a pair of identical drives in RAID1.
One drive fails.  The replacement drive is slightly smaller than the
original drive.  Oops...  Or does this fall under the category of "the
sysadmin should have known better?"  The reason I mention it is that
with btrfs the sysadmin can install to a single drive, add a second
drive, and convert to RAID1 on a live system.  When one drive fails
some time later...  Oops.
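
The arithmetic I have in mind is simple; here is a minimal Python
sketch of it.  The 64MiB reserve comes from the paragraph above, while
the function name and the 1MiB alignment are assumptions for
illustration, not a description of what partman does today.

```python
MIB = 1024 * 1024


def usable_bytes(disk_bytes, reserve_mib=64, align_mib=1):
    """Bytes available to partitions after leaving a tail reserve.

    The result is rounded down to a multiple of align_mib so the last
    partition still ends on an alignment boundary.
    """
    reserve = reserve_mib * MIB
    align = align_mib * MIB
    if disk_bytes <= reserve:
        raise ValueError("disk is not larger than the reserve")
    return (disk_bytes - reserve) // align * align


# Example with a hypothetical 1000 MiB disk.
print(usable_bytes(1000 * MIB) // MIB)
```

With this scheme a replacement drive can be up to the reserve size
smaller than the original and still hold the same partition layout.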

Finally, I have some questions about partman-zfs, which I looked at in
preparation for overhauling partman-btrfs.  Are the LVM-style naming
and semantics necessary, or are they local to the partman-fs_name?
One of the reasons I've taken so long to reply is that I had planned
to spend time learning how everything fits together before asking
this question...then...life happens...you know?

Best regards,
Nicholas