
Re: DEP 17: Improve support for directory aliasing in dpkg



Hello,

I'd like to offer some food for thought on this issue.

From what I have understood, Guillem would rather avoid committing
to a new public interface for this specific use case, i.e. the
fact that the DEP suggests "dpkg --add-alias" is problematic
because that feature will be useless once we have moved
to .debs shipping files in /usr only.

However, the loss of files through aliased directories is a broader
problem; it is not specific to this transition. It's quite possible for a
package to ship a symlink pointing to a directory and for other
packages to install files through that symlink (and then to move those
files between binary packages and between their two possible locations).

Let's try to tackle that problem in a generic way without requiring
any external information... it ought to be doable. You did consider
it partly already:

On Mon, 03 Apr 2023, Helmut Grohne wrote:
> Naive solution
> ==============
> 
> In theory, `dpkg` could resolve this automatically.  For every file it
> touches, it could canonicalize the location using the actual filesystem
> and check whether any other installed file has the same canonicalized
> location.  Unfortunately, `dpkg` cannot know which filenames can
> collide, so it would check every filename in its database.  For
> canonicalization, it would `stat()` every component of every filename.
> This easily amounts to a million or more `stat()` calls on larger
> installations.  Caching could reduce the impact somewhat, but since
> Debian introduces aliases during maintainer scripts, it would have to
> invalidate the cache after maintainer scripts have been run.  The
> resulting performance would be unacceptable.

Here you are considering all files, but for the purpose of our issue,
we can restrict ourselves to the directories known by dpkg. We really
only care about directories that have been turned into symlinks (or
packaged symlinks that point to directories). That's a much smaller
number of paths that we would have to check.

You are speaking of having some sort of cache and I certainly agree
that it would make sense to have such a cache.

We could decide that /var/lib/dpkg/aliases is that cache. It would
be the result of a scan of all directories known to dpkg (i.e. all
paths through which dpkg has installed files), and for each such path
that is a symlink it would list the target directory.
The absence of a directory from that file would mean that, according to
dpkg, the directory ought to be a real directory.

Thus this time-consuming operation would be done only once: the first
time that the updated dpkg starts, when /var/lib/dpkg/aliases
does not yet exist.
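
To make the idea concrete, here is a minimal sketch in Python (not dpkg
code). It assumes that "known_dirs" is the set of directory paths taken
from dpkg's database and that the cache is a plain "path<TAB>target" file;
both the function names and the file format are hypothetical.

```python
import os

def scan_aliases(known_dirs, root="/"):
    """Return {path: target} for each known directory that is a symlink to a directory."""
    aliases = {}
    for path in sorted(known_dirs):
        on_disk = os.path.join(root, path.lstrip("/"))
        # Only symlinks that resolve to a directory count as aliases.
        if os.path.islink(on_disk) and os.path.isdir(on_disk):
            aliases[path] = os.readlink(on_disk)
    return aliases

def load_or_build(cache_file, known_dirs, root="/"):
    """Read the cache if it exists; otherwise do the one-time scan and write it."""
    if os.path.exists(cache_file):
        with open(cache_file) as f:
            return dict(line.rstrip("\n").split("\t", 1) for line in f if line.strip())
    aliases = scan_aliases(known_dirs, root)
    with open(cache_file, "w") as f:
        for path, target in sorted(aliases.items()):
            f.write(f"{path}\t{target}\n")
    return aliases
```

Deleting the cache file makes the next call to load_or_build() rescan.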

That cache file would be kept up-to-date by the various dpkg invocations:
- when you install a new .deb containing a symlink pointing to a
  directory, that new "aliased path" is added to this file
- when dpkg removes a symlink that is listed in the aliases file, we drop
  it too
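
The two update rules above could look roughly like this, assuming the
aliases are held in memory as a dict mapping path to target (the function
names are made up for illustration, they are not dpkg internals):

```python
def record_unpacked_symlink(aliases, path, target, target_is_dir):
    """Called when a .deb ships a symlink; remember it only if it points to a directory."""
    if target_is_dir:
        aliases[path] = target

def record_removed_path(aliases, path):
    """Called when dpkg removes a path; drop it if it was a listed alias."""
    aliases.pop(path, None)
```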

We don't add any new public interface to dpkg, but we can still
remove /var/lib/dpkg/aliases to force a new scan
(some sort of "dpkg --refresh-aliases" without an official name).

It might still be cleaner to have that "dpkg --refresh-aliases" command
so that we can invoke it, for example, in "dpkg-maintscript-helper
symlink_to_dir/dir_to_symlink" when we are deliberately turning a directory
into a symlink (or vice versa).

In any case, now that you have a database of aliases, you can do the other
modifications to detect conflicting files and avoid file losses.
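
As a rough illustration of that last step, the sketch below uses the alias
table to map each filename to a canonical location without touching the
filesystem, and then reports files of other packages that end up at the
same place. It is only a sketch under simplifying assumptions: "installed"
is a hypothetical dict mapping each installed path to its owning package,
and chained aliases (an alias whose target itself goes through another
alias) are not followed.

```python
import posixpath

def canonicalize(path, aliases):
    """Replace aliased directory prefixes of path with their targets."""
    parts = [p for p in path.split("/") if p]
    resolved = "/"
    for i, part in enumerate(parts):
        candidate = posixpath.join(resolved, part)
        is_last = i == len(parts) - 1
        # The final component is the object itself, so it is not resolved.
        if not is_last and candidate in aliases:
            # A relative target is relative to the symlink's parent directory.
            candidate = posixpath.normpath(posixpath.join(resolved, aliases[candidate]))
        resolved = candidate
    return resolved

def find_conflicts(installed, new_pkg, new_paths, aliases):
    """List (new_path, existing_path, owner) for files colliding through an alias."""
    by_canonical = {canonicalize(p, aliases): (p, pkg) for p, pkg in installed.items()}
    conflicts = []
    for path in new_paths:
        hit = by_canonical.get(canonicalize(path, aliases))
        if hit and hit[1] != new_pkg:
            conflicts.append((path, hit[0], hit[1]))
    return conflicts
```

With aliases = {"/bin": "usr/bin"}, a new package shipping /bin/foo would
be reported as colliding with another package's /usr/bin/foo.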

How does that sound?

> Implement aliasing after metadata tracking
> ------------------------------------------
> 
> The [metadata
> tracking](https://wiki.debian.org/Teams/Dpkg/Spec/MetadataTracking)
> feature enhances `dpkg` with knowledge about filesystem metadata for
> installed files.  This includes knowledge of symbolic links, which would
> help with tracking aliasing.  Unfortunately, progress on this is fairly
> slow and we think that aliasing support is more urgent.

The proposal I made above is not a real database in the sense that we
don't record what was shipped by the .deb when we installed the files.
It's rather the opposite: it analyzes the system to detect possible
conflicts with dpkg's view of the system.

It can be seen as complementary to metadata tracking. In any case, I
don't see how implementing metadata tracking would help to solve the
problem that we have today. dpkg would know that all .debs ship /bin as
a directory and not as a symlink, and it would be able to conclude that
the directory has been replaced with a symlink by something external,
but that's it.

It should still accept that replacement and do its best to work with it.

Cheers,
-- 
  ⢀⣴⠾⠻⢶⣦⠀   Raphaël Hertzog <hertzog@debian.org>
  ⣾⠁⢠⠒⠀⣿⡁
  ⢿⡄⠘⠷⠚⠋    The Debian Handbook: https://debian-handbook.info/get/
  ⠈⠳⣄⠀⠀⠀⠀   Debian Long Term Support: https://deb.li/LTS

