Re: DEP 17: Improve support for directory aliasing in dpkg
Hi Raphaël,
On Fri, Apr 21, 2023 at 03:03:10PM +0200, Raphael Hertzog wrote:
> Here you are considering all files, but for the purpose of our issue,
> we can restrict ourselves to the directories known by dpkg. We really
> only care about directories that have been turned into symlinks (or
> packaged symlinks that are pointing to directories). That's a much lower
> number of paths that we would have to check.
Considering just sid amd64 main, I count around 140000 directories,
which clearly is less than millions. A typical installation will only
have a fraction of that, probably less than 50000. I think this is the
number of stat() calls we'd have to do. I timed this on a reasonably
fast system (admittedly using Python but I think the overhead is not
huge), and this completes in around 0.1s (with a hot VFS cache). So,
depending on the cache invalidation strategy, this may or may not be viable.
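A minimal sketch of such a scan in Python, assuming the list of directories known to dpkg is already at hand (extracting it from the dpkg database is left out). It does one lstat() per directory, via os.path.islink(), and reports the ones that are now symlinks. The function names are hypothetical and this is not the script used for the timing above.

```python
import os
import time
from typing import Iterable, List, Tuple


def find_aliased_dirs(dirs: Iterable[str], root: str = "/") -> List[str]:
    """Return those dpkg-known directories that are symlinks on disk.

    `dirs` holds absolute paths such as "/bin"; `root` allows scanning a
    chroot or test tree instead of the live system.
    """
    aliased = []
    for d in dirs:
        path = os.path.join(root, d.lstrip("/"))
        # os.path.islink() uses lstat(), so it sees the link itself rather
        # than following it; missing paths simply report False.
        if os.path.islink(path):
            aliased.append(d)
    return aliased


def timed_scan(dirs: Iterable[str], root: str = "/") -> Tuple[List[str], float]:
    """Run the scan and return (aliased directories, elapsed seconds)."""
    start = time.perf_counter()
    aliased = find_aliased_dirs(dirs, root)
    return aliased, time.perf_counter() - start
```

Feeding timed_scan() the directory list of a typical installation is the kind of measurement meant above; the result depends heavily on whether the VFS cache is hot.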
This looks at it from a performance point of view only. Guillem also
raised the point that this changes the source of truth from the dpkg
database to the actual filesystem, which he considers wrong, and I
vaguely agree.
> We don't add any new public interface to dpkg, but we also have the
> possibility to remove /var/lib/dpkg/aliases to force a new scan
> (some sort of "dpkg --refresh-aliases" without an official name).
Can I rephrase this as: your cache invalidation strategy is that any
external entity (such as a maintainer script) introducing aliases must
explicitly invalidate the cache?
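To make that invalidation contract explicit, a minimal Python sketch. Only the path /var/lib/dpkg/aliases comes from the proposal; the on-disk format (one "alias<TAB>target" pair per line) and the names load_aliases() and invalidate() are assumptions for illustration, not anything dpkg implements.

```python
import os
from typing import Callable, Dict

# Path from the proposal; the file format below is an assumption.
CACHE = "/var/lib/dpkg/aliases"


def load_aliases(scan: Callable[[], Dict[str, str]], cache: str = CACHE) -> Dict[str, str]:
    """Return the alias map, rescanning the filesystem only if the cache is gone."""
    if os.path.exists(cache):
        with open(cache, encoding="utf-8") as f:
            # Each non-empty line is "alias<TAB>target".
            return dict(line.rstrip("\n").split("\t", 1) for line in f if line.strip())
    aliases = scan()
    with open(cache, "w", encoding="utf-8") as f:
        for alias, target in sorted(aliases.items()):
            f.write(f"{alias}\t{target}\n")
    return aliases


def invalidate(cache: str = CACHE) -> None:
    """What an external actor (e.g. a maintainer script) has to do after
    turning a directory into a symlink or back."""
    try:
        os.remove(cache)
    except FileNotFoundError:
        pass
```

The point of the sketch is that correctness hinges entirely on every external actor calling invalidate(); nothing inside dpkg notices a missed call.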
> It might still be cleaner to have that "dpkg --refresh-aliases" command
> so that we can invoke it for example in "dpkg-maintscript-helper
> symlink_to_dir/dir_to_symlink" when we are voluntarily turning a directory
> into a symlink (or vice-versa).
If you put it this way, it is not that different from the
--add-alias/--remove-alias proposal. It is a different interface to
dpkg, but the semantics are roughly the same:
In both cases, something external to dpkg is responsible for performing
the moves and creating the symbolic links followed by informing dpkg
about the alias (explicitly or implicitly via scanning directories).
Would you agree with me that this is a minor adaptation of DEP17? In
essence, what changes is the way a user communicates aliases to dpkg,
but the assumption that a user must communicate aliases to dpkg is not
affected. I'd be fine with changing this aspect in principle, but I
still consider this a new public interface to dpkg with much the same
effects on long-term maintenance.
> In any case, now that you have a database of aliases, you can do the other
> modifications to detect conflicting files and avoid file losses.
> 
> How does that sound?
To me, it sounds just like DEP17 in a different color. I hope I got
it right.
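To illustrate why an alias database is what enables the conflict detection Raphaël mentions, a minimal Python sketch: every shipped path is rewritten through the longest matching alias prefix, and two packages claiming the same canonical path are flagged. This is an assumption-laden illustration, not the DEP17 algorithm; the function names and package names are hypothetical, and chained aliases (a target that is itself aliased) are not handled.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple


def canonicalize(path: str, aliases: Dict[str, str]) -> str:
    """Rewrite `path` through the longest alias that is a path prefix of it."""
    best = ""
    for alias in aliases:
        if (path == alias or path.startswith(alias + "/")) and len(alias) > len(best):
            best = alias
    return aliases[best] + path[len(best):] if best else path


def find_conflicts(files: Iterable[Tuple[str, str]],
                   aliases: Dict[str, str]) -> Dict[str, List[str]]:
    """Map each canonical path to its shipping packages, keeping only
    paths claimed by more than one package."""
    owners: Dict[str, List[str]] = defaultdict(list)
    for pkg, path in files:
        owners[canonicalize(path, aliases)].append(pkg)
    return {p: pkgs for p, pkgs in owners.items() if len(set(pkgs)) > 1}
```

With {"/bin": "/usr/bin"} as the alias map, a package shipping /bin/foo and another shipping /usr/bin/foo are reported as conflicting, which is exactly the case where file loss can occur today.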
What I tried to rule out as a naive solution is eliminating the need to
tell dpkg about aliasing changes altogether: then we'd have to incur
this 0.1s delay after every maintainer script invocation, which would
amount to 5 minutes of stat()ing on a typical dist-upgrade, assuming a
hot VFS cache on a fast x86 CPU.
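For reference, the 5-minute figure corresponds to roughly 3000 maintainer script invocations per dist-upgrade. That count is back-computed from the two numbers above, not measured:

```latex
\frac{5\,\text{min}}{0.1\,\text{s per scan}}
  = \frac{300\,\text{s}}{0.1\,\text{s}}
  = 3000\ \text{scans}
```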
> The proposal I made above is not a real database in the sense that we
> don't record what was shipped by the .deb when we installed the files...
> it's rather the opposite, it analyzes the system to detect possible
> conflicts with dpkg's view of the system.
I think Guillem considers this a bad property, as he expressed in his
reply on debian-dpkg that .debs should be the source of truth.
> It can be seen as complementary to it. In any case, I don't see how
> implementing metadata tracking would help to solve the problem that we
> have today. dpkg would know that all .deb have /bin as a directory and
> not as a symlink, and it would be able to conclude that the directory
> has been replaced by a symlink by something external, but that's it.
Let me put it slightly differently. As we currently do not ship the
aliasing symbolic links in any data.tar, metadata tracking will not tell
dpkg about the aliasing, and therefore metadata tracking alone cannot
help resolve the current situation. We can only add the symbolic links
to a data.tar after the aliasing has been resolved (see Simon Richter's
mails on how dpkg resolves directory vs symlink), and thus metadata
tracking can only help once we have already resolved the situation. I
don't see a way to break this vicious circle and will update the DEP17
text accordingly.
Helmut