Proposal: Merge the cruft database with dpkg
(note: in this email I'm using the name 'dpkg' loosely. I just mean 'the
debian packaging system')
Lately I made a big effort to update my filter files for cruft and
the generated log went from many hundreds of KB down to just under 5KB,
i.e. fewer than 100 lines (filter files available on demand). And I
strongly believe that even with its current limitations cruft is a very
important tool in the debian arsenal:
* as an auditing tool. In addition to finding cruft left over as you
upgrade/remove packages, cruft allows you to find files that you
added to your system and .dist-xxx files that you did not merge yet.
* as a way to find package bugs. Some of the cruft accumulates because
packages forget to remove dynamically generated files, some because
alternatives don't get updated properly...
* And another big part of the cruft is simply caused by packages that
create directories and files in their install script but don't say a
word about them to dpkg. I believe that forcing a package to declare
exactly which files/directory it may create is a good thing (yes, you
need regular expressions for cases like /var/spool/squid/**).
There's a number of big problems with cruft today:
1. I won't talk about the scanning of /proc, windows partitions or
anything like that. These are problems too but today my point is
elsewhere entirely.
2. It's very out of date and incomplete. The filter files no longer
match what current packages install on the system. It's missing many new
packages.
3. It's not maintainable. We cannot expect the cruft maintainer to
monitor and update filter files each time one of the 6700+ debian
packages is updated, each time a package is added, or removed.
4. cruft makes use of all the files it finds in /usr/lib/cruft/filters,
even if that package is not installed... or has been removed and has
left cruft behind.
5. There is no way to make sure the files in /usr/lib/cruft/filters
match the version of the packages that are installed on your system. For
instance logrotate recently moved /var/run/logrotate/status to
/var/lib/logrotate/status (IIRC). Which one should you put in
/usr/lib/cruft/filters/logrotate then?
6. Who do you report bugs to when you find a problem in one of the
/usr/lib/cruft/filters files? To the cruft maintainer? But he may very
well know nothing about the package that has this stray file. How is he
supposed to tell whether this file should be there or not? For instance,
is it normal to have a file called '/etc/openldap/ldap.conf' when ldap
is installed? (I have both '/etc/openldap' and '/etc/ldap' on my system)
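Problem 4 can be illustrated with a small sketch. It assumes, as the
paths above suggest, that each file in /usr/lib/cruft/filters is named
after the package it describes. The function name and the sample data
are hypothetical, and this is not how cruft itself is implemented.

```python
def split_filters(filter_names, installed_packages):
    """Separate filter files into those whose package is installed and the rest.

    Assumption: a filter file is named after the package it describes.
    """
    installed = set(installed_packages)
    active = sorted(name for name in filter_names if name in installed)
    stale = sorted(name for name in filter_names if name not in installed)
    return active, stale


# Hypothetical data: in practice these would come from listing
# /usr/lib/cruft/filters and from dpkg's list of installed packages.
filters = ["logrotate", "squid", "openldap"]
installed = ["logrotate", "squid", "syslogd"]

active, stale = split_filters(filters, installed)
print("use:", active)
print("ignore:", stale)
```

Only the 'use' list should be applied. Applying the 'ignore' list is
what hides the cruft left behind by a removed package.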
So how does one fix these problems? There are many ad-hoc ways but I
think that the best long-term way is to integrate the information stored
in /usr/lib/cruft/filters with dpkg's file database. This way:
1. dpkg -S works as one expects. Currently 'dpkg -S /etc/syslog.conf'
will tell you that the file belongs to syslogd. But who created
/etc/porttime is anybody's guess.
2. you would need a way to store file patterns like
'/var/spool/squid/**'. It might even be better if it were possible to
use real regular expressions because they are more flexible.
'/var/spool/squid/[0-9A-F]{2}/**' could be replaced with the more
fine-grained '/var/spool/squid/netdb_state', '/var/spool/squid/swap.state',
'/var/spool/squid/swap.state.last-clean'. This would allow you to detect
that squid forgot to delete the old 'swap.state' when the upgrade
replaced it with 'cache.state' (fictitious example).
3. the information used by cruft matches exactly the set of packages
installed on the system, down to their version.
4. Each package maintainer maintains the list of files that the package
may install. This distributes the load and makes the content of these
files more accurate.
5. If a package creates files it's not supposed to, then you report the
bug against that package (if you can identify it, otherwise ask on
debian-user/debian-devel and then report the bug).
6. You would also want to add a property to indicate that some files
may not be present. This would prevent cruft from complaining that these
files are 'missing'.
7. Another useful property would declare files that are not deleted
during an upgrade cycle. IIRC, during an upgrade of squid, dpkg
complains that '/var/spool/squid' is not empty and therefore cannot be
deleted. Of course it's not empty! You don't want to purge your cache
each time you upgrade squid. By declaring the contents of
/var/spool/squid as 'persistent data' we would avoid this warning.
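To make points 2, 6 and 7 more concrete, here is a minimal sketch of
what per-package declarations could look like and how a cruft-like tool
might use them. Everything in it is an assumption made for illustration:
the Entry type, the 'optional' and 'persistent' field names, the sample
squid declarations, and the use of Python regular expressions in place
of '**' globs. dpkg has no such format today.

```python
import re
from dataclasses import dataclass


@dataclass
class Entry:
    pattern: str              # regular expression matched against the full path
    optional: bool = False    # hypothetical: file may legitimately be absent
    persistent: bool = False  # hypothetical: data kept across upgrades


# Hypothetical declarations that a squid package might ship.
SQUID = [
    Entry(r"/var/spool/squid/[0-9A-F]{2}/.*", optional=True, persistent=True),
    Entry(r"/var/spool/squid/swap\.state", persistent=True),
    Entry(r"/var/spool/squid/netdb_state", optional=True, persistent=True),
]


def audit(entries, files_on_disk):
    """Return (cruft, missing): undeclared files, and required patterns with no match."""
    compiled = [(entry, re.compile(entry.pattern)) for entry in entries]
    cruft = [path for path in files_on_disk
             if not any(rx.fullmatch(path) for _, rx in compiled)]
    missing = [entry.pattern for entry, rx in compiled
               if not entry.optional
               and not any(rx.fullmatch(path) for path in files_on_disk)]
    return cruft, missing


def is_persistent(entries, path):
    """True if a path is declared as data to keep across upgrades."""
    return any(e.persistent and re.fullmatch(e.pattern, path) for e in entries)


# Sample file listing (made up) in the spirit of the swap.state example.
files = [
    "/var/spool/squid/0A/1F/0000A1F3",
    "/var/spool/squid/netdb_state",
    "/var/spool/squid/cache.state",
]
cruft, missing = audit(SQUID, files)
print("cruft:", cruft)
print("missing:", missing)
```

With this listing, 'cache.state' is reported as cruft because nothing
declares it, and the required 'swap.state' pattern is reported as
missing. An upgrade step could consult is_persistent() before warning
that /var/spool/squid is not empty.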
Of course the above requires modifications in dpkg and a change in
policy so it will take time. So I also propose a medium-term approach
which would consist in having the packages install their
/usr/lib/cruft/filters/xxx file themselves. This would make cruft
maintainable again and ensure that these files are in sync with the
packages installed on the system. It still requires a policy update but
it can probably be started right away (at least for packages that don't
have a file in cruft yet).
The short term is for me to submit my 68 new filter files, 18 updated
files, and to find out what to do with about 100 files lying around on
my system. Suggestions on how to proceed are welcome too.
One last note: logcheck has pretty much the same problem as cruft...
except that fewer packages write stuff to syslog.
--
Francois Gouget fgouget@free.fr http://fgouget.free.fr/
We are Pentium of Borg. You will be approximated. Division is futile.