Hello world,

On Tue, Apr 07, 1998 at 12:47:53PM -0500, Manoj Srivastava wrote:
> Instead, we should introduce a file called extra files to list
> files that the package may generate in maintainer scripts or in
> normal operations which are not listed in pkg.list. We may exclude
> files under /var/run and /var/log entirely.

As it turns out, this is a much more flexible solution for my purposes
anyway, so I've spent a little time over the past couple of days trying
to find a nice way of doing it.

So... I'd like to make a proposal to have policy changed so that
packages include this information as standard. I'm not really familiar
with how I'm meant to go about doing that, so I've probably left
something important out, or included stuff I shouldn't or something...
But anyway. Here goes.

Purpose
=======

To provide a way for packages to claim ownership over files that aren't
included in their original package. Files that are created in a
package's preinst or postinst, or that are manipulated by programs in
the package at runtime, fall under this heading.

Some examples are:

    base-passwd: /etc/passwd, /etc/group, /etc/shadow, /etc/gshadow
    cron:        /var/run/crond.pid
    lintian:     /var/spool/lintian/...
    dpkg:        /var/lib/dpkg/info/*, /var/lib/dpkg/alternatives/*
    dhelp:       /usr/doc/.../.dhelp

and so on and so forth.

(as it stands, something like `dpkg --search /etc/passwd' results in
`dpkg: /etc/passwd not found.'
Personally I consider this alone a little disconcerting)

Requirements
============

1) A program (eg dpkg) should be able to easily
     a) tell if a given file is an extrafile (dpkg --search)
     b) find all extrafiles of a given package (dpkg --listfiles)

2) Extrafiles should be able to specify:
     a) individual files
     b) files matching a pattern in a single directory
     c) files matching a pattern anywhere in a directory tree

Proposal
========

I'd thus like to propose the following changes to policy:

Debian Policy Manual
--------------------

Add a section 3.3.8: (renumbering 3.3.8 (Permissions and owners) to
3.3.9)

----------------------------------------
3.3.8 Extra files

Files that your package administers, but that are not included in the
package itself, are called extra files. These include any configuration
files that are generated by a postinst, log files placed in /var/log,
files and directories added under /var/spool/, and more generally
anything that is only useful while your package is installed.

Such files should be listed in dpkg's extrafiles control area file.
(See the Debian Packaging Manual).
----------------------------------------

Debian Packaging Manual
-----------------------

Add a section 10: (renumbering sections 10..14 to 11..15)

----------------------------------------
Debian packaging manual - chapter 10
Extra Files

dpkg includes an extrafiles mechanism for claiming ownership over files
that aren't distributed with the package, but are instead created over
the course of a package's presence on a system.

Files that should be specified in the extrafiles control area file
include such things as log files, spool directories and their contents,
and configuration files (in particular those that aren't conffiles --
see chapter 9).

10.1 Format of the extrafiles control area file

The extrafiles file consists of one or more lines, each containing a
single pattern matching files that will be used by the package.
These patterns are extended Bourne shell patterns, and have the
following special characters/sequences:

    *      Match any number of characters in a filename (but not a '/')

    ?      Match any character in a filename (but not a '/')

    [...]  Matches any of the enclosed characters. A pair of characters
           separated by a minus sign denotes a range. If the first
           character following the [ is a ! or a ^ then any character
           not enclosed is matched. A - may be matched by including it
           as the first or last character in the set. A ] may be
           matched by including it as the first character in the set.
           ('/' cannot be matched)

    /**/   Match any number of subdirectory levels (0 or more)

    /**    Match any files in any subdirectories

Some example patterns include:

  dhelp:
    /var/lib/dhelp        ; match the /var/lib/dhelp directory
    /var/lib/dhelp/**     ; match anything underneath it
    /usr/doc/**/.dhelp    ; match /usr/doc/.dhelp,
                          ;       /usr/doc/*/.dhelp,
                          ;       /usr/doc/*/*/.dhelp, and so on...

  python:
    /usr/lib/python*/**/*.py[co]
                          ; match any .pyc or .pyo files anywhere under
                          ; /usr/lib/python, /usr/lib/python1.4, or
                          ; /usr/lib/python1.5, or
                          ; /usr/lib/python-anything
----------------------------------------

Sample Programs
===============

After watching the recent du-files debate/flamewar/fiasco I thought I
probably ought to provide some sample programs that made use of the
extrafiles format I'm proposing. So:

filter_shell
------------

I've written a program "filter_shell" that expects on stdin a list of
filenames, and should be called with arguments such as:

    cat filenames | filter_shell /var/lib/dpkg/info/*.extrafiles

It then filters out all the filenames matching any of the shell
expressions in any of the extrafiles it's given, and lists the
remainder on stdout.

cruft
-----

Cruft is the main reason I'm doing any of this. It's a bunch of scripts
that go through your hard drive, comparing what it expects to find
(based on /var/lib/dpkg/info/* and similar sorts of information), to
what it actually does find.
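
For anyone who wants to play with the pattern rules from 10.1 without
reading C, here is a rough sketch in Python of the same idea: translate
each pattern to a regexp and drop every filename that matches one. This
is not the filter_shell.c code. The names pattern_to_regex and
filter_names are made up for this example, and skipping empty pattern
lines is an assumption of the sketch rather than part of the proposal.

```python
import re


def pattern_to_regex(pattern):
    """Translate one extrafiles pattern to a compiled regex (hypothetical helper)."""
    out = []
    i, n = 0, len(pattern)
    while i < n:
        c = pattern[i]
        if pattern.startswith("/**/", i):
            # "/**/": zero or more whole subdirectory levels
            out.append("/(?:[^/]+/)*")
            i += 4
        elif pattern.startswith("/**", i) and i + 3 == n:
            # trailing "/**": anything at all underneath this directory
            out.append("/.+")
            i += 3
        elif c == "*":
            out.append("[^/]*")  # any run of characters, never a '/'
            i += 1
        elif c == "?":
            out.append("[^/]")  # exactly one character, never a '/'
            i += 1
        elif c == "[":
            j = i + 1
            negate = j < n and pattern[j] in "!^"
            if negate:
                j += 1
            # Searching from j + 1 makes a ']' in first position a literal.
            end = pattern.find("]", j + 1)
            if end == -1:
                out.append(re.escape(c))  # unterminated set: literal '['
                i += 1
                continue
            body = "".join("\\" + ch if ch in "\\]^[" else ch
                           for ch in pattern[j:end])
            # The lookahead stops any set, negated or not, matching '/'.
            out.append("(?!/)[" + ("^" if negate else "") + body + "]")
            i = end + 1
        else:
            out.append(re.escape(c))
            i += 1
    return re.compile("".join(out))


def filter_names(names, patterns):
    """Return the names matching none of the patterns, in input order."""
    regexes = [pattern_to_regex(p) for p in patterns if p]
    return [name for name in names
            if not any(r.fullmatch(name) for r in regexes)]


if __name__ == "__main__":
    patterns = ["/var/lib/dhelp", "/var/lib/dhelp/**", "/usr/doc/**/.dhelp"]
    names = ["/var/lib/dhelp/db", "/usr/doc/foo/.dhelp", "/etc/motd"]
    print(filter_names(names, patterns))
```

Going via regexps is only a convenience here. It sidesteps writing a
matcher by hand, but it says nothing about how a C implementation
inside dpkg ought to do it.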

I've incorporated filter_shell into cruft, along with `extrafiles'
examples for 81 of the packages on my system. (happily, this both makes
the results more accurate, *and* shaves a minute or two off the time it
takes to run. :)

For some idea of cruft's effectiveness, of the ~90,000 files on my
system it gives a list of around 250 files that it doesn't know about,
of which about half are (or appear to be) valid complaints. The
remainder are files I didn't have the patience to verify as being
things that should be on my system, and put into a sample extrafiles
list.

For some datapoints, 34 of the sample extrafiles I did up were one
line, 14 were two lines, 11 were three lines, 8 were four lines, and
only 14 had five lines or more. So hopefully this shouldn't require too
much work to add to existing packages.

dash-search
-----------

dash-search is a fairly simplistic demonstration of what might need to
be added to dpkg's --search option. Use

    dash-search full-path-to-file

to see if "full-path-to-file" is marked as an extrafile, and if so,
which package(s) it belongs to.

NB: This won't match substrings; if you want to find who owns
/etc/cruft/explain/my_expl, you can't just type "my_expl". I'm not sure
what (if anything) can (or should) be done about this.

Availability
------------

cruft is available as

    http://va.debian.org/~ajt/cruft_0.9.1_i386.deb

It installs into /usr/sbin/cruft. Information on how it works is in
/usr/doc/cruft/README.

filter_shell is available in /usr/lib/cruft/filter_shell, as part of
cruft.

dash-search is available in the debian source for cruft available under
http://va.debian.org/~ajt/ .

Discussion Points
=================

There are a few things in the above that I'm not entirely comfortable
with.

1) Are shell patterns the right thing to use? They seem pretty pleasant
   to type up, but regexps do cover more cases, and there are lots of
   libraries with prewritten implementations of them.
   For Perl and Python people, regexps would make life easier, for
   example.

   I went with shell expressions for two reasons. First, I'm used to
   matching filenames with shell expressions, rather than regexps.
   Having /etc/xyzzy/.* would look more like a bunch of dot-files than
   everything in the directory, to me.

   The second thing that I was concerned about was that the regexp .*
   would match through directories, eg /etc/xyzzy/.* would match
   /etc/xyzzy/foo/bar as well as just /etc/xyzzy/foo and
   /etc/xyzzy/bar. This seemed like a loss, to me. (if you wanted to
   just match one directory, and not subdirs, you'd have to use
   /etc/xyzzy/[^/]*).

   The only other program that does something similar I could think of
   was find(1)'s -path option, which doesn't assign any special meaning
   to '/', so /a*b/ will match /ab/, /a_and_b/, but also /alpha/deb/.

2) I haven't included the ability to escape special characters with a
   '\'. If you'd like to include a single '[', '*' or '?' you can just
   use [[], [*] or [?] respectively.

   I'm inclined to disallow '\' alone, and require [\] if you want to
   use a '\', and possibly the same with ']', to make it fairly clear
   what an unusual pattern means. This would be via a Lintian check,
   rather than specific code. (iow, you *could* do it, but you
   *shouldn't*)

3) Should two non-conflicting .deb's be able to specify patterns that
   cover the same file? (eg /etc/a* and /etc/*b, which could both match
   /etc/ab) I don't think they should, personally.

   I *suspect* that it's possible to match two shell-expressions to see
   if they coincide, but I'm not sure. This would be useful both for
   dpkg --search (matching some pattern against `*[user's input]*' more
   or less), and a lintian check to ensure that no two packages claim
   ownership of the same files.

5) I've been considering whether extrafiles and conffiles would/should
   have any significant overlap, but personally I'm inclined to suspect
   that they probably don't.

6) I'm using the phrases "creates/uses file xyzzy" and "claims
   ownership over the file xyzzy" synonymously. I'm thinking, in
   particular, that it might be useful to extend dpkg/dselect/apt to
   make --purge delete all extrafiles too. I dunno.

And finally, some

Implementation Notes
====================

My implementation in filter_shell isn't particularly good. It's
something like O(n^2) instead of O(n) when parsing *'s, for example. If
someone with a little more experience parsing shell expressions would
like to tidy that up for me, I'd be appreciative. (I tried looking at
the bash source, but couldn't make head nor tail of it)

I also haven't gotten rid of a few hard coded constants yet. Try not to
have .extrafiles longer than 1000 lines, or lines/filenames longer than
1000 characters. :)

filter_shell.c currently accepts and discards '# ...\n' comments in
extrafiles and ignores whitespace at the start and end of lines. This
doesn't conform to the above proposal. I'll change it later.

Acknowledgements
================

Thanks to Manoj for the initial suggestion and Guy for pointing out
what would need to happen to dpkg.

Comments, criticisms and even compliments all appreciated.

Cheers,
aj

The extrafiles filters I've made are:
(nb: they're not very thorough, and some files have certainly been
misclassified, but it's still a start.
These are included in the cruft.deb mentioned above)

] [aj@azure ~]$ ls /etc/cruft/filters/
] acct                    fvwm95          ppp
] afterstep               gimp            ppp-pam
] analog                  gnats           python
] apache                  gpm             qmail
] apt                     info            qmail-src
] base-files              inn             quake
] base-passwd             ircii           run_utmp
] bind                    joe             samba
] bsdgames                lastlog         screen
] cfengine                ldso            slrn
] cfingerd                lilo            squid
] cron                    lintian         ssh
] dhcpcd                  log_wtmp        suck
] dhcpd                   lprng           sudo
] dhelp                   majordomo       suidmanager
] distributed-net         man-db          sysklogd
] distributed-net-pproxy  man2html        syslog-summary
] doc-base                mc              sysvinit
] dpkg                    mgetty          tetex-base
] dpkg-mountable          modutils        wenglish
] dwww                    msqld           wmaker
] emacs20                 ncurses-term    wu-ftpd-academ
] emacsen-common          net-acct        wwwcount
] equivs                  nethack         xbase
] fetchmail               netstd          xbill
] findutils               pdmenu          xemacs
] fvwm2                   perl            xntp3

--
Anthony Towns <email@example.com> <http://azure.humbug.org.au/~aj/>
I don't speak for anyone save myself. PGP encrypted mail preferred.

``It's not a vision, or a fear. It's just a thought.''