
PROPOSAL: Extrafiles (was Re: Conffiles...)



Hello world,

On Tue, Apr 07, 1998 at 12:47:53PM -0500, Manoj Srivastava wrote:
> 	Instead, we should introduce a file called extra files to list
>  files that the package may generate in maintainer scripts or in
>  normal operations which are not listed in pkg.list. We may exclude
>  files under /var/run and /var/log entirely. 

As it turns out, this is a much more flexible solution for my purposes
anyway, so I've spent a little time over the past couple of days
trying to find a nice way of doing it.

So... I'd like to make a proposal to have policy changed so that
packages include this information as standard. I'm not really familiar
with how I'm meant to go about doing that, so I've probably left
something important out, or included stuff I shouldn't or something...
But anyway. Here goes.


Purpose 
=======

To provide a way for packages to claim ownership over files that
aren't included in their original package. Files that are created
in a package's preinst or postinst, or that are manipulated by
programs in the package at runtime, fall under this heading. Some
examples are:

	base-passwd: /etc/passwd, /etc/group, /etc/shadow, /etc/gshadow
	cron:        /var/run/crond.pid
	lintian:     /var/spool/lintian/...
	dpkg:        /var/lib/dpkg/info/*, /var/lib/dpkg/alternatives/*
	dhelp:       /usr/doc/.../.dhelp

and so on and so forth. 

(As it stands, `dpkg --search /etc/passwd' results in
`dpkg: /etc/passwd not found.' Personally I consider this alone a
little disconcerting.)

Requirements 
============

1) A program (eg dpkg) should be able to easily
	a) tell if a given file is an extrafile (dpkg --search) 
	b) find all extrafiles of a given package (dpkg --listfiles)

2) Extrafiles should be able to specify:
	a) individual files
	b) files matching a pattern in a single directory
	c) files matching a pattern anywhere in a directory tree

Proposal
========

I'd thus like to propose the following changes to policy:

Debian Policy Manual
--------------------
Add a section 3.3.8: (renumbering 3.3.8 (Permissions and owners) to 3.3.9)

	       ----------------------------------------
3.3.8 Extra files 

Files that your package administers, but that are not included in the
package itself are called extra files. These include any configuration
files that are generated by a postinst, log files placed in /var/log,
files and directories added under /var/spool/, and more generally
anything that is only useful while your package is installed.

Such files should be listed in dpkg's extrafiles control area
file. (See the Debian Packaging Manual).
	       ----------------------------------------

Debian Packaging Manual
-----------------------
Add a section 10: (renumbering sections 10..14 to 11..15)

	       ----------------------------------------
Debian packaging manual - chapter 10
Extra Files

dpkg includes an extrafiles mechanism for claiming ownership over
files that aren't distributed with the package, but are instead
created over the course of a package's presence on a system.

Files that should be specified in the extrafiles control area file
include such things as log files, spool directories and their
contents, and configuration files (in particular those that aren't
conffiles -- see chapter 9).


10.1 Format of the extrafiles control area file 

The extrafiles file consists of one or more lines each containing a
single pattern matching files that will be used by the package. These
patterns are extended Bourne shell patterns, and have the following
special characters/sequences:

	*       Match any number of characters in a filename (but not
                a '/') 

	?	Match any single character in a filename (but not a '/')

	[...]   Match any one of the enclosed characters. A pair of
		characters separated by a minus sign denotes a
		range. If the first character following the [ is a !
		or a ^ then any character not enclosed is matched. A -
		may be matched by including it as the first or last
		character in the set. A ] may be matched by including
		it as the first character in the set. ('/' cannot be
		matched)

	/**/	Match any number of subdirectory levels (0 or more)
	/**	Match any files in any subdirectories


Some example patterns include:

dhelp:	/var/lib/dhelp		; match the /var/lib/dhelp directory
	/var/lib/dhelp/**	; match anything underneath it
	/usr/doc/**/.dhelp	; match /usr/doc/.dhelp, 
				;  /usr/doc/*/.dhelp, 
				;  /usr/doc/*/*/.dhelp, and so on...

python:	/usr/lib/python*/**/*.py[co]
		; match any .pyc or .pyo files anywhere under
		; /usr/lib/python, /usr/lib/python1.4, or
		; /usr/lib/python1.5, or /usr/lib/python-anything
	       ----------------------------------------
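
To make the pattern rules above concrete, here is a minimal sketch in
Python (not the C code in filter_shell) that translates one extrafiles
pattern into a regular expression. The function names
extrafiles_to_regex and matches are made up for illustration, and the
sketch assumes that a trailing /** needs at least one path component
below the directory, and that an unterminated '[' is a literal.

```python
import re


def extrafiles_to_regex(pattern):
    """Translate one extrafiles pattern into a compiled regex (sketch)."""
    out = []
    i, n = 0, len(pattern)
    while i < n:
        if pattern.startswith("/**/", i):
            # zero or more whole directory levels
            out.append(r"/(?:[^/]+/)*")
            i += 4
        elif pattern.startswith("/**", i) and i + 3 == n:
            # assumption: one or more components below this point
            out.append(r"(?:/[^/]+)+")
            i += 3
        elif pattern[i] == "*":
            out.append(r"[^/]*")
            i += 1
        elif pattern[i] == "?":
            out.append(r"[^/]")
            i += 1
        elif pattern[i] == "[":
            j = i + 1
            negate = j < n and pattern[j] in "!^"
            if negate:
                j += 1
            start = j
            if j < n and pattern[j] == "]":
                j += 1  # a leading ']' is a member of the set
            end = pattern.find("]", j)
            if end == -1:
                out.append(re.escape("["))  # assumption: literal '['
                i += 1
                continue
            body = "".join(
                "\\" + c if c in "\\]^[" else c for c in pattern[start:end]
            )
            # '/' can never be matched by a set, negated or not
            out.append(("[^/" if negate else "(?!/)[") + body + "]")
            i = end + 1
        else:
            out.append(re.escape(pattern[i]))
            i += 1
    return re.compile("".join(out))


def matches(path, pattern):
    """True if the whole path is covered by the pattern."""
    return extrafiles_to_regex(pattern).fullmatch(path) is not None
```

For example, matches("/usr/doc/foo/.dhelp", "/usr/doc/**/.dhelp") is
true, while matches("/etc/xyzzy/foo/bar", "/etc/xyzzy/*") is not,
since '*' stops at a '/'.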

Sample Programs
===============

After watching the recent du-files debate/flamewar/fiasco I thought I
probably ought to provide some sample programs that made use of the
extrafiles format I'm proposing. So:

filter_shell
------------
I've written a program "filter_shell" that expects on stdin a list of
filenames, and should be called with arguments such as:

	cat filenames | filter_shell /var/lib/dpkg/info/*.extrafiles

It then filters out all the filenames matching any of the shell
expressions in any of the extrafiles it's given, and lists the
remainder on stdout.
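
The filtering step itself is small. The following is a rough Python
sketch of the idea, not the actual filter_shell source: it joins all
the patterns into one regular expression and yields only the names
that nothing claims. The names to_regex and filter_unclaimed are
hypothetical, and for brevity this version understands only *, ?,
/**/ and a trailing /** (bracket sets are treated as literal text).

```python
import re

# Tokens of the reduced pattern language handled by this sketch.
_TOKEN = re.compile(r"/\*\*/|/\*\*$|\*|\?|.", re.DOTALL)
_REGEX_FOR = {
    "/**/": r"/(?:[^/]+/)*",  # zero or more directory levels
    "/**": r"(?:/[^/]+)+",    # assumption: at least one component below
    "*": r"[^/]*",
    "?": r"[^/]",
}


def to_regex(pattern):
    """Translate a reduced extrafiles pattern into regex source text."""
    return "".join(
        _REGEX_FOR.get(tok, re.escape(tok)) for tok in _TOKEN.findall(pattern)
    )


def filter_unclaimed(filenames, patterns):
    """Yield the filenames that match none of the patterns."""
    if not patterns:
        yield from filenames
        return
    combined = re.compile("|".join("(?:%s)" % to_regex(p) for p in patterns))
    for name in filenames:
        if combined.fullmatch(name) is None:
            yield name
```

Fed the filenames from stdin and the patterns read from the given
extrafiles, the names this yields would be the list to print on
stdout.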

cruft
----- 
Cruft is the main reason I'm doing any of this. It's a bunch of
scripts that go through your hard drive, comparing what they expect to
find (based on /var/lib/dpkg/info/* and similar sorts of information)
to what they actually find.

I've incorporated filter_shell into cruft, along with `extrafiles'
examples for 81 of the packages on my system. [0]

(happily, this both makes the results more accurate, *and* shaves a
minute or two off the time it takes to run. :)

For some idea of cruft's effectiveness, of the ~90,000 files on my
system it gives a list of around 250 files that it doesn't know about,
of which about half are (or appear to be) valid complaints. The
remainder are files I didn't have the patience to verify as being
things that should be on my system, and put into a sample extrafiles
list.

For some datapoints, 34 of the sample extrafiles I did up were one
line, 14 were two lines, 11 were three lines, 8 were four lines, and
only 14 had five lines or more. So hopefully this shouldn't require
too much work to add to existing packages.

dash-search
-----------
dash-search is a fairly simplistic demonstration of what might need to
be added to dpkg's --search option. Use

	dash-search full-path-to-file

to see if "full-path-to-file" is marked as an extrafile, and if so,
which package(s) it belongs to. 

NB: This won't match substrings; if you want to find who owns
/etc/cruft/explain/my_expl, you can't just type "my_expl". I'm not
sure what (if anything) can (or should) be done about this.
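
As a rough illustration of the lookup (this is not the dash-search
source), here is a Python sketch. find_owners and the sample claims
table are hypothetical, and the default matcher is Python's
fnmatch.fnmatchcase, used purely as a stand-in: it does not honour the
'/' rules of the proposed pattern language, so a real version would
pass in a proper extrafiles matcher instead.

```python
import fnmatch


def find_owners(path, claims, matches=fnmatch.fnmatchcase):
    """Return the sorted names of packages with a pattern matching path.

    claims maps a package name to its list of extrafiles patterns;
    matches(path, pattern) is whatever pattern matcher is in use.
    """
    return sorted(
        pkg
        for pkg, patterns in claims.items()
        if any(matches(path, pat) for pat in patterns)
    )


# Sample data taken from the examples under "Purpose".
claims = {
    "base-passwd": ["/etc/passwd", "/etc/group", "/etc/shadow", "/etc/gshadow"],
    "cron": ["/var/run/crond.pid"],
}
```

With this table, find_owners("/etc/passwd", claims) names base-passwd,
while find_owners("passwd", claims) finds nothing, which is the
substring limitation mentioned above.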

Availability
------------
cruft is available as http://va.debian.org/~ajt/cruft_0.9.1_i386.deb
It installs into /usr/sbin/cruft. Information on how it works is in
/usr/doc/cruft/README.

filter_shell is available in /usr/lib/cruft/filter_shell, as part of
cruft.

dash-search is available in the debian source for cruft available
under http://va.debian.org/~ajt/ .

Discussion Points
=================

There are a few things in the above that I'm not entirely comfortable
with.

1) Are shell patterns the right thing to use? They seem pretty
   pleasant to type up, but regexps do cover more cases, and there are
   lots of libraries with prewritten implementations of them. For Perl
   and Python people, regexps would make life easier, for example.

   I went with shell expressions for two reasons. 

   First, I'm used to matching filenames with shell expressions,
   rather than regexps. Having /etc/xyzzy/.* would look more like a
   bunch of dot-files than everything in the directory, to me.

   The second thing that I was concerned about was that the regexp .*
   would match through directories, eg /etc/xyzzy/.* would match
   /etc/xyzzy/foo/bar as well as just /etc/xyzzy/foo and
   /etc/xyzzy/bar. This seemed like a loss, to me.  (if you wanted to
   just match one directory, and not subdirs, you'd have to use
   /etc/xyzzy/[^/]*).

   The only other program that does something similar I could think of
   was find(1)'s -path option, which doesn't assign any special
   meaning to '/', so /a*b/ will match /ab/, /a_and_b/, but also
   /alpha/deb/.
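
   The difference is easy to check from Python, whose fnmatch module
   (like find's -path) gives '/' no special meaning. This is only a
   demonstration of the behaviours described above; the variable names
   are mine.

```python
import fnmatch
import re

deep = "/etc/xyzzy/foo/bar"
shallow = "/etc/xyzzy/foo"

# The regexp '.*' happily crosses directory boundaries...
dot_star_deep = re.fullmatch(r"/etc/xyzzy/.*", deep) is not None

# ...so staying in one directory needs the clumsier '[^/]*'.
no_slash_deep = re.fullmatch(r"/etc/xyzzy/[^/]*", deep) is not None
no_slash_shallow = re.fullmatch(r"/etc/xyzzy/[^/]*", shallow) is not None

# A plain shell-style match where '/' is not special: '*' spans "lpha/de".
slash_blind = fnmatch.fnmatchcase("/alpha/deb/", "/a*b/")

print(dot_star_deep, no_slash_deep, no_slash_shallow, slash_blind)
```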

2) I haven't included the ability to escape special characters with a
   '\'.  If you'd like to include a single '[', '*', '?' you can just
   use [[], [*] or [?] respectively. 
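
   A small Python sketch of that quoting rule, for anyone generating
   patterns from literal filenames. The helper name quote_literal is
   hypothetical, and wrapping ']' and '\' as well is an assumption
   based on the stricter style suggested in the next paragraph.

```python
# Characters wrapped in a one-character set so they match themselves.
SPECIAL = "[]*?\\"


def quote_literal(path):
    """Return an extrafiles pattern that matches exactly this path."""
    return "".join("[%s]" % c if c in SPECIAL else c for c in path)
```

   So quote_literal("/tmp/what?") gives /tmp/what[?].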

   I'm inclined to disallow '\' alone, and require [\] if you want to
   use a '\', and possibly the same with ']', to make it fairly clear
   what an unusual pattern means. This would be via a Lintian check,
   rather than specific code. (iow, you *could* do it, but you
   *shouldn't*)

3) Should two non-conflicting .debs be able to specify patterns that
   cover the same file? (eg /etc/a* and /etc/*b, which could both
   match /etc/ab) I don't think they should, personally.

   I *suspect* that it's possible to match two shell-expressions to
   see if they coincide, but I'm not sure. This would be useful both
   for dpkg --search (matching some pattern against `*[user's input]*'
   more or less), and a lintian check to ensure that no two packages
   claim ownership of the same files.
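
   For a reduced pattern language this is certainly possible. Below is
   a Python sketch, under the assumption that patterns contain only
   literal characters, '?' and '*' (no bracket sets, no '**'), with
   neither wildcard matching a '/'. It walks both patterns together and
   asks whether some filename could satisfy both. The function name
   patterns_overlap is made up, and extending this to the full syntax
   is left open.

```python
from functools import lru_cache


def patterns_overlap(p, q):
    """True if some filename matches both p and q (reduced syntax only)."""

    @lru_cache(maxsize=None)
    def go(i, j):
        # Both patterns used up: the filename built so far matches both.
        if i == len(p) and j == len(q):
            return True
        if i < len(p) and p[i] == "*":
            # p's '*' matches nothing, or absorbs one non-'/' item of q
            if go(i + 1, j):
                return True
            return j < len(q) and q[j] != "/" and go(i, j + 1)
        if j < len(q) and q[j] == "*":
            if go(i, j + 1):
                return True
            return i < len(p) and p[i] != "/" and go(i + 1, j)
        if i == len(p) or j == len(q):
            return False
        a, b = p[i], q[j]
        if a == "?":
            same = b != "/"
        elif b == "?":
            same = a != "/"
        else:
            same = a == b
        return same and go(i + 1, j + 1)

    return go(0, 0)
```

   With this, /etc/a* and /etc/*b are reported as overlapping (both
   match /etc/ab), while /etc/a* and /etc/b* are not.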

4) I've been considering whether extrafiles and conffiles would/should
   have any significant overlap, but personally I'm inclined to
   suspect that they probably don't.

5) I'm using the phrases "creates/uses file xyzzy" and "claims
   ownership over the file xyzzy" synonymously. I'm thinking, in
   particular, that it might be useful to extend dpkg/dselect/apt to
   make --purge delete all extrafiles too. I dunno.
 
And finally, some

Implementation Notes
====================
  
My implementation in filter_shell isn't particularly good. It's
something like O(n^2) instead of O(n) when parsing *'s, for
example. If someone with a little more experience parsing shell
expressions would like to tidy that up for me, I'd be appreciative. (I
tried looking at the bash source, but couldn't make head nor tail of
it)

I also haven't gotten rid of a few hard coded constants yet. Try not
to have .extrafiles longer than 1000 lines, or lines/filenames longer
than 1000 characters. :)

filter_shell.c currently accepts and discards '# ...\n' comments in
extrafiles and ignores whitespace at the start and end of lines.  This
doesn't conform to the above proposal. I'll change it later.
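
For reference, the lenient reading described in the last paragraph
looks roughly like this in Python. This is a sketch, not a
transliteration of filter_shell.c: the name parse_extrafiles is
hypothetical, and it assumes a comment is a line whose first non-blank
character is '#', so that a '#' inside a filename is left alone.

```python
def parse_extrafiles(text):
    """Return the patterns in an extrafiles file, read leniently."""
    patterns = []
    for line in text.splitlines():
        line = line.strip()  # ignore leading and trailing whitespace
        if not line or line.startswith("#"):
            continue  # skip blank lines and whole-line comments
        patterns.append(line)
    return patterns
```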

Acknowledgements
================

Thanks to Manoj for the initial suggestion and Guy for pointing out
what would need to happen to dpkg.

Comments, criticisms and even compliments all appreciated.

Cheers,
aj

[0] The extrafiles filters I've made are: (nb: they're not very
thorough, and some files have certainly been misclassified, but it's
still a start. These are included in the cruft.deb mentioned above)

] [aj@azure ~]$ ls /etc/cruft/filters/
] acct                    fvwm95                  ppp
] afterstep               gimp                    ppp-pam
] analog                  gnats                   python
] apache                  gpm                     qmail
] apt                     info                    qmail-src
] base-files              inn                     quake
] base-passwd             ircii                   run_utmp
] bind                    joe                     samba
] bsdgames                lastlog                 screen
] cfengine                ldso                    slrn
] cfingerd                lilo                    squid
] cron                    lintian                 ssh
] dhcpcd                  log_wtmp                suck
] dhcpd                   lprng                   sudo
] dhelp                   majordomo               suidmanager
] distributed-net         man-db                  sysklogd
] distributed-net-pproxy  man2html                syslog-summary
] doc-base                mc                      sysvinit
] dpkg                    mgetty                  tetex-base
] dpkg-mountable          modutils                wenglish
] dwww                    msqld                   wmaker
] emacs20                 ncurses-term            wu-ftpd-academ
] emacsen-common          net-acct                wwwcount
] equivs                  nethack                 xbase
] fetchmail               netstd                  xbill
] findutils               pdmenu                  xemacs
] fvwm2                   perl                    xntp3

-- 
Anthony Towns <aj@humbug.org.au> <http://azure.humbug.org.au/~aj/>
I don't speak for anyone save myself. PGP encrypted mail preferred.

      ``It's not a vision, or a fear. It's just a thought.''
