Re: RFC: Unified package metadata format
On Tue, 2017-03-28 at 16:22:58 -0700, Matthew Garrett wrote:
> I'm looking at implementing support for IMA file signatures inside
> dpkg. The previous patches posted for this
> (https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=850340) did so
> using extended PAX metadata, but people didn't seem terribly
> enthusiastic about that.
> https://wiki.debian.org/Teams/Dpkg/Spec/MetadataTracking suggested
> mtree as a potential format, so I thought I'd try to kick off some
> discussion and see whether I'm missing any requirements or whether
> there were any better ideas. So:
As mentioned on IRC, I updated the wiki with some more thoughts.
Regarding PAX, it's my intention to extend and merge those patches
for 1.19.x, but as stated there, I'm not entirely sure that's the best
way (currently) to transport that metadata.
In any case, here are some more comments on the following:
> Debian package unified metadata format
> The file shall be stored within the control archive with the name
> “mtree” and shall start with the following string:
> #mtree v2.0
Also, as mentioned on IRC some time ago, this format does not seem to
have quite caught on in general, and libarchive at least has now dropped
the magic value support from its code. NetBSD, as the originator, still
handles it though, and I think it still makes sense to mark the files.
My main references have been the NetBSD implementation and the
libarchive one.
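A minimal sketch of what checking that mark could look like, assuming the
signature has to be exactly the first line of the file (has_mtree_magic is
a hypothetical helper, not dpkg or libarchive code):

```python
MTREE_MAGIC = "#mtree v2.0"


def has_mtree_magic(text: str) -> bool:
    """Return True if the first line is exactly the "#mtree v2.0" signature."""
    first_line = text.split("\n", 1)[0]
    # Tolerate a CRLF line ending, but nothing else on the line.
    return first_line.rstrip("\r") == MTREE_MAGIC
```

Since the signature starts with "#", readers that do not know about it
still see an ordinary comment line, which is what makes marking the files
harmless.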
> Each entry shall be of the form
> /path/name key1=foo key2=bar
> Ie, a leading space, a slash, and the path name of the installed file
> followed by a series of space-separated key=value pairs followed by a
> line feed. […]
I don't think this is correct. Initial whitespace gets ignored (this is
not clear from mtree(5), but that's what the various implementations do).
The subset of line types I intend to support would be:
no indented or continuation lines, no relative paths, no ".." entries.
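A rough sketch of a reader for that restricted subset. It assumes, as in
libarchive's mtree(5), that an entry whose path contains a slash is a full
path and one without is relative, and that paths use backslash-octal
escapes such as \040 for a space. Leading whitespace is skipped, as the
implementations do. parse_entry and _unvis are hypothetical names, and
special "/set" or "/unset" lines are not handled.

```python
import re

_OCTAL_ESCAPE = re.compile(r"\\([0-7]{3})")


def _unvis(field: str) -> str:
    """Decode backslash-octal escapes such as \\040 (a space)."""
    return _OCTAL_ESCAPE.sub(lambda m: chr(int(m.group(1), 8)), field)


def parse_entry(line: str):
    """Parse one line of the restricted subset.

    Returns None for blank and comment lines, else (path, keywords), where
    bare keywords such as "optional" map to None. Raises ValueError for
    continuation lines, relative entries and ".." components.
    """
    line = line.rstrip("\n").lstrip(" \t")  # initial whitespace is ignored
    if not line or line.startswith("#"):
        return None
    if line.endswith("\\"):
        raise ValueError("continuation lines are not supported")
    fields = line.split()
    path = _unvis(fields[0])
    if "/" not in path:
        raise ValueError(f"relative entry not supported: {path!r}")
    if ".." in path.split("/"):
        raise ValueError(f"'..' component not supported: {path!r}")
    keywords = {}
    for field in fields[1:]:
        key, sep, value = field.partition("=")
        keywords[key] = _unvis(value) if sep else None
    return path, keywords
```

For example, parse_entry("./usr/bin/foo type=file mode=0755") gives
("./usr/bin/foo", {"type": "file", "mode": "0755"}), while "foo type=file"
is rejected as a relative entry.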
Then, my idea would be to further distinguish two types of mtree files,
template and detailed. The first would allow the globs permitted in the
spec ('[', ']', '?', '*'), and possibly also the "ignore" keyword. The
second would not. Template mtree would be used in source packages, and
would be used to generate the data.tar in the .deb and possibly part of
the detailed mtree in the control.tar, and of course the detailed mtree
in the db.
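To make the template/detailed split more concrete, here is a sketch of
expanding template entries against the files of a built tree into
per-path detailed entries. It assumes that later matching template
entries override keys from earlier ones, and it uses Python's
fnmatchcase, where "*" also matches "/"; a real implementation may well
pick different matching rules. The "ignore" keyword is not handled, and
is_template_path and expand_template are hypothetical names.

```python
from fnmatch import fnmatchcase

# The glob characters permitted by the spec; only template mtree may use them.
GLOB_CHARS = set("[]?*")


def is_template_path(path: str) -> bool:
    """A detailed mtree must not contain any glob character in its paths."""
    return any(c in GLOB_CHARS for c in path)


def expand_template(template, files):
    """Turn template entries into detailed entries for concrete paths.

    template: list of (pattern, keywords) pairs, in file order.
    files: concrete paths found in the built tree.
    Returns {path: keywords} for every path matched by at least one pattern.
    """
    detailed = {}
    for path in files:
        merged = {}
        for pattern, keywords in template:
            if fnmatchcase(path, pattern):
                merged.update(keywords)  # later entries win (assumption)
        if merged:
            detailed[path] = merged
    return detailed
```

With a template of ("./usr/bin/*", {"mode": "0755"}) followed by
("./usr/bin/foo", {"uname": "root"}), the path ./usr/bin/foo ends up with
both keys, while ./usr/bin/bar only gets the mode.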
> […] The following keys are supported (extracted from mtree(5)):
> * md5digest - a synonym for md5
> * sha1digest - a synonym for sha1
> * sha256digest - a synonym for sha256
In my WIP code, I've ignored these keywords, because they are just too
verbose, and I don't see the point of them. We are going to be
incompatible with standard mtree(5) anyway, so… :)
> * gid - the file group as a numeric value
> * gname - the file group as a symbolic name
> * md5 - the MD5 message digest of the file
> * sha1 - the FIPS 180-1 (“SHA-1”) message digest of the file
> * sha256 - the FIPS 180-2 (“SHA-256”) message digest of the file
> * mode - the file’s permissions as a numeric (octal) value
> * uid - the file owner as a numeric value
> * uname - the file owner as a symbolic name
> * size - the size, in bytes, of the file
> * link - the file referenced by a symbolic link
> * type - The type of the file; may be set to any one of the following:
> * block - block special device
> * char - character special device
> * dir - directory
> * fifo - fifo
> * file - regular file
> * link - symbolic link
> * socket - socket
I've also got "contents" to represent hardlinks, "time", "ignore" and
"optional" (but a "class" keyword might make more sense, to be able to
specify the file as "class=conffile", "log", "optional/ghost" and
similar). There's also "nlink", which I should probably drop as it does
not make much sense for dpkg's purposes.
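As an illustration of how small the keyword set then becomes, a sketch
of a per-entry validator. It assumes the keys quoted above minus the
*digest synonyms, plus "contents", "time", "ignore", "optional" and a
free-form "class"; "nlink" is treated as unknown since it would likely be
dropped. validate_keywords is a hypothetical name, and bare keywords are
represented with a value of None.

```python
import re

FILE_TYPES = {"block", "char", "dir", "fifo", "file", "link", "socket"}
FLAGS = {"ignore", "optional"}                  # bare keywords, no "=value"
FREE_FORM = {"gname", "uname", "link", "contents", "class"}
REJECTED = {"md5digest", "sha1digest", "sha256digest"}
PATTERNS = {
    "mode": re.compile(r"[0-7]{1,4}"),          # numeric (octal) only
    "uid": re.compile(r"[0-9]+"),
    "gid": re.compile(r"[0-9]+"),
    "size": re.compile(r"[0-9]+"),
    "time": re.compile(r"[0-9]+(\.[0-9]+)?"),   # seconds[.nanoseconds]
    "md5": re.compile(r"[0-9a-f]{32}"),
    "sha1": re.compile(r"[0-9a-f]{40}"),
    "sha256": re.compile(r"[0-9a-f]{64}"),
}


def validate_keywords(keywords):
    """Return a list of problems found in one entry's {key: value} dict."""
    problems = []
    for key, value in keywords.items():
        if key in REJECTED:
            problems.append(f"{key}: verbose synonym, use the short name")
        elif key in FLAGS:
            if value is not None:
                problems.append(f"{key}: takes no value")
        elif value is None:
            problems.append(f"{key}: missing value")
        elif key == "type":
            if value not in FILE_TYPES:
                problems.append(f"type: unknown file type {value!r}")
        elif key in PATTERNS:
            if not PATTERNS[key].fullmatch(value):
                problems.append(f"{key}: malformed value {value!r}")
        elif key not in FREE_FORM:
            problems.append(f"{key}: unknown keyword")
    return problems
```

An entry such as {"type": "file", "mode": "0644", "uid": "0"} yields no
problems, while {"sha256digest": "..."} is flagged.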
> The following keys are supported but not present in mtree(5):
> * major - the major number of a device node
> * minor - the minor number of a device node
Actually, some implementations define a "device" keyword, but it seems
a bit of a mess, given that the major/minor numbers it encodes are
OS-specific anyway. So my thinking was to probably ignore this one.
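The OS-specific part is visible even when only emitting the separate
major/minor keys from the proposal: the raw st_rdev value has to be
decoded with the host's own encoding. A small sketch, assuming those two
keys (device_keywords is a hypothetical helper):

```python
import os
import stat


def device_keywords(st):
    """Return type/major/minor keywords for a device node, else {}."""
    if stat.S_ISBLK(st.st_mode):
        kind = "block"
    elif stat.S_ISCHR(st.st_mode):
        kind = "char"
    else:
        return {}
    # os.major()/os.minor() split st_rdev using the host OS's own encoding.
    return {
        "type": kind,
        "major": str(os.major(st.st_rdev)),
        "minor": str(os.minor(st.st_rdev)),
    }
```

On Linux, device_keywords(os.stat("/dev/null")) typically gives
{"type": "char", "major": "1", "minor": "3"}; the same node on another
kernel may well carry different numbers.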
> * xattr.* - a base64-encoded extended attribute that will be
> associated with the file if the underlying filesystem supports
> extended attributes. The name of the attribute will follow the “xattr.”
> string - eg, “xattr.security.selinux=dW5jb25maW5lZF91Om9iamVjdF9yOnVzZXJfaG9tZV90OnMwAA==”
> would set the security.selinux extended attribute to
> unconfined_u:object_r:user_home_t:s0. This format is present in
This sounds good! I don't think all xattrs would need to be
base64-encoded, but over-encoding only takes more space so that's
always safe. :)
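A quick round-trip sketch of the xattr.* encoding using the example value
from the proposal (xattr_keyword and parse_xattr_keyword are hypothetical
helpers). Note the trailing NUL byte in the SELinux label, which is one
reason always base64-encoding is the simple and safe choice:

```python
import base64

PREFIX = "xattr."


def xattr_keyword(name: str, value: bytes) -> str:
    """Encode one extended attribute as an xattr.<name>=<base64> keyword."""
    return PREFIX + name + "=" + base64.b64encode(value).decode("ascii")


def parse_xattr_keyword(field: str):
    """Split an xattr.<name>=<base64> keyword back into (name, raw bytes)."""
    key, _, encoded = field.partition("=")  # base64 padding "=" stays in value
    if not key.startswith(PREFIX):
        raise ValueError(f"not an xattr keyword: {key!r}")
    return key[len(PREFIX):], base64.b64decode(encoded, validate=True)
```

Encoding b"unconfined_u:object_r:user_home_t:s0\x00" under the name
security.selinux reproduces the keyword quoted above.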
> * override.* - if present, will override the contents of a key
> applying to the same file. This may be used to apply local system
> policy and must not be present in shipped files.
I'm not sure I see the use for this?
> Outstanding questions:
There are some other outstanding questions in the wiki.
> * Should overrides instead be kept in a separate mtree.override file?
> This would avoid having to read the existing mtree file and merge its
> contents with the mtree file in the package before writing out
> /var/lib/dpkg/*.mtree. However, this advantage is reduced if
> per-package metadata is merged into a single file.
The idea IMO would be to replace the dpkg-statoverride data with its own
mtree, which would act as the overrides, but perhaps I'm
misunderstanding what you are referring to by overrides here.
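A sketch of how such a separate override mtree could be applied on top
of a package's own entries, so the shipped file never needs to be
rewritten. It assumes both have already been parsed into {path: keywords}
dicts, and that overrides for paths the package does not ship are simply
ignored; apply_overrides is a hypothetical name.

```python
def apply_overrides(package, overrides):
    """Return a new {path: keywords} where override keys replace package keys.

    Neither input dict is modified.
    """
    merged = {}
    for path, keywords in package.items():
        # Keys in the override entry win; all other package keys are kept.
        merged[path] = {**keywords, **overrides.get(path, {})}
    return merged
```

For instance, an override of {"mode": "4750", "gname": "staff"} for
./usr/bin/foo keeps the package's uname=root but replaces its mode, much
like a statoverride entry does today.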
> * Are any other keys required? Should dpkg-divert be implemented using
> this format?
Hmm not sure, did you really mean dpkg-divert and not
> * Existing mtree implementations will either error or ignore
> additional keys, which limits interoperability. Is there another
> format that would make more sense than mtree?
At the time I was looking for candidates, I didn't really find many
other general purpose and sane manifest formats. With the mtree 2.0
extension, it seems to be an adequate format with good properties.
But if someone knows of a better one I'm happy to reconsider!
My current working plan is to get the last items for 1.18.x out,
ideally this week or the next, then immediately branch 1.18.x and open
1.19.x on master. At that point I'd get the mtree support integrated
for the db as the first stage; then we can start pondering how to
transport additional metadata in the .deb as the second stage, and
finally the templating mode. The last one might be more involved, as
it will most probably require adding support for built-in tar packing.
But the second stage would already allow testing things by manually
crafting .debs or writing an external helper or tool to inject that
metadata, so this should let us experiment without blocking on the
whole thing being finalized.
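As a starting point for such an external helper, a sketch that walks a
staged package tree and prints a detailed mtree for it. Packing the
result into control.tar is left out. It assumes "./"-prefixed full paths,
backslash-octal escaping of whitespace and backslashes, and it skips
device nodes, FIFOs and sockets; generate_mtree and its helpers are
hypothetical names.

```python
import hashlib
import os
import stat


def _vis(path: str) -> str:
    """Escape whitespace and backslashes as backslash-octal (simplified)."""
    return "".join(f"\\{ord(c):03o}" if c in " \t\n\\" else c for c in path)


def _sha256(path: str) -> str:
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(65536), b""):
            h.update(chunk)
    return h.hexdigest()


def generate_mtree(root: str) -> str:
    """Return a "#mtree v2.0" manifest describing everything under root."""
    paths = []
    for dirpath, dirnames, filenames in os.walk(root):  # symlinks not followed
        for name in dirnames + filenames:
            paths.append(os.path.join(dirpath, name))
    lines = ["#mtree v2.0"]
    for full in sorted(paths):
        rel = "./" + os.path.relpath(full, root).replace(os.sep, "/")
        st = os.lstat(full)
        kw = [f"mode={stat.S_IMODE(st.st_mode):04o}",
              f"uid={st.st_uid}", f"gid={st.st_gid}"]
        if stat.S_ISDIR(st.st_mode):
            kw.insert(0, "type=dir")
        elif stat.S_ISLNK(st.st_mode):
            kw.insert(0, "type=link")
            kw.append("link=" + _vis(os.readlink(full)))
        elif stat.S_ISREG(st.st_mode):
            kw.insert(0, "type=file")
            kw.append(f"size={st.st_size}")
            kw.append("sha256=" + _sha256(full))
        else:
            continue  # devices, FIFOs and sockets are out of scope here
        lines.append(_vis(rel) + " " + " ".join(kw))
    return "\n".join(lines) + "\n"
```

Emitting only the keywords that can be derived from the tree keeps the
helper independent of dpkg; policy such as uname/gname or a class would
have to come from elsewhere.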