
Bug#865769: Second data package including some machine-readable data



On Sun, 2017-06-25 at 16:13:39 -0700, Russ Allbery wrote:
> Guillem Jover <guillem@debian.org> writes:
> > On Sat, 2017-06-24 at 09:57:33 -0700, Russ Allbery wrote:
> >> - The list of archive sections and their descriptions
> 
> > I think this belongs on each archive providing those, alongside the
> > other archive metadata. And I'd rather see the involved parties
> > defining an appropriate file to provide so that any downloader which
> > has to fetch the metadata anyway would use it instead of hardcoding it.
> 
> > Using a file from policy does not seem useful to me, because it would
> > mean software would need to depend on such policy provided package,
> > and if you are going to mix and match repos, you really need the
> > metadata from the archive you are pulling from.
> 
> > In addition the text in policy states that the canonical list is
> > maintained by the archive anyway. :)
> 
> I don't see how this would work.  The program would dynamically retrieve
> the list of sections every time it ran?  This seems like a bad idea, and
> even impossible in a lot of situations (off-line development work, for
> instance).

When I researched this at the time, there were two clear groups of
users for this information [U] (now summarized in [W]).

The first were package manager frontends and similar, which need to
fetch archive meta-data anyway, and they do not need to do that all
the time, as they tend to cache that. For this group using an out-of-band
file provided by a non-canonical package seems suboptimal, when the
information can be there alongside the rest of the metadata to
download. For example dselect is a prominent omission from that list,
one for which I'd rather wait for proper meta-data from the archives
themselves than introduce the hardcoding, or make it Debian-specific
by having it depend on a Debian Policy specific file. :)

  [U] <https://lists.debian.org/debian-devel/2015/05/msg00287.html>
  [W] <https://wiki.debian.org/NewArchiveSections>

For off-line tools such as linters, syntax highlighters, and similar it
certainly seems like a problem to require fetching the data from the
archive. Although, in some cases relying on an external package that
might update the data outside of the control of the tool might be
undesirable, and it might be better to do like lintian is doing, and
refresh it as part of the release process.

Then I suppose there's a third group composed of services. But those
I guess kind of fall somehow under the package manager frontends case,
as they need to fetch metadata information from the archive anyway(?).

> We maintain a list of archive sections in Policy anyway, so it's easy for
> us to provide this list in a machine-readable format as well.  (Well, we
> don't have the descriptions, but that's not hard to add and doesn't really
> add much additional maintenance work.)
> 
> I think it's fine that a debian-policy-data package only provide
> information for the Debian archive.  The same is also true of the virtual
> package names, of course; some other archive may have different virtual
> packages too.  Programs that want to work with various different package
> archives will need to know how to obtain this data from multiple sources.
> The intent is to provide a tiny package that others can easily depend on
> without much overhead.

Oh, I didn't mean to imply that Debian Policy should provide data or
support for other non-Debian archives.

My point is that perhaps it is not the best way to provide some of
this data in the first place, because:

  - it's not the canonical origin of the data,
  - having to fork the policy package just to amend the sections seems
    burdensome, when the latter change way less often than the former,
  - it might make code that has to support this data Debian-specific.

If we need an off-line replica of the data, it might perhaps make more
sense for the archive admins (ftp-masters in this case) to provide it,
in a similar way as we have a debian-archive-keyring. Of course they'd
need to agree to that first. :)

Barring that, having a single place to include this kind of information
in a uniform way, similar to what distro-info does, might be the second
best option.

But even then, if the least bad solution is to have debian-policy
provide the data, what I was trying to have at least taken into
account is that it would be nice to try to specify a somewhat neutral
hierarchical structure in the filesystem, and ideally a common file
format, so that programs can just check for the vendor and
do the equivalent of something like:

   load /usr/share/distro-metadata/<vendor>-archive-metadata.<format>

instead of say, having debian-policy hardcoded therein or similar, so
you could just key on the vendor and be somewhat neutral.
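
As a rough illustration of that lookup, here is a minimal Python sketch.
Everything concrete in it is an assumption made for the example: the
/usr/share/distro-metadata directory is only the pattern suggested above
and does not exist today, JSON is picked purely because it is in the
standard library (the format is still under discussion), and the
function names are hypothetical.

```python
import json
from pathlib import Path

# Hypothetical location, taken from the path pattern suggested above.
METADATA_ROOT = Path("/usr/share/distro-metadata")


def metadata_path(vendor, fmt="json", root=METADATA_ROOT):
    """Build <root>/<vendor>-archive-metadata.<fmt> for a given vendor."""
    return Path(root) / f"{vendor.lower()}-archive-metadata.{fmt}"


def load_archive_metadata(vendor, root=METADATA_ROOT):
    """Return the vendor's archive metadata, or None if no file is installed."""
    path = metadata_path(vendor, "json", root)
    if not path.is_file():
        # No data shipped for this vendor; the caller decides how to fall back.
        return None
    with path.open(encoding="utf-8") as handle:
        return json.load(handle)
```

A caller would obtain the vendor name however it already does (for
instance from its vendor profile support) and pass it in, so nothing in
the program itself needs to name debian-policy.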

> >> - The list of valid Debian control field names (by type of control file)
> 
> > This one, I'm uncertain, but I'd tend to think it is partly in a similar
> > situation to the previous one.
> 
> > For example dpkg contains already such a list (probably more
> > exhaustive) in Dpkg::Control::Fields, and I don't see making dpkg
> > depend on an external list, because dpkg is being used beyond Debian.
> 
> This was just an idle thought of mine, and maybe it doesn't solve any real
> problems.

I still think it's worth entertaining. :)

> > For the equivalent in policy I think I see where you are coming from,
> > and I think it would be nice to have most of policy in a declarative
> > format that could be used by linters, or some parsers, but if that means
> > it's going to make those somewhat Debian-specific it might not take
> > off.
> 
> I'm in general fine with the things provided by Debian Policy being
> Debian-specific.  That, in my opinion, is the point of the package.

Oh certainly, I think the Debian Policy is and needs to be very
Debian-specific (even though I guess it could also be made friendlier
towards derivatives :). But I'm looking at this from the other side,
because I think most of our downstreams and derivatives are helped when
we do not make our tools Debian-specific, but at most Debian-aware, by
for example supporting vendor profiles and similar.

At least that's something I try to do for any native package I
maintain! :)

> If some other distribution wants something equivalent, they can certainly
> fork Debian Policy or write their own separate document that supplements
> Debian Policy, and maintain corresponding data files.

I'm personally a fan of having derivative-specific policies, either as
forks of Debian Policy, as overlays on top of it or similar. But as
stated above, I think that setting this up so that forking is no
longer optional seems a bit cumbersome.

> > I don't think I have a direct use for any of the above anyway, but I
> > also think I'd prefer YAML, because it is more human readable. But not a
> > strong objection in any case.
> 
> I have a professional aversion to YAML because the security properties of
> YAML are so awful.
> 
> I wish everyone would just use TOML, but unfortunately it's not at a 1.0
> version yet and is not as widely supported by default as JSON is.

Ah heh, I was just about to send a reply clarifying that, indeed, if
I had to select something other than the two initial options, I might
consider TOML too! I've been exposed to it over the last year and I've
liked it way more than YAML. But I still find YAML more readable than
JSON, even if it has an insanely huge spec. :)

Thanks,
Guillem

