
Bug#1020241: debian-policy: copyright-format: Formatting improvements/changes



Wouter Verhelst <wouter@debian.org> writes:
> On Sun, Sep 18, 2022 at 06:01:38PM -0700, Russ Allbery wrote:

>> Yes, we should distinguish between formatted text with synopsis and
>> formatted text without synopsis more clearly.  Or, you know, just
>> propose a new YAML format which would make it trivial to clean up all
>> of these problems *and* would provide first-class editor support and
>> easy parsing in every major programming language.  :)  But that's WAY
>> bigger than this bug.

> If we're going to do that, it might make sense to explicitly allow JSON
> and/or TOML as alternative representations, because there are some
> really weird edge cases in YAML.

I don't want to get too far into this since I don't think we're
realistically ready to propose such a thing, but I spent a bunch of time
researching this for my static web site generator, and YAML is the best of
a bunch of bad options for precisely the kind of structured data we want
to put in the copyright file.  While I agree that most of the YAML
specification was a mistake and I really wish someone would produce a
reduced spec, there's a sad shortage of other widely implemented syntaxes
that are usable for files intended to be read and written directly by
humans, particularly ones containing large blocks of text.

JSON is definitely not it; JSON is a great computer interchange format,
but I tried to use it for something complex and human-editable and I would
never do that again.  Too much fiddling around with quotes and commas and
escaping, and for the copyright format, the obvious problem that JSON has
no way of representing blocks of text that isn't horrible.  (That said, if
the file is YAML, you are allowing JSON, because YAML accepts JSON as an
alternative syntax, so all JSON is valid YAML.  Yes, yes, the specification
is a sprawling mess.)
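To make the block-of-text problem concrete, here is a small Python sketch
using only the standard json module.  The "License" key and its text are
hypothetical illustrations, not part of the copyright format.  Once the
two-paragraph license is serialized, the paragraph break and the quotes
both become escape sequences crammed onto a single line, which is exactly
what makes hand-editing painful.

```python
import json

# Hypothetical two-paragraph license text, as it might appear in a
# copyright file stanza.
LICENSE_TEXT = (
    "Permission is hereby granted to any person.\n"
    "\n"
    'THE SOFTWARE IS PROVIDED "AS IS".\n'
)

doc = {"License": LICENSE_TEXT}

# Even with indent=2, the whole text lands on one line: newlines become \n
# and embedded double quotes become \".
encoded = json.dumps(doc, indent=2)
print(encoded)
```

Pretty-printing only indents the surrounding structure; JSON strings
cannot span lines, so there is no setting that makes the text itself
readable.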

TOML looked for a while like it was going to be the format I was hoping
for, but they implemented nested dictionaries in a way that makes the
format incredibly annoying to use for anything that has nested structure.
More relevant to the copyright file, while it at least has multiline
strings, they're awkward to read and write compared to YAML, particularly
in nested structure.  If you don't want the leading whitespace of
indentation to be significant, you have to put a backslash at the end of
every line, which is really not okay.  And unfortunately TOML is
incompatible with YAML, unlike JSON, so to support both you have to embed
both libraries and select which one to use, which is annoying.

The killer features of YAML for the purposes of the copyright format are
the > and | indicators after a key.  They let you write paragraphs of text
afterwards with normal structural indentation and full editor support for
wrapping and the like, and you can choose whether line breaks are
significant: | keeps them as written, while > folds single line breaks
into spaces.
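A rough way to see what | and > buy you: the Python sketch below models,
in a deliberately simplified way, how a YAML parser turns an indented
block after | or > into a string under the default "clip" chomping.  It
is not a YAML parser.  It assumes plain paragraphs with no more-indented
lines and no leading blank lines, and the function names and license text
are hypothetical; a real YAML library implements the full rules.

```python
import textwrap

# Hypothetical body of a "License:" key, as it would appear after the
# "|" or ">" indicator, indented by two spaces.
BODY = """\
  Permission is hereby granted
  to any person.

  THE SOFTWARE IS PROVIDED "AS IS".
"""


def body_lines(body: str) -> list[str]:
    """Strip the common indentation and drop trailing empty lines.

    (A real parser takes the indentation from the first non-empty line;
    for this simple input the result is the same.)
    """
    lines = textwrap.dedent(body).splitlines()
    while lines and not lines[-1].strip():
        lines.pop()
    return lines


def literal(body: str) -> str:
    """Simplified '|' style: every line break is kept, plus one final newline."""
    return "\n".join(body_lines(body)) + "\n"


def folded(body: str) -> str:
    """Simplified '>' style: single breaks become spaces, blank lines become breaks."""
    parts: list[str] = []
    blanks = 0
    for line in body_lines(body):
        if not line.strip():
            blanks += 1
            continue
        if parts:
            # A run of N blank lines folds to N newlines; no blank line folds to a space.
            parts.append("\n" * blanks if blanks else " ")
        parts.append(line)
        blanks = 0
    return "".join(parts) + "\n"


print(repr(literal(BODY)))
print(repr(folded(BODY)))
```

In both cases the two-space structural indentation disappears from the
value; the only difference is whether the line wrapping in the file is
preserved (|) or treated as reflowable prose (>).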

YAML also has excellent implementations for basically every programming
language and editor, which is mostly true of JSON and kind of true of TOML
and stops being true after that.  Yes, the spec is a disaster, but since
other people have already implemented it for you, including with
protections against the more dubious stuff, you don't have to care as long
as you don't creep into the dark corners when writing files.

I understand Guillem's desire to not make dpkg depend on a YAML library,
and we should stick with deb822 for anything that is required at that
level, but the copyright format is optional and from dpkg's perspective is
just an opaque file, so that's probably not a blocker.

> I think semantic versioning would require you make this a major version
> bump, since like you say it's not backwards compatible.

Yup.

> Ah yes, but then if you do that, the old examples in policy that were
> being patched away here (usage of which might exist in the wild) would
> now have different semantics...

Yeah, it's a mess.

Maybe the right solution is to introduce a new Copyright-Lines field with
the semantics we want and deprecate Copyright.

-- 
Russ Allbery (rra@debian.org)              <https://www.eyrie.org/~eagle/>

