[Date Prev][Date Next] [Thread Prev][Thread Next] [Date Index] [Thread Index]

Re: Issues in the Patch Tagging Guidelines



Hi!

[ Started this some days ago and only finished it now, I see Jonathan has
  covered some parts of this. ]

On Thu, 2023-08-10 at 15:42:03 +0200, Lucas Nussbaum wrote:
> On 08/08/23 at 01:25 +0200, Guillem Jover wrote:
> > Lately I've been updating metadata in patches in packages I maintain and
> > noticed several issues with the Patch Tagging Guidelines, and after Lucas
> > created the new great patches UDD service [P] and we discussed some
> > other issues there, it looked like the guidelines could do with some
> > fixes and updates.
> > 
> >   [P] <https://udd.debian.org/patches.cgi> also part of DMD.
> > 
> > The following list are some of the issues or things that might deserve
> > to be clarified, fixed or updated (for reference the current guidelines
> > can be found at <https://dep.debian.net/deps/dep3>):
> 
> For context, there are several bugs about UDD's implementation of patch parsing:
> #1028503 UDD: Unknown "yes" value for Forwarded field in patch metadata
> #1031381 UDD/dmd: Incorrectly considers patches needing work
> #1043043 UDD patches: marks Forwarded as invalid if not 'no', 'not-needed', 'yes' or URL
> 
> The UDD parser of DEP3 metadata is
> https://salsa.debian.org/qa/udd/-/blob/master/rimporters/patches.rb#L163
> 
> The issues listed below fall in two categories:
> 1/ general issues about DEP3
> 2/ issues about computation of the "Forwarded" field value

> 1/ General issues about DEP3
> ============================
> 
> > * dpatch complicates parsing, it is deprecated, probably worth dropping
> >   support for it.
> 
> I believe this is solved: dpatch is no longer available in
> stable/testing/unstable, and we don't have any packages build-depending
> on it. There are still some patches that are dpatch-formatted but do not
> depend on dpatch-specific features, so this is not a big issue.
> 
> See:
> select source, patch from sources_patches inner join patches using (hash, file)
> where release='trixie' and headers ~* 'dpatch' order by 1,2;
> => 96 source packages, 295 patches

Yes, probably unclear, my point was to drop the dpatch support from
the Patch Guidelines, precisely due to this, and because otherwise a
strictly conforming parser would need to support it as it stands.

> > * Although the guidelines seem to intend for git formatted patches to be
> >   supported, the actual specification of the format is not very clear
> >   on its usage and a strict reading does not really allow it.
> > 
> >   - There is a requirement for the first field to appear on the first
> >     line, but git formatted patches start with «From ».
> 
> That's OK, since From is an alternative to Author, so it's a valid DEP3
> field.

Notice the space. Here's what a git format-patch patch looks like:

,---
From 9767e87d08803d78208464be3652a9231b6b08e3 Mon Sep 17 00:00:00 2001
From: Guillem Jover <guillem@hadrons.org>
Date: Thu, 10 Aug 2023 20:31:02 +0200
Subject: [PATCH] Patch description

Long explanation.
---
 test | 1 +
 1 file changed, 1 insertion(+)
 create mode 100644 test

diff --git a/test b/test
new file mode 100644
index 0000000..58bc469
--- /dev/null
+++ b/test
@@ -0,0 +1 @@
+Something.
-- 
2.40.1
`---

Which is not what the example looks like.

> >   - You cannot easily add custom Patch Guideline patches into the first
> >     git stanza, those need to go into the git trailers in the commit
> >     message, separated by whatever text is found in the description,
> >     which does not follow the deb822 format.
> 
> That's OK. You can have random text before the next DEP3 headers. See
> the first example in the DEP3 specification.

Right. I guess my core issue here is that the specification mixes git
format patches with something that is deb822 formatted, which seems
like a bad idea, because even though they are close they have different
formatting. I think ideally it would state that you have to use either
format, but not mix them?

> >   - Having to store the patch guideline fields in git commit trailer
> >     fields might mean these pollute the patch which might need to be
> >     removed before submitting upstream.
> 
> In practice, supporting both git-formatted patches and custom-formatted
> patches was a non-issue when writing a parser.

This is an issue for developers not parsers though.

> > * For Applied-Upstream it is not easy to predict what will be the
> >   future upstream version that the patch will be included in. I've opted
> >   for stuff like «3.2.1+, commit:<id>» when 3.2.1 is the last release,
> >   but that does not seem optimal.
> > 
> > * The git Date field could somehow be used in place of Last-Update (even
> >   though the Committer Date instead of the Author Date is what matches
> >   here more closely, but which is not available from «git format-patch»).
> 
> Yes, the UDD implementation does that.
> 
> > * Add new metadata to track single-debian-patch autogenerated patches,
> >   perhaps a new «Autogenerated: yes» or perhaps better something like
> >   «Origin: autogenerated, by:dpkg-source» (or similar descriptive thing)?
> >   These should in general not be warned as needing to be forwarded
> >   upstream, at least not as-is (dpkg-source in 1.22.0 will add a
> >   «Forwarded: not-needed» for these though).
> 
> A default of 'Forwarded: no' would be more appropriate, since some of
> individual changes could be relevant for upstream...

I do not maintain any package like this, but I think this would be
more annoying than helpful. Even if some parts have been forwarded,
that cannot be recorded in here, which would then become misleading.
The information might be present in the original split patches if
it's maintained in some VCS, but gets lost when applied. dpkg-source
adds a Description field mentioning this, perhaps it could also
reference the contents of any Vcs-* field as the possible source of
better metadata, but that seems redundant and can get out of sync,
maybe just a reference to debian/control would do. The patch header
injected by dpkg-source from git HEAD is currently:

  ,---
  Description: Autogenerated patch header for a single-debian-patch file.
   The delta against upstream is either kept as a single patch, or maintained
   in some VCS, and exported as a single patch instead of more manageable
   atomic patches.
  Forwarded: not-needed
  
  ---
  `---

> I was surprised to see that there are only 144 packages in testing using
> single-debian-patch, according to
>  select source, patch, headers
>  from sources_patches inner join patches using (hash, file)
>  where release='trixie' and patch ='debian-changes' order by 1 ,2;

Right, although given that these can also be consumed by upstreams or
other distributions (not based on Debian), when patch fishing and
similar, I think making this clear in the guidelines would be the
better option.

> > * The language could use some clarification and standardize on the field
> >   and stanza naming used in other parts of the deb822 ecosystem instead
> >   of headers and sets of headers and similar.
> 
> > * It is not clear, but I think «Origin: vendor» should be clarified to
> >   state that the actual vendor name should be included if there is no
> >   other reference (such as an URL) as in say «Origin: vendor, Debian»?
> >   Otherwise an undefined vendor is not really distinguishable from the
> >   «other» value as it could be any vendor. Also because if a «vendor»
> >   maintainer has created the patch then there might be no URL to point
> >   to except for the VCS it is kept in (if any at all).
> 
> 2/ issues about computation of the "Forwarded" field value
> ==========================================================
> 
> A clear status for each patch is useful to include that information in
> various dashboards (tracker.d.o etc.)
> 
> Unfortunately, the fact that the value for Forwarded is not necessarily
> explicit makes it hard to write a parser. It would be better if there
> was a simple way to determine:
> - if the patch was forwarded uptream (and how)
> - if the patch does not need to be forwarded upstream
> - if the patch still needs to be investigated
> 
> We could keep the current scheme of computing the real value of
> Forwarded based on other fields, but then I think that DEP3 should
> include pseudo-code to explain how to compute it.

One issue I see with always requiring a Forwarded field is that this
might imply being WET and thus possibly getting the information out of
sync. But perhaps the root problem is that multiple fields convey
related state, and it might be better to just have a single Status
field that conveys this information as it progresses over its various
states. For example I used something like that to track the status of
patches I sent to upstreams (before VCS and similar):

  https://git.hadrons.org/cgit/users/guillem/patches.git/tree/FORMAT

(Which generates https://hadrons.org/~guillem/patches.html).

One problem with a single field, is that it might lose some of the
status change history, but meh, as it stands, the semantics seem to be
duplicating information and encode it in multiple places in slightly
subtle different ways.

So perhaps «Status» could subsume, at least Applied-Upstream and
Forwarded or some of Origin. For example:

  Status: local (| Status: no-forward-needed ?)
  Status: not-sent
  Status: sent, commit:<id>|<url>
  Status: applied, commit:<id>|<url>
  Status: upstream, commit:<id>|<url>
  Status: backport, commit:<id>|<url>
  Status: fixed, commit:<id>|<url>
  Status: rejected, <url>

Or similar.

> I'm extremely biased of course, but I think that the UDD implementation could
> serve as a basis. :-)

I don't think UDD currently takes into account stuff like:

  Origin: upstream, commit:<id>
  Origin: backport, commit:<id>
  Applied-Upstream: <version>, commit:<id>

though, or variants with URLs, which clearly denote that forwarding is
not needed or has already been done in some way (from #1031381 above).

> (And I would be happy to update the UDD implementation to match an updated DEP3).
> 
> > * Forwarded does not support recording it being sent to an email address.
> >   Not all upstreams have a public mailing list or web site to file reports
> >   or send comments to, and it might be worth allowing a mailto: reference,
> >   or simply an email address.
> > 
> > * Forwarded lists "yes" as a valid implicit value, but does not make it
> >   clear whether an explicit one should be supported. If a mailto: or
> >   email is accepted then this is probably less of an issue. I've also
> >   used this value when I have sent the patch upstream and it has been
> >   applied before I have gotten around to updating the patch metadata,
> >   but I guess at that point not providing the field would also be
> >   fine.
> > 
> > * Forwarded fuzzy specification means parsing its values is rather hard,
> >   see UDD <https://lists.debian.org/debian-qa/2023/01/msg00022.html>.
> >   It would be better to be strict here. In general choosing to be
> >   fuzzy about the whole specification with the intent to help humans,
> >   I think means that programs have a hard time (see UDD above, and
> >   lintian, where both disagree on semantics) and even humans can get
> >   more confused when crafting or parsing them.
> 
> So I think that the next step would be for someone to start a draft to
> update DEP3. I might do it myself at some point, but would be very happy
> if someone else did it.

I'd like to see opinions on some of these before starting to draft
stuff, but perhaps making some of these proposals more concrete with
wording might be easier to review or discuss?

Thanks,
Guillem


Reply to: