Re: Any volunteers for lintian co-maintenance?

To: Andreas Tille <andreas@an3as.eu>, Debian Developers <debian-devel@lists.debian.org>
Subject: Re: Any volunteers for lintian co-maintenance?
From: Niels Thykier <niels@thykier.net>
Date: Tue, 21 May 2024 15:23:24 +0200
Message-id: <458d6094-4b07-4d9a-8c9b-e5578d29298d@thykier.net>
In-reply-to: <ZknZOQe9M865Q5Qw@an3as.eu>
References: <ZjozuQzAE9X-G2fY@an3as.eu> <87735384-ffc1-4f4f-a978-cbf8104adc47@thykier.net> <Zj0jtUBSWTd9usAw@an3as.eu> <ZknZOQe9M865Q5Qw@an3as.eu>

Andreas Tille:

> Hi Niels,
>
> first, sorry for my late answer.
>
> At Thu, May 09, 2024 Niels Thykier wrote:
> [...]
>> For me, lintian fails in all roles it has. It is not a good tool for
>> newbies to get help, since it can only test build artifacts. As an
>> example, your feedback loop is a full package build followed by
>> unpacking the package just so lintian can tell you have a typo on
>> line 4. That is a massive waste of resources - notably developer time
>> and mental bandwidth.
>
> I understand your point about having a tool that checks the debian/ dir for
> issues like spelling errors, binary files in the upstream source, and other
> concerns right within the packaging tree before the build starts. However, I
> don't understand why you mention newbies in this context.

My core argument is that the feedback cycle is excruciatingly
complicated and slow compared to what it needs to be for validation of
"debian/*" files. In my view, the problem is amplified for newcomers in
multiple areas.

> [...]
>> As a consequence, people now get auto-rejects when uploading because
>> lintian on the FTP master server does not produce the same output as
>> current lintian in stable or newer.
>
> I think it's a bit unfair to blame lintian for the fact that its old
> versions do not do a proper job when it comes to checking newer packages.

Is it now? When I maintained lintian, I was of the understanding that
the dak usage was an explicit use-case we, as lintian maintainers, were
expected to support. In my time, I would have considered this situation
as an RC bug against lintian if I had this change and the FTP masters
were unable or unwilling to install the -backports version of lintian.

On the other side of the "unfairness" coin, I feel it is unfair to have
people spend volunteer time being stuck in a painful cycle of "It works
on my machine, but dak rejects it because lintian is not updated on the
FTP masters machine", which they are expected to escape by ignoring
lintian warnings locally (you need overrides in the old format, which
the new lintian then complains about - damned if you do, damned if you
don't). Those are volunteers that wasted their Debian time being caught
between lintian and dak and, in my book, that was much more unfair than
having lintian (or dak) and its maintainers own up to it.

I feel we, as a distribution, should ensure such problems do not happen.
As stated, in my time as a lintian maintainer, I felt the responsibility
was with lintian and that is why I blame lintian.

Maybe times have changed here and we, as a distribution, no longer hold
lintian accountable here. Not sure who is then, but maybe that is part
of why this problem has existed for so long.

(For the record, I think the ship has sailed on this one. I am not
expecting Axel to go retroactively fix this problem on the lintian side.
I expect us not to repeat this mistake again.)

> [...]
>> Especially for the editor support related parts, where people get
>> instant feedback both on issues and the fix, automatic reformatting on
>> save and completion suggestions. None of which lintian or wrap-and-sort
>> are capable of.
>
> If you ask me personally I'm absolutely happy about a policy checker that
> simply reports issues.  I'm fine with firing up an editor in some other
> terminal and be done.  Maybe I'm missing your point but for me that's a
> non-issue.  Or is your comparison with wrap-and-sort rather targeting at
> some tool that automatically fixes the issues it has found and I can check
> the changes afterwards with `git diff`?  Or something like the janitor tools
> that even commit changes?

I feel my point is not coming across at all and that is frustrating me a
bit.

Imagine you need to change `debian/control` for some reason regardless
of the situation that triggered this. You open up your editor and do the
change. In the process, you make a mistake.


The current workflow is:

 1) Edit file (introducing mistake)
 2) No feedback in the editor, so:
    a) You save the file
    b) Build an artifact that lintian can check
    c) Run lintian to get the feedback
 3) You correct the mistake.
 4) Rinse and repeat all the sub-steps of 2) to validate there are no
    mistakes.

This is the workflow you have today with lintian. And it applies equally
to all kinds of mistakes from policy violations, to textual or semantic
typos.

Now, I would like you to step away from the status quo. What this
workflow *should* have been in my view is:


 1) Edit file (introducing mistake)
 2) Editor shows a "Here is a mistake"-marker.
 3) You correct the mistake (either manually or via a quick fix)
 4) Editor removes the "Here is a mistake"-marker.
 5) Save the file

Notice here that I do not need to leave my editor to get feedback. I get
it automatically, so I cannot forget it nor am I inclined to skip the
check in a hurry. This is the crux of my problem with the status-quo
feedback loop. I have to *actively* ask for feedback. I have to wait for
it too, which becomes a paper cut.

These are an unnecessary mental burden and paper cuts for a considerable
part of the problems you can introduce via editing `debian/*` files.
IDEs have solved this problem very well via their near instant feedback
loops. I feel we are long overdue for that.



Similarly, when you consider the reformatting flow of today, the flow is:

 1) Edit file
 2) Save file.
 3) Run `wrap-and-sort` to reset formatting.
    - Where I, by the way, have to manually pass the correct formatting
      options.

In the workflow I want, the cycle is:

 1) Edit file
 2) Save file, which causes the editor to reformat it automatically *).

Here, I do not have to remember to reformat the file. The editor does it
for me. It is automatically correct rather than corrected through active
manual labor on my part.

Obviously, the status quo workflow is possible. We have been doing it
for years. However, we should not make a human do the work of a machine.
Make the machine do what it does best; follow the same procedure every
time. This enables us to free up mental bandwidth of our human
volunteers for other things.

*) For packages that have opted in to automatic styling, since this is
not a mandatory thing. Stating this explicitly to avoid the conversation
derailing into a question of this being imposed.

> [...]
>> But even if I am not successful with `debputy`, I cannot imagine I
>> would consider returning to lintian. It does not scratch my itch and
>> years of issues (some of which are still unfixed) have made me not
>> want to have anything to do with the tool.
>
> [...]
>
> Given your very interesting input we actually need people who are able to
> dedicate quite some time on restructuring lintian in a way that respects the
> fact that some checks can be done / are done by some other tool on source
> level.  Alternatively lintian itself could be modularised to rather do what
> you want.

Both in-editor feedback and the "debian files of an unpacked source
tree" are the parts I am trying to cover with `debputy` (via
`debputy lsp server` + `debputy lint/reformat` respectively).

I do not see lintian expanding to in-editor feedback. It is a massive
undertaking on its own. Given no one has solved the "run lintian on an
unpacked source tree" problem yet, which would be a prerequisite and
also a considerable undertaking on top, I doubt we will ever see it. I
also do not see any noteworthy benefit of attempting direct code reuse
from lintian at this step.



When you work on in-editor feedback, you will need at least:

 1) A lenient parser that keeps track of all sorts of things like
    syntax errors, white space, and comments that are usually the first
    things your parser throws away to keep things simple. Ideally, it
    also:
    - supports reading a string or a list of lines, since the editor
      content is not always persisted to the file system. Instead, you
      get it from "somewhere else" (fed via socket in the LSP case)
    - continues after syntax errors, since otherwise you only get one
      error on syntax errors and most other feedback disappears, which
      can be annoying to the user. Especially important for completion
      since the half-finished typing might be syntactically invalid.
      (Also, inserting a field in a deb822 stanza will temporarily split
       the stanza into two where at least one of them will definitely
       be invalid. You will want to be able to compute the completion
       as-if the stanzas are not split despite the file being
       "stanza, empty line / syntax error, stanza")

 2) Additionally, you need to know file ranges of everything. One thing
    is identifying that the `foriegn` value in the `Multi-Arch` field
    was a typo of `foreign`. But for editor support, you have to tell
    the editor where to put the marker. That range is different in all
    of the cases below:

      Multi-Arch: foriegn

      Multi-Arch:foriegn

      Multi-Arch:
      # Comment for the sake of the argument; probably breaks
       foriegn

    In all cases, the marker should be on the `foriegn` word because
    that is where the mistake is. If you are lucky, you get the line
    number where `Multi-Arch:` appears and then you get to retrace
    things manually. That gets even more complicated for non-string
    types or where the parser "cleans" up things for you. As an example,
    with most deb822 parsers, it is hard to tell `Multi-Arch:foreign`
    apart from `Multi-Arch: foreign`, since the white space is to be
    trimmed in that particular case.

    Note that ranges go two ways. For diagnostics (linting), you tell
    the editor where the marker goes. For completion and hover docs, the
    editor tells you where the user is and you have to figure out what
    is at that point (file "debian/control, line 22, column 14"). This
    means you need a two-way mapping between content and position.
    Here, lintian only does one way mapping, and it only does basic
    positioning (like line or line + column). For code reuse, it would
    have to report the full range of each issue.

 3) You will need a lot of extra metadata that no one else will need.
    As an example, a simple linter might get away with knowing that
    "Multi-Arch" is a known field and has 4 allowed values. A complex
    one would know about 4 values with one of them being conditional
    on the Architecture field (which is less trivial to share in
    data-only format). If you do an on-line editor feature with:

    - hover docs, then you need the main documentation you want to show
      the user for the field and each of the values (depending on what
      the user requests docs for). Hover docs are partially static and
      partially dynamic data, which makes general purpose sharing of
      this data less trivial.

    - completion, then you may want to have a one-liner documentation
      for the values. Maybe some sorting hints to the editor, so it
      knows it should de-emphasize "allowed". Additionally, you want
      to track whether the values you offer are allowed in this context
      (which for Multi-Arch means checking the `Architecture` field,
      while for `Protected` it is static metadata that `no` is the
      default and the default would trigger a warning.)

    - In all of the above cases, you also want fields / data about
      things you cannot check. A linter does not need to know about
      all fields it cannot check (other than maybe for field name
      canonicalization purposes, a.k.a. "cute-field"). In the editor
      support, every known field is now also part of the completion
      "vocabulary" and hover docs may still be useful.

 4) Mentally, the structure of your work will be built around the user
    interacting with the editor. That is, you will be forced into an
    event driven architecture. Latency is visible to the user and will
    annoy them. A full second is a long wait at this point.

    Related, the user typing is sometimes multiple events because the
    user happened to type a bit too slow or maybe they stopped typing
    midway. So you want support for stopping long running diagnostics,
    so you do not build up a queue of pending but now irrelevant
    diagnostics.

    Lintian, for comparison, is entirely in a batch driven architecture,
    where latency of most steps was never important.
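
As a rough illustration of items 1 and 2, here is a minimal Python
sketch. It is not `debputy`'s actual code; `Range`, `value_range` and
`field_at` are names made up for this example, and the "parser" only
handles the three `Multi-Arch` layouts shown above. It keeps comments
and white space in view so that it can report the exact range of the
misspelled word, and it offers the reverse lookup from a line back to
the owning field.

```python
import re
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class Range:
    """Zero-based position of a token; end_col is exclusive."""
    line: int
    start_col: int
    end_col: int


def value_range(lines: List[str], field: str) -> Optional[Range]:
    """Content -> position: locate the first word of `field`'s value."""
    in_field = False
    for line_no, line in enumerate(lines):
        if line.startswith("#"):
            continue  # a comment neither ends the field nor holds its value
        if in_field and line[:1] in (" ", "\t"):
            offset, text = 0, line  # continuation line of the field
        else:
            name, sep, rest = line.partition(":")
            in_field = bool(sep) and name.lower() == field.lower()
            if not in_field:
                continue  # other field, blank line or syntax error: move on
            offset, text = len(name) + 1, rest
        match = re.search(r"\S+", text)
        if match:
            return Range(line_no, offset + match.start(), offset + match.end())
    return None


def field_at(lines: List[str], line_no: int) -> Optional[str]:
    """Position -> content: name of the field owning `line_no`, if any."""
    for idx in range(line_no, -1, -1):
        line = lines[idx]
        if line.startswith("#") or line[:1] in (" ", "\t"):
            continue  # comment or continuation: keep walking upwards
        name, sep, _ = line.partition(":")
        return name if sep else None
    return None


inline = ["Multi-Arch: foriegn"]
compact = ["Multi-Arch:foriegn"]
folded = ["Multi-Arch:", "# Comment for the sake of the argument", " foriegn"]
for doc in (inline, compact, folded):
    print(value_range(doc, "Multi-Arch"), field_at(doc, len(doc) - 1))
```

The three inputs yield three different ranges for the same logical
value, which is exactly the information a parser that trims white space
and drops comments can no longer provide.
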

All of the above is on top of the particular "idiosyncrasies" of the LSP
specification and tracking what the editor supports, when to provide
what information to the editor, etc.

I can tell you with absolute certainty that lintian is ready for
basically none of the above. It was not built for it and parts of this
are an absolute pain to do. You do that because you have to do it to
work with the editor support, not to support another project while you
are already drowning in work trying to keep the project afloat.
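
To make the "stop long running diagnostics" point from item 4 a bit more
concrete, here is a small Python sketch using `asyncio`. It is only an
assumption about how one could do it, not how `debputy` or any LSP
library does it; `DiagnosticScheduler`, `on_change` and the `lint`
callback are hypothetical names. Each new change event cancels the
pending run for the same document, so stale diagnostics never queue up.

```python
import asyncio
from typing import Awaitable, Callable, Dict, List

LintFn = Callable[[str], Awaitable[List[str]]]


class DiagnosticScheduler:
    """Keep at most one pending lint run per document (hypothetical helper)."""

    def __init__(self, lint: LintFn, delay: float = 0.2) -> None:
        self._lint = lint
        self._delay = delay
        self._pending: Dict[str, asyncio.Task] = {}

    def on_change(self, uri: str, text: str) -> asyncio.Task:
        """Call from the event loop whenever the editor sends new content."""
        previous = self._pending.get(uri)
        if previous is not None and not previous.done():
            previous.cancel()  # the older content is irrelevant now
        task = asyncio.get_running_loop().create_task(self._run(text))
        self._pending[uri] = task
        return task

    async def _run(self, text: str) -> List[str]:
        await asyncio.sleep(self._delay)  # debounce: wait for typing to pause
        return await self._lint(text)


async def demo():
    linted: List[str] = []

    async def lint(text: str) -> List[str]:
        linted.append(text)  # record which content was actually checked
        return []

    scheduler = DiagnosticScheduler(lint, delay=0.01)
    stale = scheduler.on_change("debian/control", "Multi-Arch: for")
    fresh = scheduler.on_change("debian/control", "Multi-Arch: foreign")
    await fresh
    return linted, stale.cancelled()
```

Running `asyncio.run(demo())` lints only the second, complete content;
the half-typed first version is cancelled before any work is done on it.
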

Additionally, for a linter (hammer), everything is a diagnostic (nail).
For an editor integration, you have a more varied toolbox. As an
example, `debputy` does not emit diagnostics for trailing white space
like lintian does (with `--pedantic` as I recall). Instead, `debputy`
fixes them automatically on saving where relevant. Because that is a
better solution for the user when you are not forced to solve everything
like a linter (hammer).

Accordingly, even if it was possible to share all the lintian code, I
would not want all of it, meaning that lintian would now need conditions
for "things `debputy` wants vs. things `debputy` does not". Again, not
the thing you need when trying to keep your project afloat.
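
The trailing white space example can be sketched in a few lines of
Python. Both functions are hypothetical illustrations (neither is taken
from lintian nor from `debputy`): the first is the "hammer" that reports
a range for the user to fix, the second simply returns the corrected
content, as an editor integration could do on save.

```python
import re
from typing import List, Tuple

# Spaces or tabs directly before a line ending or the end of the text.
_TRAILING = re.compile(r"[ \t]+(?=\r?\n|\Z)")


def trailing_whitespace_diagnostics(text: str) -> List[Tuple[int, int, int]]:
    """Linter style: report (line, start_col, end_col) for the user to fix."""
    found = []
    for line_no, line in enumerate(text.splitlines()):
        stripped = line.rstrip(" \t")
        if len(stripped) != len(line):
            found.append((line_no, len(stripped), len(line)))
    return found


def fix_on_save(text: str) -> str:
    """Editor style: return the content with trailing white space removed."""
    return _TRAILING.sub("", text)


content = "Source: foo  \nSection: misc\t\n\nPackage: foo\n"
print(trailing_whitespace_diagnostics(content))
print(repr(fix_on_save(content)))
```

With the second approach the user never sees a diagnostic at all, which
is the difference in toolbox described above.
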

> [...]
>> PS: In my view, the bleeding of lintian's quality started long before
>> Axel joined the lintian maintenance team and I do not fault Axel for
>> being unable to stop the bleeding. In my view, only a hero could have
>> "managed" that at the expense of their mental health.
>
> Thanks a lot for your mental support to Axel which I confirm from my side.
>
> To draw some conclusion out of the discussion:  We need to enhance the way
> we are checking our packages for conformance with our policy.  You made
> clear that quite a part can be done at source level.  I'm not fully sure
> whether your main focus is on the time inside the build process or the
> editing features you mentioned.

The `debputy` framework has two different "legs" here. One is the
in-editor feedback with some batch counterparts for CI pipelines, which
aims to be generally applicable to all packages.

The other leg is `debputy` self-checking the packaging instructions for
packages built with `debputy`. In a sense, this also counts as policy
checking but it is not a static analysis and therefore is not comparable
to lintian.

> It is also not clear to me whether you are
> questioning the general architecture like for instance the rule sets that
> are in /usr/share/lintian/data.  IMHO this is a valuable set of rules that
> can be used by alternative tools as well.  Do you agree with this or not?

I find that data to be of questionable value to my work at the present
time, or to other tools in this area:


 1) I do not remember lintian ever committing to these being part of
    its API. Indeed, I see some files that have changed format since
    my time there and they are often also engineered to fit lintian
    specific needs rather than being general purpose data files.

 2) A large part of the files would not be relevant to my work since
    I am not looking at upstream code or packaged artifacts.

 3) In my work, I would need a lot of extra auxiliary metadata that
    lintian will not need (per my remarks above on doing your own editor
    integration).

Obviously, there could be value in sharing rules, data and metadata of
this kind with other interested projects. Jelmer and I already discussed
this possibility in relation to `lintian-brush`. However, it is not
something solved by simply declaring `/usr/share/lintian/data` as stable
API. Instead, I would rather extract subsets of it into a general
purpose data package as needed.

Ideally one where we can release the data faster than checkers, so we do
not get the annoying effect where a new debian-policy upload leaves our
static analysis tools out of date for weeks or even months.

Side-bar: This debian-policy problem is one reason why `debputy` does
not flag "newer-standards-version" as a problem (only older). I do not
want to repeat this problem in `debputy`. It is a trade-off, because a
typo could make the version too new by mistake and that would be silent
in `debputy` at the moment. So I am definitely interested in outsourcing
part of the data.
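
The trade-off in the side-bar can be shown with a short Python sketch.
It is an illustration only and not `debputy`'s implementation:
`KNOWN_POLICY_VERSION` is an assumed example value, and the function
names and the returned label are made up. Only a declared version older
than the known one is reported; equal or newer stays silent.

```python
from typing import Optional, Tuple

# Assumption for this example only; a real tool would read the value from
# (ideally separately released) data rather than hard-code it.
KNOWN_POLICY_VERSION = (4, 7, 0)


def parse_standards_version(value: str) -> Tuple[int, ...]:
    """Keep at most the first three components of a Standards-Version."""
    return tuple(int(part) for part in value.strip().split(".")[:3])


def check_standards_version(
    declared: str, known: Tuple[int, ...] = KNOWN_POLICY_VERSION
) -> Optional[str]:
    """Return a (hypothetical) diagnostic label, or None to stay silent."""
    if parse_standards_version(declared) < known:
        return "older-standards-version"
    # Equal or newer: silent, so a checker that lags behind a fresh
    # debian-policy upload does not produce false positives.
    return None


for declared in ("4.6.2", "4.7.0", "4.8.0", "47.0.0"):
    print(declared, check_standards_version(declared))
```

The last input shows the downside: a typo such as `47.0.0` passes
without comment, which is why faster-moving shared data would help.
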

> As I wrote in my other mail in this thread[1] I could imagine some policy
> checker step after dh_clean.  When thinking twice about it another step
> could be done before dh_builddeb which could detect lots of issues before
> the package is built and can save the unpackaging step.  Are you targeting
> at this as well?
>
> Kind regards and thanks a lot for your inspiring input
>      Andreas.
>
> [1] https://lists.debian.org/debian-devel/2024/05/msg00162.html

No, I am not targeting this for `debhelper`. If you build a package with
`debputy` instead of `debhelper`, there are some built-in self-checks of
the provided packaging instructions compared to the "about to be
produced"-package. It is conceptually similar to `dh_install` erroring
out when you reference `usr/bin/foo` and `dh_install` cannot find said
file.

It would not be difficult to add some form of policy checking layer on
top of this, though the question is what we want to check at this point
where the helper should not just fix it instead. If the tool can fix it,
then it is better than "here is a problem for you to read up on and then
fix manually even though there was only one obvious solution". One thing
requires brain-cells, the other does not.

My end goal with `debputy` is that the average contributor should spend
fewer brain-cells on packaging. That way, a contributor gets a better
"mileage" than they do today. That is why I am a bit hesitant about
doing "in build policy checker". Though, feel free to present concrete
cases and I will consider it.


Best regards,
Niels
