
Re: Any volunteers for lintian co-maintenance?



Andreas Tille:
Hi Niels,

at first sorry for my late answer.

At Thu, May 09, 2024 Niels Thykier wrote:
[...]
For me, lintian fails in all roles it has. It is not a good tool for newbies
to get help, since it can only test build artifacts. As an example, your
feedback loop is a full package build followed by unpacking the package just
so lintian can tell you have a typo on line 4. That is a massive waste of
resources - notably developer time and mental bandwidth.

I understand your point about having a tool that checks the debian/ dir for
issues like spelling errors, binary files in the upstream source, and other
concerns right within the packaging tree before the build starts. However, I
don't understand why you mention newbies in this context.


My core argument is that the feedback cycle is excruciatingly complicated and slow compared to what it needs to be for validation of "debian/*" files. In my view, the problem is amplified for newcomers in multiple areas.

[...]
As a consequence,
people now get auto-rejects when uploading because lintian on the FTP master
server does not produce the same output as current lintian in stable or
newer.

I think it's a bit unfair to blame lintian for the fact that its old
versions do not do a proper job when it comes to checking newer packages.


Is it now? When I maintained lintian, I was of the understanding that the dak usage was an explicit use-case we, as lintian maintainers, were expected to support. In my time, I would have considered this situation as an RC bug against lintian if I had this change and the FTP masters were unable or unwilling to install the -backports version of lintian.

On the other side of the "unfairness" coin, I feel it is unfair to have people spend volunteer time being stuck in a painful cycle of "It works on my machine, but dak rejects it because lintian is not updated on the FTP masters machine", which they are expected to escape by ignoring lintian warnings locally (you need overrides in the old format, which the new lintian then complains about - damned if you do, damned if you don't). Those are volunteers that wasted their Debian time being caught between lintian and dak and, in my book, that was much more unfair than having lintian (or dak) and its maintainers own up to it.

I feel we, as a distribution, should ensure such problems do not happen. As stated, in my time as a lintian maintainer, I felt the responsibility was with lintian and that is why I blame lintian.

Maybe times have changed here and we, as a distribution, no longer hold lintian accountable here. Not sure who is then, but maybe that is part of why this problem has existed for so long.

(For the record, I think the ship sailed on this one. I am not expecting Axel to go retroactively fix this problem on the lintian side. I expect us not to repeat this mistake again)

[...]
Especially for the editor support
related parts, where people get instant feedback both on issues and the fix,
automatic reformatting on save and completion suggestions. None of which
lintian or wrap-and-sort are capable of.

If you ask me personally I'm absolutely happy about a policy checker that
simply reports issues.  I'm fine with firing up an editor in some other
terminal and be done.  Maybe I'm missing your point but for me that's a
non-issue.  Or is your comparison with wrap-and-sort rather targeting at
some tool that automatically fixes the issues it has found and I can check
the changes afterwards with `git diff`?  Or something like the janitor tools
that even commit changes?

I feel my point is not coming across at all and that is frustrating me a bit.

Imagine you need to change `debian/control` for some reason regardless of the situation that triggered this. You open up your editor and do the change. In the process, you make a mistake.

The current workflow is:

 1) Edit file (introducing mistake)
 2) No feedback in the editor, so:
    a) You save the file
    b) Build an artifact that lintian can check
    c) Run lintian to get the feedback
 3) You correct the mistake.
 4) Rinse and repeat all the sub-steps of 2) to validate there are no
    mistakes.

This is the workflow you have today with lintian. And it applies equally to all kinds of mistakes from policy violations, to textual or semantic typos.

Now, I would like you to step away from the status quo. What this workflow *should* have been in my view is:

 1) Edit file (introducing mistake)
 2) Editor shows a "Here is a mistake"-marker.
 3) You correct the mistake (either manually or via a quick fix)
 4) Editor removes a "Here is a mistake"-marker.
 5) Save the file

Notice here that I do not need to leave my editor to get feedback. I get it automatically, so I cannot forget it nor am I inclined to skip the check in a hurry. This is the crux of my problem with the status-quo feedback loop. I have to *actively* ask for feedback. I also have to wait for it, which becomes a paper cut. These are an unnecessary mental burden and unnecessary paper cuts for a considerable part of the problems you can introduce via editing `debian/*` files. IDEs have solved this problem very well via their near-instant feedback loops. I feel we are long overdue for that.


Similarly, when you consider the reformatting flow of today, the flow is:

 1) Edit file
 2) Save file.
 3) Run `wrap-and-sort` to reset formatting.
    - Where I, by the way, have to manually pass the correct formatting
      options.

In the workflow I want, the cycle is:

 1) Edit file
 2) Save file, which causes the editor to reformat it automatically *).

Here, I do not have to remember to reformat the file. The editor does it for me. It is automatically correct rather than corrected through active manual labor on my part.


Obviously, the status quo workflow is possible. We have been doing it for years. However, we should not make a human do the work of a machine. Make the machine do what it does best; follow the same procedure every time. This enables us to free up mental bandwidth of our human volunteers for other things.


*) For packages that have opted in to automatic styling, since this is not a mandatory thing. Stating this explicitly to avoid the conversation derailing into a question of this being imposed.

[...]
But even if I am not successful with
`debputy`, I cannot imagine I would consider returning to lintian. It does
not scratch my itch and years of issues (some of which are still unfixed)
have made me not want to have anything to do with the tool.

[...]

Given your very interesting input we actually need people who are able to
dedicate quite some time on restructuring lintian in a way that respects the
fact that some checks can be done / are done by some other tool on source
level.  Alternatively lintian itself could be modularised to rather do what
you want.


Both in-editor feedback and the "debian files of an unpacked source tree" are the parts I am trying to cover with `debputy` (via `debputy lsp server` + `debputy lint/reformat` respectively)


I do not see lintian expanding to in-editor feedback. It is a massive undertaking on its own. Given no one has solved the "run lintian on an unpacked source tree" problem yet, which would be a prerequisite and also a considerable undertaking on top, I doubt we will ever see it. I also do not see any noteworthy benefit in attempting direct code reuse from lintian at this step.


When you work on in-editor feedback, you will need at least:

 1) A lenient parser that keeps track of all sorts of things like
    syntax errors, white space, and comments, which are usually the
    first things your parser throws away to keep things simple. Ideally,
    it also:
    - supports reading a string or a list of lines, since the editor
      content is not always persisted to the file system. Instead, you
      get it from "somewhere else" (fed via socket in the LSP case)
    - continues after syntax errors, since otherwise you only get one
      error on syntax errors and most other feedback disappears, which
      can be annoying to the user. Especially important for completion
      since the half-finished typing might be syntactically invalid.
      (Also, inserting a field in a deb822 stanza will temporarily split
       the stanza into two where at least one of them will definitely
       be invalid. You will want to be able to compute the completion
       as-if the stanzas are not split despite the file being
       "stanza, empty line / syntax error, stanza")

 2) Additionally, you need to know file ranges of everything. One thing
    is identifying that the `foriegn` value in the `Multi-Arch` field
    was a typo of `foreign`. But for editor support, you have to tell
    the editor where to put the marker. That range is different in all
    of the cases below:

      Multi-Arch: foriegn

      Multi-Arch:foriegn

      Multi-Arch:
      # Comment for the sake of the argument; probably breaks
       foriegn

    In all cases, the marker should be on the `foriegn` word because
    that is where the mistake is. If you are lucky, you get the line
    number where `Multi-Arch:` appears and then you get to retrace
    things manually. That gets even more complicated for non-string
    types or where the parser "cleans" up things for you. As an example,
    with most deb822 parsers, it is hard to tell `Multi-Arch:foreign`
    apart from `Multi-Arch: foreign`, since the white space is to be
    trimmed in that particular case.

    Note that ranges go both ways. For diagnostics (linting), you tell
    the editor where the marker goes. For completion and hover docs, the
    editor tells you where the user is and you have to figure out what
    is at that point (file "debian/control, line 22, column 14"). This
    means you need a two-way mapping between content and position.
    Here, lintian only does a one-way mapping, and it only does basic
    positioning (like line or line + column). For code reuse, it would
    have to provide full ranges for issues.

 3) You will need a lot of extra metadata that no one else will need.
    As an example, a simple linter might get away with knowing that
    "Multi-Arch" is a known field and has 4 allowed values. A complex
    one would know about 4 values with one of them being conditional
    on the Architecture field (which is less trivial to share in
    data-only format). If you do an on-line editor feature with:

    - hover docs, then you need the main documentation you want to show
      the user for the field and each of the values (depending on what
      the user requests docs for). Hover docs are partially static and
      partially dynamic data, which makes general purpose sharing of
      this data less trivial.

    - completion, then you may want to have a one-liner documentation
      for the values. Maybe some sorting hints to the editor, so it
      knows it should de-emphasize "allowed". Additionally, you want
      to track whether the values you offer are allowed in this context
      (which for Multi-Arch means checking the `Architecture` field,
      while for `Protected` it is static metadata that `no` is the
      default and the default would trigger a warning.)

    - In all of the above cases, you also want fields / data about
      things you cannot check. A linter does not need to know about
      all fields it cannot check (other than maybe for field name
      canonicalization purposes, a.k.a. "cute-field"). In the editor
      support, every known field is now also part of the completion
      "vocabulary" and hover docs may still be useful.

 4) Mentally, the structure of your work will be built around the user
    interacting with the editor. That is, you will be forced into an
    event driven architecture. Latency is visible to the user and will
    annoy them. A full second is a long wait at this point.

    Related, the user typing is sometimes multiple events because the
    user happened to type a bit too slow or maybe they stopped typing
    midway. So you want support for stopping long running diagnostics,
    so you do not build up a queue of pending but now irrelevant
    diagnostics.

    Lintian, for comparison, is entirely in a batch driven architecture,
    where latency of most steps was never important.
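
To make items 1) and 2) above a bit more concrete, here is a minimal sketch in Python of a lenient deb822-style reader that accepts either a string or a list of lines, keeps going after a syntax error, and records a (line, column) range for every field name and value so the lookup works in both directions. It is an illustration only and not how `debputy` or lintian implement it; all names (`Range`, `Token`, `Field`, `Diagnostic`, `parse_lenient`, `token_at`, `multi_arch_typos`) are made up for this example, and the field-name pattern is deliberately simplified.

```python
import re
from dataclasses import dataclass, field
from typing import Iterable, List, Optional, Tuple, Union


@dataclass(frozen=True)
class Range:
    line: int   # 0-based line number
    start: int  # 0-based column, inclusive
    end: int    # 0-based column, exclusive


@dataclass(frozen=True)
class Token:
    text: str
    range: Range


@dataclass(frozen=True)
class Diagnostic:
    message: str
    range: Range


@dataclass
class Field:
    name: Token
    values: List[Token] = field(default_factory=list)


FIELD_START = re.compile(r"([^\s:]+):")  # simplified field-name pattern
MULTI_ARCH_VALUES = {"no", "same", "foreign", "allowed"}


def _add_value(fld: Field, lineno: int, line: str, offset: int) -> None:
    """Record the trimmed value text together with where it really is."""
    rest = line[offset:]
    text = rest.strip()
    if text:
        start = offset + len(rest) - len(rest.lstrip())
        fld.values.append(Token(text, Range(lineno, start, start + len(text))))


def parse_lenient(
    content: Union[str, Iterable[str]],
) -> Tuple[List[Field], List[Diagnostic]]:
    lines = content.splitlines() if isinstance(content, str) else content
    fields: List[Field] = []
    problems: List[Diagnostic] = []
    current: Optional[Field] = None
    for lineno, raw in enumerate(lines):
        line = raw.rstrip("\r\n")
        if line.startswith("#"):
            continue  # comment: skipped, but line numbers stay intact
        if not line.strip():
            current = None  # stanza separator
            continue
        if line[0] in " \t":  # continuation line
            if current is None:
                problems.append(Diagnostic("continuation without a field",
                                           Range(lineno, 0, len(line))))
            else:
                _add_value(current, lineno, line, 0)
            continue
        match = FIELD_START.match(line)
        if match is None:
            # Record the problem and keep parsing the rest of the file.
            problems.append(Diagnostic("expected 'Field: value'",
                                       Range(lineno, 0, len(line))))
            current = None
            continue
        name = match.group(1)
        current = Field(Token(name, Range(lineno, 0, len(name))))
        fields.append(current)
        _add_value(current, lineno, line, match.end())
    return fields, problems


def token_at(fields: List[Field], line: int, col: int) -> Optional[Token]:
    """Position -> content: what is under the cursor (hover, completion)?"""
    for fld in fields:
        for tok in (fld.name, *fld.values):
            r = tok.range
            if r.line == line and r.start <= col < r.end:
                return tok
    return None


def multi_arch_typos(fields: List[Field]) -> List[Range]:
    """Content -> position: where should the editor put the marker?"""
    return [v.range for f in fields if f.name.text == "Multi-Arch"
            for v in f.values if v.text not in MULTI_ARCH_VALUES]
```

With this sketch, the three `Multi-Arch` spellings from item 2) each produce a different range for the same `foriegn` token, which is exactly the information a "line number only" parser loses.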

This is beyond the particular "idiosyncrasies" of the LSP specification: tracking what the editor supports, when to provide what information to the editor, etc.

I can tell you with absolute certainty that lintian is ready for basically none of the above. It was not built for it and parts of this are an absolute pain to do. You do that because you have to do it to work with the editor support, not to support another project while you are already drowning in work trying to keep the project afloat.
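
As an illustration of the "stop long running diagnostics" point in item 4), here is a minimal sketch using Python's `asyncio`. It assumes an event-driven server where every change to a document arrives as a separate event. The names `DiagnosticScheduler`, `lint` and `publish` are hypothetical and are not taken from `debputy` or from any LSP library.

```python
import asyncio
from typing import Awaitable, Callable, Dict, List


class DiagnosticScheduler:
    """Run at most one diagnostics job per document; newer edits win."""

    def __init__(
        self,
        lint: Callable[[str], Awaitable[List[str]]],
        publish: Callable[[str, List[str]], None],
        delay: float = 0.2,
    ) -> None:
        self._lint = lint
        self._publish = publish
        self._delay = delay  # short pause so a burst of typing is one job
        self._pending: Dict[str, "asyncio.Task[None]"] = {}

    def on_change(self, uri: str, text: str) -> None:
        """Call from inside the running event loop for every edit event."""
        previous = self._pending.get(uri)
        if previous is not None and not previous.done():
            previous.cancel()  # drop the now irrelevant job
        loop = asyncio.get_running_loop()
        self._pending[uri] = loop.create_task(self._run(uri, text))

    async def _run(self, uri: str, text: str) -> None:
        # Cancellation takes effect at either await below.
        await asyncio.sleep(self._delay)
        diagnostics = await self._lint(text)
        self._publish(uri, diagnostics)
```

The design choice here is that stale work is cancelled rather than queued, so the user only ever sees results for the latest content of the document.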


Additionally, for a linter (hammer), everything is a diagnostic (nail). For an editor integration, you have a more varied toolbox. As an example, `debputy` does not emit diagnostics for trailing white space like lintian does (with `--pedantic` as I recall). Instead, `debputy` fixes them automatically on saving where relevant, because that is a better solution for the user when you are not forced to solve everything like a linter (hammer). Accordingly, even if it was possible to share all the lintian code, I would not want all of it, meaning that lintian would now need conditions for "things `debputy` wants vs. things `debputy` does not". Again, not the thing you need while trying to keep your project afloat.
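
To illustrate the "fix it on save rather than report it" idea for trailing white space, here is a small Python sketch that computes the edits a format-on-save step could apply. It is not `debputy`'s actual code; the function names and the `(line, start, end)` edit shape are assumptions made for this example, and leaving whitespace-only lines alone is a simplification of the "where relevant" part.

```python
from typing import Iterable, List, Tuple

# (line, start column, end column); 0-based, end exclusive
Edit = Tuple[int, int, int]


def trailing_whitespace_edits(lines: Iterable[str]) -> List[Edit]:
    """Return the ranges to delete instead of emitting diagnostics."""
    edits: List[Edit] = []
    for lineno, raw in enumerate(lines):
        body = raw.rstrip("\r\n")
        kept = body.rstrip(" \t")
        # Assumption of this sketch: whitespace-only lines are not touched,
        # since changing them could alter how the file is split up.
        if kept and kept != body:
            edits.append((lineno, len(kept), len(body)))
    return edits


def apply_edits(lines: List[str], edits: List[Edit]) -> List[str]:
    """Apply the deletions; each edit touches a different line."""
    result = list(lines)
    for lineno, start, end in edits:
        line = result[lineno]
        result[lineno] = line[:start] + line[end:]
    return result
```

An editor integration would hand such edits to the editor when the file is saved, so the user never sees a marker for something that has exactly one obvious fix.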

[...] PS: In my view, the bleeding of lintian's quality started long before Axel
joined the lintian maintenance team and I do not fault Axel for being unable
to stop the bleeding. In my view, only a hero could have "managed" that at
the expense of their mental health.

Thanks a lot for your mental support to Axel which I confirm from my side.

To draw some conclusion out of the discussion:  We need to enhance the way
we are checking our packages for conformance with our policy.  You made
clear that quite a part can be done at source level.  I'm not fully sure
whether your main focus is on the time inside the build process or the
editing features you mentioned.

The `debputy` framework has two different "legs" here. One is the in-editor feedback with some batch counterparts for CI pipelines, which aims to be generally applicable to all packages.

The other leg is `debputy` self-checking the packaging instructions for packages built with `debputy`. In a sense, this also counts as policy checking but it is not a static analysis and therefore is not comparable to lintian.

 It is also not clear to me whether you are
questioning the general architecture like for instance the rule sets that
are in /usr/share/lintian/data.  IMHO this is a valuable set of rules that
can be used by alternative tools as well.  Do you agree with this or not?


I find that data to be of questionable value to my work at the present time or other tools in this area:

 1) I do not remember lintian ever committing to these being part of
    its API. Indeed, I see some files that have changed format since
    my time there and they are often also engineered to fit lintian
    specific needs rather than being general purpose data files.

 2) A large part of the files would not be relevant to my work since
    I am not looking at upstream code or packaged artifacts.

 3) In my work, I would need a lot of extra auxiliary metadata that
    lintian will not need (per my remarks above on doing your own
    editor integration).

Obviously, there could be value in sharing rules, data and metadata of this kind with other interested projects. Jelmer and I already discussed this possibility in relation to `lintian-brush`. However, it is not something solved by simply declaring `/usr/share/lintian/data` as stable API. Instead, I would rather extract subsets of it into a general purpose data package as needed.

Ideally one where we can release the data faster than the checkers, so we do not get the annoying effect where a new debian-policy upload leaves our static analysis tools out of date for weeks or even months.

Side-bar: This debian-policy problem is one reason why `debputy` does not flag "newer-standards-version" as a problem (only older). I do not want to repeat this problem in `debputy`. It is a trade-off, because a typo could make the version too new by mistake and that would be silent in `debputy` at the moment. So I am definitely interested in outsourcing part of the data.
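
The trade-off in the side-bar above can be sketched in a few lines of Python. This is an illustration of the "only flag older" rule, not `debputy`'s implementation; `KNOWN_POLICY_VERSION` is an example value that a real tool would load from data, the function names are made up, and the sketch relies on Debian Policy treating only the first three components of `Standards-Version` as significant.

```python
from typing import Optional, Tuple

# Example value only; a real checker would get this from a data source
# that can be updated independently of the checker itself.
KNOWN_POLICY_VERSION = "4.7.0"


def _significant(version: str) -> Tuple[int, ...]:
    """First three components, zero-padded. Raises ValueError on junk."""
    parts = [int(p) for p in version.split(".")][:3]
    return tuple(parts + [0] * (3 - len(parts)))


def check_standards_version(
    declared: str, known: str = KNOWN_POLICY_VERSION
) -> Optional[str]:
    """Flag only versions older than the newest one the tool knows."""
    if _significant(declared) < _significant(known):
        return f"Standards-Version {declared} is older than {known}"
    # Equal or newer is accepted silently, so an outdated checker never
    # complains about a newer policy. The cost: a typo such as "4.70.0"
    # also passes unnoticed.
    return None
```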

As I wrote in my other mail in this thread[1] I could imagine some policy
checker step after dh_clean.  When thinking twice about it another step
could be done before dh_builddeb which could detect lots of issues before
the package is built and can save the unpackaging step.  Are you targeting
at this as well?
Kind regards and thanks a lot for your inspiring input
     Andreas.

[1] https://lists.debian.org/debian-devel/2024/05/msg00162.html



No, I am not targeting this for `debhelper`. If you build a package with `debputy` instead of `debhelper`, there are some built-in self-checks of the provided packaging instructions compared to the "about to be produced"-package. It is conceptually similar to `dh_install` erroring out when you reference `usr/bin/foo` and `dh_install` cannot find said file.

It would not be difficult to add some form of policy checking layer on top of this, though the question is what we want to check at this point where the helper should not just fix it instead. If the tool can fix it, then it is better than "here is a problem for you to read up on and then fix manually even though there was only one obvious solution". One thing requires brain-cells, the other does not.

My end goal with `debputy` is that the average contributor should spend fewer brain cells on packaging. That way, a contributor gets a better "mileage" than they do today. That is why I am a bit hesitant about doing an "in-build policy checker". Though, feel free to present concrete cases and I will consider it.

Best regards,
Niels

