
Bug#973313: lintian: Fix failing Lintian jobs for many if not all packages hosted on Salsa



Package: lintian
Severity: important

Hi,

Since October 22, the Lintian jobs in Salsa CI pipelines have failed for
many packages, including Lintian itself. [1] In the container, 'groff'
apparently cannot find LC_ALL, even though we reset the environment and
explicitly provide LC_ALL=C.UTF-8. [2] That setting has no effect, and
that is the substance of this bug.
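For readers unfamiliar with the pattern, here is a minimal Python sketch
of what "reset the environment and explicitly provide LC_ALL=C.UTF-8"
means for a child process. It is not Lintian's code (Lintian is written
in Perl, see [2]); the helper name run_sanitized and the fixed PATH value
are assumptions made for illustration.

```python
import subprocess
import sys


def run_sanitized(argv, extra=None):
    """Hypothetical helper: run argv with a freshly built environment.

    Nothing is inherited from the parent; only the variables below
    (plus any in `extra`) reach the child.
    """
    env = {
        "PATH": "/usr/bin:/bin",  # assumed minimal search path
        "LC_ALL": "C.UTF-8",      # overrides every LC_* category and LANG
    }
    if extra:
        env.update(extra)
    result = subprocess.run(
        argv, env=env, capture_output=True, text=True, check=True
    )
    return result.stdout


if __name__ == "__main__":
    # Show which locale variables the child actually sees.
    child = "import os; print(os.environ.get('LC_ALL'), os.environ.get('LANG'))"
    print(run_sanitized([sys.executable, "-c", child]))
```

Note that LANG is simply absent in the child; whether that matters is
exactly what the rest of this thread discusses.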

I am not sure what changed (my commit could not have caused it). The
only recent upload I see is for bash; there was none for groff, perl or
libipc-run3-perl. I already asked on #salsci, where I was told that the
runner images had not been changed in some time, so the cause is likely
elsewhere.

I also asked on #salsa because, on October 22, the runner base system
was upgraded from Debian 9 to Debian 10. (That is unrelated to the images
provided by Salsa CI.) According to the Salsa admins, that change
should have had no impact. For the time being we are at a loss.

Colin Watson suggested the workaround quoted below, but we will try to
find the real bug first:

    "If all else fails then setting MAN_NO_LOCALE_WARNING=1 may be a
    viable workaround."

This bug filing follows discussions on debian-devel@lists.d.o and IRC,
the relevant parts of which are copied below.

There is also a Salsa issue about this [3], but it's probably better
to centralize the discussion here. The bug may indeed belong to Salsa
CI, but issues filed on that website seem less permanent than a bug in
the BTS. Please use this bug to comment on the issue going forward.
Thank you!

Kind regards
Felix Lechner

[1] https://lintian.pages.debian.net/-/lintian/-/jobs/1098261/artifacts/debian/output/lintian.html
[2] https://salsa.debian.org/lintian/lintian/-/blob/master/checks/documentation/manual.pm#L279-281
[3] https://salsa.debian.org/salsa-ci-team/pipeline/-/issues/182

***

Hello there!

My Salsa CI pipeline is blowing up in the lintian step, with
lots of warnings of the form:

"W: notcurses-bin: groff-message
usr/share/man/man1/notcurses-demo.1.gz can't set the locale; make sure
$LC_* and $LANG are correct"

This is printed for each man page I package. An example run is
here:

https://salsa.debian.org/debian/notcurses/-/jobs/1107065

The only reference I could find was
https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=606933, which
didn't seem relevant.

Is this due to having supra-ASCII UTF-8 characters in my man
pages? Is there anything I can do to work around this? I tried
exporting LANG to a UTF-8 locale in the variables section of my
Salsa CI configuration, but that didn't help.

I'm using pandoc to generate my man pages, and it happily
accepts UTF-8, but I can see a case for restricting them to
ASCII.

Thanks!

--
nick black -=- https://www.nick-black.com

* * *

Hi Nick,

On Sun, Oct 25, 2020 at 6:23 PM Nick Black <dankamongmen@gmail.com> wrote:
>
> Is this due to having supra-ascii UTF8 characters in my man
> pages?

It's not a problem with your package. Lintian's own pipeline is
likewise affected, even though our test suite completes fine in an
unstable chroot. The issue is being tracked here:
https://salsa.debian.org/salsa-ci-team/pipeline/-/issues/182

Kind regards
Felix Lechner

* * *

Felix Lechner left as an exercise for the reader:
> It's not a problem with your package. Lintian's own pipeline is
> likewise affected, even though our test suite completes fine in an
> unstable chroot. The issue is being tracked here:
> https://salsa.debian.org/salsa-ci-team/pipeline/-/issues/182

Thanks for the quick response, Felix. You say that "[you] will
probably start setting $LANG in that part of Lintian." What LANG
will you be using? Attempting to set LANG=en_US.UTF-8 in my
Salsa CI variables resulted in setlocale(3) failing all over the
place, presumably because the locale had not been generated.

--
nick black -=- https://www.nick-black.com

* * *

On Mon, 26 Oct 2020 at 00:35:45 -0400, Nick Black wrote:
> Thanks for the quick response, Felix. You say that "[you] will
> probably start setting $LANG in that part of Lintian." what LANG
> will you be using? Attempting to set LANG=en_US.UTF-8 in my
> salsa ci variables resulted in setlocale(3) failing all over the
> place, presumably due to the locale not having been generated.

C.UTF-8 is available on all Debian systems. It's the standard C/POSIX
locale, except that in the C locale the meaning of bytes 0x80-0xFF is
undefined, while in C.UTF-8 they are assumed/defined to be part of a
character encoded in UTF-8.
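One way to see the difference is to ask the C library which character
encoding each locale declares. The Python sketch below does this in a
child process, so the parent's own locale is left alone;
locale.nl_langinfo(locale.CODESET) wraps nl_langinfo(3) and is available
on Unix-like systems. The name reported for plain C varies between C
libraries (glibc reports an ASCII codeset), so only the C.UTF-8 result
is relied upon here.

```python
import subprocess
import sys

# Child program: adopt the locale from the environment, then print the
# codeset the C library associates with it.
_CHILD = (
    "import locale\n"
    "locale.setlocale(locale.LC_ALL, '')\n"
    "print(locale.nl_langinfo(locale.CODESET))\n"
)


def codeset_for(locale_name):
    """Return the codeset name the C library reports for locale_name."""
    result = subprocess.run(
        [sys.executable, "-c", _CHILD],
        env={"LC_ALL": locale_name},
        capture_output=True, text=True, check=True,
    )
    return result.stdout.strip()


if __name__ == "__main__":
    for name in ("C", "C.UTF-8"):
        print(name, "->", codeset_for(name))
```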

If you care about portability to non-Debian systems, note that C.UTF-8 is
a somewhat popular extension (I think it originated in the Fedora/Red Hat
family before it was adopted by Debian and other distros) but is far from
universally available. In particular, I'm aware of Arch Linux specifically
*not* having it. The glibc maintainers consider the implementation used
in e.g. Fedora and Debian to be a hack rather than something they want to
maintain forever, but my understanding is that they would be willing to
accept a better implementation.

en_US.UTF-8 is indeed not portable. Some OSs (Fedora, I think?) always
generate the en_US.UTF-8 locale regardless of any other configuration
that might exist, but Debian does not: if you chose a non-English locale
like fr_FR.UTF-8 or a non-American English locale like en_GB.UTF-8 during
installation, then you will normally only have three locales, your chosen
national locale plus the international locales C and C.UTF-8.

Minimal container/chroot environments, and in particular the official
Debian buildds, will normally only have C and C.UTF-8. See src:gtk+4.0
for an example of how to generate additional locales on-demand if your
unit tests need them.
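When a tool would like a particular national locale but must still work
on a minimal system, one option is to probe for it and fall back to
C.UTF-8. A minimal Python sketch follows; the function name and the
preference list are examples only. locale.setlocale() changes
process-wide state, so the probe restores the previous setting
afterwards, and it is not thread-safe.

```python
import locale


def first_available(candidates):
    """Return the first locale name the C library accepts, else None."""
    saved = locale.setlocale(locale.LC_ALL)  # query only, no change
    try:
        for name in candidates:
            try:
                locale.setlocale(locale.LC_ALL, name)
            except locale.Error:
                continue  # not generated/installed on this system
            return name
        return None
    finally:
        # Put the process back into the locale it started with.
        locale.setlocale(locale.LC_ALL, saved)


if __name__ == "__main__":
    print(first_available(["en_US.UTF-8", "C.UTF-8", "C"]))
```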

Third-party software from outside Debian frequently assumes that the
en_US.UTF-8 locale does exist - in particular, it's common enough for
Steam games to want it to exist that Steam's diagnostic tool now checks
for it. This is mostly because it's semi-frequently (ab)used as a way
to parse and serialize C-syntax floating point in programming languages
or configuration files without getting confused by non-English decimal
points (e.g. 1.23 in English locales is 1,23 in French locales, which
means a naive implementation might write {"x": 1,23, "y": 4,56} into a
JSON file, which is of course a syntax error).
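To make the JSON example concrete without needing a French locale to be
installed, the sketch below simulates a serializer that substitutes the
locale's decimal separator into each number. naive_serialize is a
hypothetical, deliberately wrong helper; Python's own json module always
writes '.'.

```python
import json


def naive_serialize(obj, decimal_point):
    # Hypothetical and deliberately wrong: format each float, then swap
    # in the locale's decimal separator as a locale-sensitive printf would.
    parts = []
    for key, value in obj.items():
        number = repr(value).replace(".", decimal_point)
        parts.append(f'"{key}": {number}')
    return "{" + ", ".join(parts) + "}"


def is_valid_json(text):
    try:
        json.loads(text)
    except json.JSONDecodeError:
        return False
    return True


if __name__ == "__main__":
    point = {"x": 1.23, "y": 4.56}
    for sep in (".", ","):
        text = naive_serialize(point, sep)
        print(text, "valid" if is_valid_json(text) else "INVALID")
```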

The portable way to read/write configuration files and C-like source
code is to avoid the POSIX locale-sensitive functions completely,
and use something like GLib's g_ascii_strtod() or CPython's
PyOS_string_to_double() (lots of libraries and frameworks will have an
equivalent, those are just the ones I'm most familiar with). This also
has the advantage of being thread-safe, unlike temporarily switching
POSIX locales, which is normally process-wide and therefore not thread-safe.
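In Python specifically, the built-in float() already behaves like the
locale-independent functions mentioned above: it always expects '.' no
matter what LC_NUMERIC says, whereas locale.atof() deliberately follows
the current locale. A small sketch of a config reader relying on that;
the "name=value" format and the name parse_setting are assumptions.

```python
def parse_setting(line):
    """Parse a hypothetical 'name=1.23' config line, independent of locale."""
    name, sep, raw = line.partition("=")
    if not sep:
        raise ValueError(f"missing '=' in {line!r}")
    # float() accepts only '.' as the decimal point, whatever LC_NUMERIC
    # says, and does not touch process-wide locale state.
    return name.strip(), float(raw.strip())


if __name__ == "__main__":
    print(parse_setting("x = 1.23"))
```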

Another correct way to do this since POSIX.1-2008 is to use POSIX
uselocale() and the C locale, but that's unlikely to be portable
to Windows or to exotic Unix implementations, so widely-portable
software generally ends up having to reinvent something equivalent to
g_ascii_strtod() anyway.

    smcv

* * *

Simon McVittie left as an exercise for the reader:
> If you care about portability to non-Debian systems, note that C.UTF-8 is
> a somewhat popular extension (I think it originated in the Fedora/Red Hat
> family before it was adopted by Debian and other distros) but is far from
> universally available. In particular, I'm aware of Arch Linux specifically
> *not* having it. The glibc maintainers consider the implementation used
> in e.g. Fedora and Debian to be a hack rather than something they want to
> maintain forever, but my understanding is that they would be willing to
> accept a better implementation.

As I "need" this only within the Debian Salsa CI (and only to
deal with this groff lintian warning, which it sounds like will
be handled another way), a Debian-specific solution would be
fine =]. Thanks for the details -- C.UTF-8 sounds like the right
way to go.

--
nick black -=- https://www.nick-black.com

* * *

On Mon, 26 Oct 2020 11:47:37 +0000, Simon McVittie wrote:

> Minimal container/chroot environments, and in particular the official
> Debian buildds, will normally only have C and C.UTF-8. See src:gtk+4.0
> for an example of how to generate additional locales on-demand if your
> unit tests need them.

Alternatively, build-depending on locales-all usually also works
(benefit: no manual meddling with locales, cost: installation size).

Cheers,
gregor

* * *

Hi Nick,

On Mon, Oct 26, 2020 at 5:11 AM Nick Black <dankamongmen@gmail.com> wrote:
>
> C.UTF-8 sounds like the right way to go.

As noted in the issue tracker [1], Lintian already sets LC_ALL to
C.UTF-8 [2] in a sanitized environment, but we do not currently set
LANG. That would have been my next step, except these issues do not
occur in a clean chroot for unstable and are therefore more likely
related to Salsa or Salsa CI.

Kind regards
Felix Lechner

[1] https://salsa.debian.org/salsa-ci-team/pipeline/-/issues/182
[2] https://salsa.debian.org/lintian/lintian/-/blob/master/checks/documentation/manual.pm#L281

* * *

On Mon, Oct 26, 2020 at 07:57:58AM -0700, Felix Lechner wrote:
> On Mon, Oct 26, 2020 at 5:11 AM Nick Black <dankamongmen@gmail.com> wrote:
> > C.UTF-8 sounds like the right way to go.
>
> As noted in the issue tracker [1], Lintian already sets LC_ALL to
> C.UTF-8 [2] in a sanitized environment, but we do not currently set
> LANG.

LC_ALL should imply LANG, and as far as I know that works fine in man
(which is the program producing the warning message in this case), so
this should make no difference.  If somebody can come up with a reduced
test environment in which man does not seem to interpret LC_ALL as
implying LANG, I'd consider that a bug.
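The precedence described here is the POSIX rule: for each category, a
non-empty LC_ALL wins, then the category's own LC_* variable, then
LANG, and finally an implementation-defined default (C with glibc). A
pure-Python sketch of that lookup; the function name is hypothetical,
and it models only the environment lookup, not whether the resulting
locale is actually installed.

```python
def effective_locale(env, category):
    """Resolve the locale name for one category (e.g. 'LC_CTYPE')
    following POSIX precedence: LC_ALL, then the category, then LANG."""
    for var in ("LC_ALL", category, "LANG"):
        value = env.get(var)
        if value:  # unset and empty both fall through
            return value
    return "C"  # implementation-defined default; glibc uses C/POSIX


if __name__ == "__main__":
    print(effective_locale({"LC_ALL": "C.UTF-8"}, "LC_CTYPE"))
```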

--
Colin Watson (he/him)                              [cjwatson@debian.org]

* * *

On Mon, 26 Oct 2020 at 18:35:53 +0000, Colin Watson wrote:
> LC_ALL should imply LANG

One thing that it does not imply is LANGUAGE, used for LC_MESSAGES as a
GNU extension (at a higher precedence than even LC_ALL).
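Python's gettext module emulates this lookup order when choosing message
catalogues: by default it consults LANGUAGE, LC_ALL, LC_MESSAGES and
LANG, in that order, takes the first non-empty one and splits it on ':'.
A simplified re-implementation of that order is sketched below; GNU
gettext has further rules (for example, it ignores LANGUAGE when the
message locale is C) which are not modelled here.

```python
GETTEXT_ENVVARS = ("LANGUAGE", "LC_ALL", "LC_MESSAGES", "LANG")


def message_languages(env):
    """Return the ordered list of languages a gettext-style lookup tries."""
    languages = []
    for var in GETTEXT_ENVVARS:
        value = env.get(var)
        if value:
            languages = value.split(":")
            break
    if "C" not in languages:
        languages.append("C")  # final fallback: untranslated messages
    return languages


if __name__ == "__main__":
    print(message_languages({"LANGUAGE": "fr:de", "LC_ALL": "C.UTF-8"}))
```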

    smcv

* * *

On Mon, Oct 26, 2020 at 08:16:43PM +0000, Simon McVittie wrote:
> On Mon, 26 Oct 2020 at 18:35:53 +0000, Colin Watson wrote:
> > LC_ALL should imply LANG
>
> One thing that it does not imply is LANGUAGE, used for LC_MESSAGES as a
> GNU extension (at a higher precedence than even LC_ALL).

Indeed, though I don't believe it's possible for it to cause the warning
message in question here (which results from setlocale (LC_ALL, "")
returning NULL).
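A reduced test can exercise exactly that call in a near-empty
environment. In the Python sketch below, the child's
locale.setlocale(locale.LC_ALL, '') raises locale.Error precisely when
the underlying setlocale(3) returns NULL. The helper name setlocale_ok
is hypothetical, and the probe tests the C library's view of the
environment, not man itself.

```python
import subprocess
import sys

_PROBE = (
    "import locale, sys\n"
    "try:\n"
    "    locale.setlocale(locale.LC_ALL, '')\n"
    "except locale.Error:\n"
    "    sys.exit(1)  # setlocale(LC_ALL, '') returned NULL\n"
)


def setlocale_ok(env):
    """Run the probe with exactly `env` and report whether it succeeded."""
    result = subprocess.run([sys.executable, "-c", _PROBE], env=env)
    return result.returncode == 0


if __name__ == "__main__":
    for env in ({"LC_ALL": "C.UTF-8"}, {"LANG": "C.UTF-8"}):
        print(env, "ok" if setlocale_ok(env) else "FAILS")
```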

If all else fails then setting MAN_NO_LOCALE_WARNING=1 may be a viable
workaround.


--
Colin Watson (he/him)                              [cjwatson@debian.org]

