
Re: Mandatory LC_ALL=C.UTF-8 during package building



Hi,

Quoting Hakan Bayındır (2024-06-06 12:32:27)
> On 6.06.2024 ÖS 1:08, Johannes Schauer Marin Rodrigues wrote:
> > Quoting Simon Richter (2024-06-06 11:32:33)
> >>> Would it be possible to set in stone that packages are supposed to always
> >>> be built in an environment where LC_ALL=C.UTF-8, or, in other words, that
> >>> builders must set LC_ALL=C.UTF-8?
> >>
> >> This would be the opposite of the current rule.
> >>
> >> Setting LC_ALL=C in debian/rules is a one-liner.
> >>
> >> If your package is not reproducible without it, then your package is
> >> broken. It can go in with the workaround, but the underlying problem
> >> should be fixed at some point.
> >>
> >> The reproducible builds checker explicitly tests different locales to
> >> ensure reproducibility. Adding this requirement would require disabling this
> >> check, and thus hide an entire class of bugs from detection.
> > 
> > this is one facet of a much bigger discussion (which we've had before). You can
> > argue both ways, depending on how you look at this problem.
> > 
> > It is the question of which of these two we want:
> > 
> >   a) debian/rules is supposed to be runnable in a wide variety of environments.
> >   If your package FTBFS in one specific environment, it is the job of d/rules
> >   to normalize the environment to cater for the specific needs of the package.
> > 
> >   b) debian/rules is supposed to be run in a well-defined environment. If your
> >   package FTBFS in this normalized environment, then it is the job of d/rules to
> >   add the specific needs of the package to d/rules.
> > 
> > So the question is whether you either want to have d/rules normalize
> > heterogeneous environments (a) or whether you want d/rules to make a normalized
> > environment specific to the build (b). This is of course a spectrum and I think
> > we are currently doing much more of (a).
> 
> I agree with Simon here.

And, if I understand your reply correctly, you do not disagree with me either?

> C, or C.UTF-8 is not a universal locale which works for all.

Yes. If we imagine a hypothetical switch to LC_ALL=C.UTF-8 for all source
packages by default, then there will be bugs. The question is which bugs we
want to fix: bugs that occur because we did *not* set LC_ALL=C.UTF-8 (like
reproducible builds problems), or bugs that occur because we *did* set
LC_ALL=C.UTF-8, as in the example that you describe below?

> While C.UTF-8 solves the character representation part of
> "The Turkish Test" [0], it doesn't solve capitalization and sorting issues.
>
> In short, Turkish is the reason why some English text has "İ" and "ı" in
> it, because in Turkish, they're all present (ı, i, I, İ), and their
> capitalization rules are different (i becomes İ and ı becomes I; i.e.
> no loss/gain of dot during case changes).
>
> This creates tons of problems with software which is not aware of the
> issue (Kodi completely breaks for example, and some software needs
> forced/custom environments to run).
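
To make the casing rules you describe concrete, here is a minimal Python
sketch. The helpers turkish_upper and turkish_lower are hypothetical and
hand-written purely for illustration; they are not part of any library, and
real software would more likely rely on a locale-aware library. Python's own
str.upper() and str.lower() do not consult the locale, which is exactly the
kind of "unaware" behaviour you mention:

```python
# Hypothetical helpers, written only to illustrate the Turkish casing rules
# quoted above. They are an assumption for this example, not a library API.

def turkish_upper(text: str) -> str:
    """Upper-case with the Turkish rule: i -> İ and ı -> I."""
    return text.replace("i", "İ").replace("ı", "I").upper()


def turkish_lower(text: str) -> str:
    """Lower-case with the Turkish rule: İ -> i and I -> ı."""
    return text.replace("İ", "i").replace("I", "ı").lower()


word = "title"
default_upper = word.upper()      # locale-independent: the dot is lost
tr_upper = turkish_upper(word)    # Turkish rule: the dot is kept

print(default_upper, tr_upper)

# Mixing the two conventions does not round-trip: lower-casing the
# default upper-case form with the Turkish rule yields a dotless ı.
print(turkish_lower(default_upper))
```

Software that compares identifiers after changing their case will silently
disagree with itself as soon as both conventions are mixed like this.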

I'm curious: if your software breaks depending on the LC_ALL setting, how do
you make it produce reproducible binaries? If it breaks with a certain LC_ALL,
then during the build you have to set LC_ALL (or one of its friends) to some
specific value, right?
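
What I mean by setting LC_ALL for the build can be sketched as follows. This
is Python rather than d/rules, purely for illustration, and the helper name
run_with_locale is hypothetical. The child process here is only a stand-in
for a real build step: it reports the LC_ALL value it was given, so the
sketch does not depend on any particular locale being installed:

```python
import os
import subprocess
import sys


def run_with_locale(argv, locale_name="C.UTF-8"):
    """Run argv with LC_ALL pinned; the rest of the environment is inherited.

    Hypothetical helper for illustration, not part of any build tool.
    """
    env = dict(os.environ, LC_ALL=locale_name)
    return subprocess.run(
        argv, env=env, capture_output=True, text=True, check=True
    )


# Stand-in for a build step: the child prints the LC_ALL it actually sees.
child = [sys.executable, "-c", "import os; print(os.environ['LC_ALL'])"]
result = run_with_locale(child)
print(result.stdout.strip())
```

Whatever locale the caller happens to use, the child always sees the pinned
value, which is the property a reproducible build step needs.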

> So, all in all, if your software is expected to run in an international
> environment, and its build/run behavior breaks in an environment that is
> not to its liking, I also argue that the software is broken to begin with.
> Because when this problem takes hold in a codebase, it is nigh
> impossible to fix.
> 
> So, I think it's better to strive to evolve the software to be a better 
> international citizen rather than give all the software we build an 
> artificially sterile environment, which is iteratively harder and harder 
> to build and maintain.

Just to make sure I'm not misunderstood: I am also leaning towards *not*
setting LC_ALL=C.UTF-8 (but probably not as strongly as I understood Simon's
mail), just because I like dumping my time into figuring out why my software
does something different in a very specific environment. Most of the time,
figuring this out uncovers bugs that should be fixed.

At the same time though, I also get annoyed by copy-pasting d/rules snippets
from one of my packages to the next instead of making use of a few more
defaults in our package build environment.

Thanks!

cheers, josch
