Hi,

Quoting Hakan Bayındır (2024-06-06 12:32:27)
> On 6.06.2024 ÖS 1:08, Johannes Schauer Marin Rodrigues wrote:
> > Quoting Simon Richter (2024-06-06 11:32:33)
> >>> Would it be possible to set in stone that packages are supposed to always
> >>> be built in an environment where LC_ALL=C.UTF-8, or, in other words, that
> >>> builders must set LC_ALL=C.UTF-8?
> >>
> >> This would be the opposite of the current rule.
> >>
> >> Setting LC_ALL=C in debian/rules is an one-liner.
> >>
> >> If your package is not reproducible without it, then your package is
> >> broken. It can go in with the workaround, but the underlying problem
> >> should be fixed at some point.
> >>
> >> The reproducible builds checker explicitly tests different locales to
> >> ensure reproducibility. Adding this requirement would require disabling this
> >> check, and thus hide an entire class of bugs from detection.
> >
> > this is one facet of a much bigger discussion (which we've had before). You can
> > argue both ways, depending on how you look at this problem.
> >
> > It is the question of whether we want to:
> >
> > a) debian/rules is supposed to be runnable in a wide variety of environments.
> > If your package FTBFS in a one specific environment, it is the job of d/rules
> > to normalize the environment to cater for the specific needs of the package.
> >
> > b) debian/rules is supposed to be run in a well-defined environment. If your
> > package FTBFS in this normalized environment, then it is the job of d/rules to
> > add the specific needs of the package to d/rules.
> >
> > So the question is whether you either want to have d/rules normalize
> > heterogeneous environments (a) or whether you want d/rules to make a normalized
> > environment specific to the build (b). This is of course a spectrum and I think
> > we currently doing much more of (a).
>
> I agree with Simon here.

And, if I understand your reply correctly, you do not disagree with me either?

> C, or C.UTF-8 is not a universal locale which
> works for all.

Yes. If we imagine a hypothetical switch to LC_ALL=C.UTF-8 for all source
packages by default, then there will be bugs. The question is which bugs we
want to fix: bugs that occur because we did *not* set LC_ALL=C.UTF-8 (like
reproducible builds problems), or bugs that occur because we *did* set
LC_ALL=C.UTF-8, as in the example that you are describing below.

> While C.UTF-8 solves character representation part of
> "The Turkish Test" [0], it doesn't solve capitalization and sorting issues.
>
> In short, Turkish is the reason why some English text has "İ" and "ı" in
> it, because in Turkish, they're all present (ı, i, I, İ), and their
> capitalization rules are different (i becomes İ and ı becomes I; i.e.
> no loss/gain of dot during case changes).
>
> This creates tons of problems with software which are not aware of the
> issue (Kodi completely breaks for example, and some software needs
> forced/custom environments to run).

I'm curious: if your software breaks depending on the LC_ALL setting, how do
you make it produce reproducible binaries? If it breaks with a certain LC_ALL,
then during the build you have to set LC_ALL (or one of its friends) to some
specific value, right?

> So, all in all, if your software is expected to run in an international
> environment, and its build/run behavior breaks in an environment is not
> to its liking, I also argue that the software is broken to begin with.
> Because when this problem takes hold in a codebase, it is nigh
> impossible to fix.
>
> So, I think it's better to strive to evolve the software to be a better
> international citizen rather than give all the software we build an
> artificially sterile environment, which is iteratively harder and harder
> to build and maintain.

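To make the casing rule quoted above concrete (i becomes İ, ı becomes I), here
is a minimal Python sketch. The helper names turkish_upper and turkish_lower
are hypothetical and not taken from any library. Python's built-in str.upper()
and str.lower() do not look at LC_ALL at all, so the Turkish pairs are mapped
by hand before falling back to the generic conversion. It only illustrates the
rule and is not a general locale-aware casing solution.

```python
# Hypothetical helpers (not from any library): Turkish-aware case mapping.
# str.upper()/str.lower() ignore the process locale, so the dotted and
# dotless i pairs are translated explicitly before the generic call.
_TR_UPPER = str.maketrans({"i": "İ", "ı": "I"})
_TR_LOWER = str.maketrans({"İ": "i", "I": "ı"})


def turkish_upper(text: str) -> str:
    """Uppercase text, keeping the dot: i -> İ and ı -> I."""
    return text.translate(_TR_UPPER).upper()


def turkish_lower(text: str) -> str:
    """Lowercase text, keeping the dot: İ -> i and I -> ı."""
    return text.translate(_TR_LOWER).lower()
```

With these, turkish_upper("kodi") yields "KODİ", whereas the plain
"kodi".upper() yields "KODI". Code that compares identifiers after a
locale-dependent case conversion can trip over exactly this difference.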
Just to make sure I'm not misunderstood: I am also leaning towards *not*
setting LC_ALL=C.UTF-8 (but probably not as strongly as I understood Simon's
mail), simply because I like dumping my time into figuring out why my software
does something different in a very specific environment. Most of the time,
figuring this out uncovers bugs that should be fixed.

At the same time though, I also get annoyed at copy-pasting d/rules snippets
from one of my packages to the next instead of making use of a few more
defaults in our package build environment.

Thanks!

cheers, josch