Bug#1101196: installation-reports: Trixie Installer fails to create first user when full name contains non-ASCII characters and spaces
On Sun, May 04, 2025 at 09:15:54PM +0200, Pascal Hambourg wrote:
On 04/05/2025 at 15:27, Marc Haber wrote:
It looks like the \p{L} and other Unicode character classes don't
match anything if libperl is not installed.
According to my tests, they match at least ASCII letters, digits,
regular ASCII space and non-breakable space.
So we just extend the regexp to explicitly match what would be in
ISO-8859-x, yielding the kind of uncomfortable
    commentre => qr/[-"_\.+!\$%&()\]\[;\/'’ A-Za-z0-9\x{a1}-\x{ac}\x{ae}-\x{ff}\p{L}\p{Nd}\p{Zs}]*/,
So this allows the safe special characters below 0x40, a regular
space, the Latin letters in both cases, digits, the high-order
characters that are different in any ISO-8859 charset (explicitly
excluding the non-breaking space and soft hyphen), followed by the
Unicode Letters, Unicode Digits and Unicode Whitespace.
My test results with àœæßéÀÔùñ:
* with libperl5.40 and perl & perl-modules-5.40
* with LANG=fr_FR.UTF-8 or C.UTF-8
\p{L}\p{Nd}\p{Zs}: OK
\x{a1}-\x{ac}\x{ae}-\x{ff}: OK except œŒ
I will have to take a look at the œŒ issue.
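A likely reason for the œŒ result: œ and Œ are U+0153 and U+0152, so
they lie above \x{ff} and the explicit range cannot cover them; only
\p{L} can. The following is a minimal sketch in Python rather than
adduser's Perl. RANGE and outside_range are hypothetical names, RANGE
transcribes only the explicit range part of the regex, and the sketch
assumes the class is applied to decoded characters, not raw bytes.

```python
import re

# Transcription of the explicit range \x{a1}-\x{ac}\x{ae}-\x{ff}.
# Assumption: matching happens per decoded character, not per byte.
RANGE = re.compile(r"[\xa1-\xac\xae-\xff]")

# The test characters from this thread (àœæßéÀÔùñ) plus Œ, written as
# escapes so the code points are unambiguous.
SAMPLE = "\u00e0\u0153\u00e6\u00df\u00e9\u00c0\u00d4\u00f9\u00f1\u0152"


def outside_range(text):
    """Return the characters of text that the explicit range does not match."""
    return [ch for ch in text if not RANGE.fullmatch(ch)]


if __name__ == "__main__":
    for ch in SAMPLE:
        status = "matched" if RANGE.fullmatch(ch) else "not matched"
        print(f"U+{ord(ch):04X} {ch} {status}")
```

In ISO-8859-15 these two letters do live at 0xBC and 0xBD, but a range
over Unicode code points does not see that.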
* with LANG=C
\p{L}\p{Nd}\p{Zs}: non-ASCII KO
\x{a1}-\x{ac}\x{ae}-\x{ff}: non-ASCII KO
That's as specified, IMO. I am not sure there is anything I can do here
short of accepting everything.
Note: with LANG=C and either the original or new regexes, adduser
indefinitely hangs with high CPU load if the gecos field contains more
than 5 non-ASCII characters. It does not happen without libperl5.40.
This currently affects the installer.
Ouch!
* without libperl5.40 and perl, with or without perl-modules-5.40
* LANG=fr_FR.UTF-8 or C.UTF-8 or C
\p{L}\p{Nd}\p{Zs}: non-ASCII KO except à
\x{a1}-\x{ac}\x{ae}-\x{ff}: àœæÆß and uppercase accented letters KO
I don't understand that.
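One untested guess, sketched in Python rather than Perl: if perl
without libperl5.40 applies the class to the raw UTF-8 bytes, then any
character whose encoding contains a byte in 0x80-0xa0 or 0xad falls
outside the explicit range. That would fit à, ß, œ, Æ and the uppercase
accented letters, but not æ (C3 A6), so it is at best part of the
story. in_range and bytes_outside are hypothetical helper names, and
the byte-wise matching is purely an assumption.

```python
# Test characters from this thread plus Æ, written as escapes:
# à œ æ ß é À Ô ù ñ Æ
SAMPLE = "\u00e0\u0153\u00e6\u00df\u00e9\u00c0\u00d4\u00f9\u00f1\u00c6"


def in_range(byte):
    """True if a byte value lies in \\x{a1}-\\x{ac} or \\x{ae}-\\x{ff}."""
    return 0xA1 <= byte <= 0xAC or 0xAE <= byte <= 0xFF


def bytes_outside(ch):
    """UTF-8 bytes of ch (as hex) that fall outside the explicit range.

    Assumption: the regex is matched byte by byte, which this thread
    does not confirm.
    """
    return [f"{b:02X}" for b in ch.encode("utf-8") if not in_range(b)]


if __name__ == "__main__":
    for ch in SAMPLE:
        encoded = " ".join(f"{b:02X}" for b in ch.encode("utf-8"))
        print(f"{ch}: {encoded}, outside range: {bytes_outside(ch)}")
```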
So, on a system without full perl (and probably with a non-UTF-8
locale), this will match most languages that have an ISO-8859
charset. In a full system, we have full Unicode support.
d-i always installs C.UTF-8, so there is at least one UTF-8 locale.
Does it also USE C.UTF-8? Anyway, without libperl5.40 it looks like we
only have rudimentary UTF-8 support in perl.
Would this help the installer?
It looks like a step forward, but the new regex still does not match
some letters nor uppercase accented letters when libperl is not
installed.
That's bad.
Greetings
Marc
--
-----------------------------------------------------------------------------
Marc Haber | "I don't trust Computers. They | Mailadresse im Header
Leimen, Germany | lose things." Winona Ryder | Fon: *49 6224 1600402
Nordisch by Nature | How to make an American Quilt | Fax: *49 6224 1600421