
Bug#1101196: installation-reports: Trixie Installer fails to create first user when full name contains non-ASCII characters and spaces



On Sun, May 04, 2025 at 09:15:54PM +0200, Pascal Hambourg wrote:
On 04/05/2025 at 15:27, Marc Haber wrote:
It looks like \p{L} and the other Unicode character classes don't match anything if libperl is not installed.

According to my tests, they match at least ASCII letters, digits, regular ASCII space and non-breakable space.

So we just extend the regexp to match explicitly what would be in ISO-8859-x, yielding the kind of uncomfortable

commentre => qr/[-"_\.+!\$%&()\]\[;\/'’ A-Za-z0-9\x{a1}-\x{ac}\x{ae}-\x{ff}\p{L}\p{Nd}\p{Zs}]*/,

So this allows the safe special characters below 0x40, a regular space, the Latin letters in both cases, digits, the high-order characters that are different in any ISO-8859 charset (explicitly excluding the non-breaking space and soft hyphen), followed by the Unicode letters, Unicode digits and Unicode whitespace.
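For readers who want to poke at the character class without a Perl setup, here is a rough Python model of it. It is only a sketch: the names SAFE_SPECIALS, in_class and matches are made up for illustration, Python's unicodedata categories stand in for Perl's \p{L}, \p{Nd} and \p{Zs}, and it assumes the input is already a decoded Unicode string, so it says nothing about how perl-base behaves without libperl.

```python
import unicodedata

# Punctuation listed literally in the character class, plus the regular space.
SAFE_SPECIALS = set("-\"_.+!$%&()][;/'’ ")

def in_class(ch: str) -> bool:
    """Return True if ch is covered by one alternative of the class."""
    cp = ord(ch)
    if ch in SAFE_SPECIALS:
        return True
    if ch.isascii() and ch.isalnum():              # A-Za-z0-9
        return True
    if 0xA1 <= cp <= 0xAC or 0xAE <= cp <= 0xFF:   # \x{a1}-\x{ac}\x{ae}-\x{ff}
        return True
    cat = unicodedata.category(ch)
    # Stand-ins for \p{L}, \p{Nd} and \p{Zs}.
    return cat.startswith("L") or cat in ("Nd", "Zs")

def matches(text: str) -> bool:
    """Model of qr/^[...]*$/: every character must be in the class."""
    return all(in_class(ch) for ch in text)

print(matches("Zoë Œuvre"))   # letters and a regular space only
print(matches("x@y"))         # '@' (0x40) is not in the class
```

In this model U+00A0 is left out of the explicit range but is still accepted through the Zs category, while the soft hyphen U+00AD (category Cf) is rejected.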

My test results with àœæßéÀÔùñ:

* with libperl5.40 and perl & perl-modules-5.40
* with LANG=fr_FR.UTF-8 or C.UTF-8
 \p{L}\p{Nd}\p{Zs}: OK
 \x{a1}-\x{ac}\x{ae}-\x{ff}: OK except œŒ

I will have to take a look at the œŒ issue.
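A possible starting point for that, sketched in Python rather than Perl: œ and Œ are U+0153 and U+0152, so they lie outside the \x{a1}-\x{ff} code point range even though ISO-8859-15 has them in its high half. The helper name in_high_range is made up for this illustration, and the sketch only looks at code points, not at what the installer's perl actually does.

```python
def in_high_range(ch: str) -> bool:
    """True if ch's code point is in \\x{a1}-\\x{ac} or \\x{ae}-\\x{ff}."""
    cp = ord(ch)
    return 0xA1 <= cp <= 0xAC or 0xAE <= cp <= 0xFF

# The test characters from above, plus uppercase Œ.
sample = "àœæßéÀÔùñŒ"

for ch in sample:
    print(f"{ch} U+{ord(ch):04X} in_range={in_high_range(ch)}")

# Characters that only the Unicode property classes could match.
outside = [ch for ch in sample if not in_high_range(ch)]
print(outside)
```

If that reading is right, œŒ can only be accepted through \p{L}, never through the explicit range.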

* with LANG=C
 \p{L}\p{Nd}\p{Zs}: non-ASCII KO
 \x{a1}-\x{ac}\x{ae}-\x{ff}: non-ASCII KO

That's as specified, IMO. I am not sure there is anything I can do here short of accepting everything.

Note: with LANG=C and either the original or new regexes, adduser indefinitely hangs with high CPU load if the gecos field contains more than 5 non-ASCII characters. It does not happen without libperl5.40. This currently affects the installer.

Ouch!

* without libperl5.40 and perl, with or without perl-modules-5.40
* LANG=fr_FR.UTF-8 or C.UTF-8 or C
 \p{L}\p{Nd}\p{Zs}: non-ASCII KO except à
 \x{a1}-\x{ac}\x{ae}-\x{ff}: àœæÆß and uppercase accented letters KO

I don't understand that.

So, on a system without full perl (and probably with a non-UTF-8 locale), this will match most languages that have an ISO-8859 charset. In a full system, we have full Unicode support.

d-i always installs C.UTF-8, so there is at least one UTF-8 locale.

Does it also USE C.UTF-8? Anyway, without libperl5.40 it looks like we only have rudimentary UTF-8 support in perl.

Would this help the installer?

It looks like a step forward, but the new regex still fails to match some letters and the uppercase accented letters when libperl is not installed.

That's bad.

Greetings
Marc

--
-----------------------------------------------------------------------------
Marc Haber         | "I don't trust Computers. They | Mailadresse im Header
Leimen, Germany    |  lose things."    Winona Ryder | Fon: *49 6224 1600402
Nordisch by Nature |  How to make an American Quilt | Fax: *49 6224 1600421

