
Bug#1101196: installation-reports: Trixie Installer fails to create first user when full name contains non-ASCII characters and spaces



On Sun, May 04, 2025 at 09:15:54PM +0200, Pascal Hambourg wrote:
On 04/05/2025 at 15:27, Marc Haber wrote:
So this allows the safe special characters below 0x40, a regular space, the Latin letters in both cases, digits, the high-order characters that are different in any ISO-8859 charset (explicitly excluding the non-breaking space and soft hyphen), followed by the Unicode letters, Unicode digits and Unicode whitespace.

My test results with àœæßéÀÔùñ:

* with libperl5.40 and perl & perl-modules-5.40
* with LANG=fr_FR.UTF-8 or C.UTF-8
 \p{L}\p{Nd}\p{Zs}: OK
 \x{a1}-\x{ac}\x{ae}-\x{ff}: OK except œŒ
* with LANG=C
 \p{L}\p{Nd}\p{Zs}: non-ASCII KO
 \x{a1}-\x{ac}\x{ae}-\x{ff}: non-ASCII KO
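A small sketch of one possible reading of the "OK except œŒ" result above. It is written in Python rather than Perl, the function names are made up for illustration, and it only approximates the two character classes by code point range and Unicode general category. It is not how adduser implements the check.

```python
import unicodedata

def in_latin1_high_range(ch: str) -> bool:
    # Hypothetical helper mirroring [\x{a1}-\x{ac}\x{ae}-\x{ff}]:
    # 0xA1..0xFF without the soft hyphen 0xAD (0xA0, the NBSP, is below the range).
    cp = ord(ch)
    return 0xA1 <= cp <= 0xAC or 0xAE <= cp <= 0xFF

def in_unicode_classes(ch: str) -> bool:
    # Hypothetical helper approximating [\p{L}\p{Nd}\p{Zs}] with
    # Unicode general categories: any letter, decimal digit, or space separator.
    cat = unicodedata.category(ch)
    return cat.startswith("L") or cat in ("Nd", "Zs")

sample = "àœæßéÀÔùñ"

# Characters of the test string that the Latin-1 style range cannot cover.
outside_latin1 = [c for c in sample if not in_latin1_high_range(c)]

# Whether every character of the test string is a Unicode letter/digit/space.
all_letters = all(in_unicode_classes(c) for c in sample)

print(outside_latin1)  # ['œ']
print(all_letters)     # True
```

œ is U+0153 and Œ is U+0152, so both lie above 0xFF and can only be matched through the Unicode property classes, not through the \x{a1}-\x{ff} range.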

It looks like regexp matching is weird without libperl5.40. And actually, with regexp matching, it is weird as well.


My results with the current version of adduser (just pushed to master):

# grep -v '^#' /etc/locale.gen


de_DE.UTF-8 UTF-8
en_US.UTF-8 UTF-8
fr_FR.UTF-8 UTF-8
# dpkg --list libperl5.40
Desired=Unknown/Install/Remove/Purge/Hold
| Status=Not/Inst/Conf-files/Unpacked/halF-conf/Half-inst/trig-aWait/Trig-pend
|/ Err?=(none)/Reinst-required (Status,Err: uppercase=bad)
||/ Name              Version      Architecture Description
+++-=================-============-============-=================================
ii  libperl5.40:amd64 5.40.1-3     amd64        shared Perl library

# export LANG=C.UTF-8
# adduser --comment="é ä O'Leary œŒ àœæßéÀÔùñ" --disabled-password --home /nonexistent testuser
chfn: name with non-ASCII characters: 'é ä O'Leary œŒ àœæßéÀÔùñ'
# getent passwd testuser && deluser testuser
testuser:x:1001:1001:é ä O'Leary œŒ àœæßéÀÔùñ,,,:/nonexistent:/bin/bash
warn: `/usr/bin/crontab' not executed. Skipping crontab removal. Package `cron' required.

# export LANG=fr_FR.UTF-8
# adduser --comment="é ä O'Leary œŒ àœæßéÀÔùñ" --disabled-password --home /nonexistent testuser
chfn : nom avec des caractères non-ASCII : 'é ä O'Leary œŒ àœæßéÀÔùñ'
# getent passwd testuser && deluser testuser
testuser:x:1001:1001:é ä O'Leary œŒ àœæßéÀÔùñ,,,:/nonexistent:/bin/bash
warn: `/usr/bin/crontab' not executed. Skipping crontab removal. Package `cron' required.

# export LANG=C
# adduser --comment="\303\251 \303\244 O'Leary \305\223\305\222 \303\240\305\223\303\246\303\237\303\251\303\200\303\224\303\271\303\261" --disabled-password --home /nonexistent testuser
invalid characters in input string, see trace output for more details at /usr/share/perl5/Debian/AdduserCommon.pm line 143.
# getent passwd testuser && deluser testuser
#
The failure with LANG=C is expected, and I believe it is correct to reject UTF-8 comments in a non-UTF-8 setting.
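For readers puzzled by the backslash sequences in the LANG=C transcript: they appear to be the UTF-8 bytes of the same comment string, shown in octal. A minimal Python sketch to check this, where `octal_escape` is a hypothetical helper and not part of adduser or the shell:

```python
def octal_escape(text: str) -> str:
    # Render the UTF-8 encoding of text, keeping ASCII bytes as-is and
    # writing every non-ASCII byte as a three-digit octal escape.
    out = []
    for byte in text.encode("utf-8"):
        if byte < 0x80:
            out.append(chr(byte))
        else:
            out.append(f"\\{byte:03o}")
    return "".join(out)

escaped = octal_escape("é ä O'Leary œŒ")
print(escaped)  # \303\251 \303\244 O'Leary \305\223\305\222
```

The output matches the start of the --comment argument in the LANG=C run above.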

Want me to test with more strange characters?

Note: with LANG=C and either the original or new regexes, adduser hangs indefinitely with high CPU load if the gecos field contains more than 5 non-ASCII characters. It does not happen without libperl5.40. This currently affects the installer.

That sounds bad. I was able to strip down adduser to reproduce the behavior. This is easily in the top 5 Perl bugs I have encountered so far. I filed it as #1104726. In adduser, there is a workaround in place now.


* without libperl5.40 and perl, with or without perl-modules-5.40
* LANG=fr_FR.UTF-8 or C.UTF-8 or C
 \p{L}\p{Nd}\p{Zs}: non-ASCII KO except à
 \x{a1}-\x{ac}\x{ae}-\x{ff}: àœæÆß and uppercase accented letters KO

Sadly, I have not made any progress here. It looks like testing on the command line with a non-UTF-8 locale does not make much sense: when I type É in my shell, I get the UTF-8 bytes 0xC3 0x89, which my regexp is not able to match when running under a non-UTF-8 locale.
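To illustrate the point about É, here is a short Python sketch of what a program sees when it treats each input byte as one character. Decoding as Latin-1 is only a stand-in for "one byte, one character" and is an assumption made for illustration. It says nothing about what Perl does internally in the "without libperl5.40" case.

```python
import unicodedata

text = "É"                         # U+00C9, a single code point
utf8_bytes = text.encode("utf-8")  # what a UTF-8 terminal hands to the program
hex_bytes = utf8_bytes.hex(" ")

# Byte-wise view: every byte becomes its own "character".
bytewise = utf8_bytes.decode("latin-1")
categories = [unicodedata.category(c) for c in bytewise]

print(hex_bytes)        # c3 89
print(len(bytewise))    # 2
print(categories)       # ['Lu', 'Cc']
```

Seen byte by byte, the second byte 0x89 is a control character below 0xA1, so neither a letter class nor the \x{a1}-\x{ff} range would accept it.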

In the meantime, I have written a few more test cases (testing adduser without libperl5.40 is hard since the autopkgtest test suite needs libperl5.40 itself) and have changed the regexp, so that the weirdness suggested now at least works with libperl5.40 installed.

I need the help of a true Perl monk for the "without libperl5.40" case and will open a corresponding bug against adduser, probably later today. I'd like the "improved" adduser to be in unstable first.

So, on a system without full perl (and probably with a non-UTF-8 locale), this will match most languages that have an ISO-8859 charset. In a full system, we have full Unicode support.

d-i always installs C.UTF-8, so there is at least one UTF-8 locale.

And which locale is set when the Installer calls adduser?

Would this help the installer?

It looks like a step forward, but the new regex still does not match some letters nor uppercase accented letters when libperl is not installed.

I'll get that one uploaded later today so that at least Tom O'Malley could get his real name set. This should be right in time for the hard freeze. The real fix will come after the freeze, so I'd need to have that approved by the release team.

The old adduser didn't do any sanitization at this place. Of course, just matching on .* for the comment field would fix this issue without making anything worse than it was on bookworm, but I'd really hate that solution.

Greetings
Marc

--
-----------------------------------------------------------------------------
Marc Haber         | "I don't trust Computers. They | Mailadresse im Header
Leimen, Germany    |  lose things."    Winona Ryder | Fon: *49 6224 1600402
Nordisch by Nature |  How to make an American Quilt | Fax: *49 6224 1600421

