Bug#1024974: [libc6] Schroedinger's fnmatch() in an UTF-8 locale
Package: libc6
Version: 2.36-5
Severity: normal
Tags: upstream
X-Debbugs-Cc: roam@debian.org
Hi,
Thanks for taking care of glibc in Debian!
While trying to write a test case for a text processing utility that is
sort of aware of locales and character encodings, I stumbled upon
the fact that, in a UTF-8-capable locale, fnmatch() seems to think
that the `ñ` ("enye", "LATIN SMALL LETTER N WITH TILDE", U+00F1)
character matches both the "?" and "??" patterns, although it is a single
character (encoded as the two bytes 0xC3 0xB1) and should only match "?".
See the attached C program and the `run-test.sh` demonstration tool;
`make test` in a directory where all four files are installed should do it.
If anything goes wrong with the attached files, they are also available
in a GitLab repository at https://gitlab.com/ppentchev/fnmess
A bullseye chroot and Docker container do not show the problem
(the test passes).
FTR, I was able to reproduce the problem on an AlmaLinux 9 system with
glibc 2.34, so it might not be limited to 2.36.
Thanks in advance for your time, and keep up the great work!
G'luck,
Peter
-- System Information:
Debian Release: bookworm/sid
APT prefers testing
APT policy: (990, 'testing'), (500, 'stable-updates'), (500, 'stable-security'), (500, 'oldstable-updates'), (500, 'oldoldstable'), (500, 'stable'), (500, 'oldstable')
Architecture: amd64 (x86_64)
Kernel: Linux 6.0.0-4-amd64 (SMP w/8 CPU threads; PREEMPT)
Locale: LANG=bg_BG.UTF-8, LC_CTYPE=bg_BG.UTF-8 (charmap=UTF-8), LANGUAGE not set
Shell: /bin/sh linked to /usr/bin/dash
Init: systemd (via /run/systemd/system)
LSM: AppArmor: enabled
Versions of packages libc6 depends on:
ii libgcc-s1 12.2.0-9
Versions of packages libc6 recommends:
ii libidn2-0 2.3.3-1+b1
Versions of packages libc6 suggests:
ii debconf [debconf-2.0] 1.5.79
pn glibc-doc <none>
ii libc-l10n 2.36-5
pn libnss-nis <none>
pn libnss-nisplus <none>
ii locales 2.36-5
-- debconf information:
* libraries/restart-without-asking: true
glibc/disable-screensaver:
glibc/kernel-not-supported:
glibc/kernel-too-old:
glibc/restart-failed:
glibc/restart-services:
glibc/upgrade: true
#!/usr/bin/make -f
#
# Copyright (c) 2022 Peter Pentchev <roam@ringlet.net>
# All rights reserved.
#
# Redistribution and use in source and binary forms, with or without
# modification, are permitted provided that the following conditions
# are met:
# 1. Redistributions of source code must retain the above copyright
# notice, this list of conditions and the following disclaimer.
# 2. Redistributions in binary form must reproduce the above copyright
# notice, this list of conditions and the following disclaimer in the
# documentation and/or other materials provided with the distribution.
#
# THIS SOFTWARE IS PROVIDED BY THE AUTHOR AND CONTRIBUTORS ``AS IS'' AND
# ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
# IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE
# ARE DISCLAIMED. IN NO EVENT SHALL THE AUTHOR OR CONTRIBUTORS BE LIABLE
# FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL
# DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS
# OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION)
# HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT
# LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY
# OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF
# SUCH DAMAGE.
CPPFLAGS?= -D_POSIX_C_SOURCE=200809L -D_XOPEN_SOURCE=700
CFLAGS_WARN?= -Wall -W -Wextra -Wno-trigraphs
CFLAGS_OPT?= -g -O -pipe
CFLAGS?= ${CFLAGS_WARN} ${CFLAGS_OPT}
LDFLAGS?=
LIBS?=
all:		fnmess
fnmess:		fnmess.o
	cc ${LDFLAGS} -o fnmess fnmess.o ${LIBS}
fnmess.o:	fnmess.c
	cc -c ${CPPFLAGS} ${CFLAGS} -o fnmess.o fnmess.c
clean:
	rm -f fnmess fnmess.o
test:		all
	sh run-test.sh python3 fnmess.py
	sh run-test.sh ./fnmess
.PHONY:		clean all test
/**
* Copyright (c) 2022 Peter Pentchev <roam@ringlet.net>
* All rights reserved.
*
* Redistribution and use in source and binary forms, with or without
* modification, are permitted provided that the following conditions
* are met:
* 1. Redistributions of source code must retain the above copyright
* notice, this list of conditions and the following disclaimer.
* 2. Redistributions in binary form must reproduce the above copyright
* notice, this list of conditions and the following disclaimer in the
* documentation and/or other materials provided with the distribution.
*
* THIS SOFTWARE IS PROVIDED BY THE AUTHOR AND CONTRIBUTORS ``AS IS'' AND
* ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
* IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE
* ARE DISCLAIMED. IN NO EVENT SHALL THE AUTHOR OR CONTRIBUTORS BE LIABLE
* FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL
* DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS
* OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION)
* HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT
* LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY
* OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF
* SUCH DAMAGE.
*/
#include <fnmatch.h>
#include <locale.h>
#include <stdio.h>
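int main(void)
{
	/* The UTF-8 encoding of U+00F1: two bytes, one character. */
	char enye[3] = {0xC3, 0xB1, 0};
	puts("Hell world!");
	/* Pick up the locale from the environment (LC_ALL, LC_CTYPE, LANG). */
	setlocale(LC_ALL, "");
	printf("Using the '%s' locale for LC_CTYPE\n", setlocale(LC_CTYPE, NULL));
	/* Each "?" should match exactly one character of the locale's encoding. */
	printf("Does it match '?': %s\n", fnmatch("?", enye, 0) == 0 ? "yes" : "no");
	printf("Does it match '??': %s\n", fnmatch("??", enye, 0) == 0 ? "yes" : "no");
	printf("Does it match '???': %s\n", fnmatch("???", enye, 0) == 0 ? "yes" : "no");
	return 0;
}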
#!/bin/sh
#
# Copyright (c) 2022 Peter Pentchev <roam@ringlet.net>
# All rights reserved.
#
# Redistribution and use in source and binary forms, with or without
# modification, are permitted provided that the following conditions
# are met:
# 1. Redistributions of source code must retain the above copyright
# notice, this list of conditions and the following disclaimer.
# 2. Redistributions in binary form must reproduce the above copyright
# notice, this list of conditions and the following disclaimer in the
# documentation and/or other materials provided with the distribution.
#
# THIS SOFTWARE IS PROVIDED BY THE AUTHOR AND CONTRIBUTORS ``AS IS'' AND
# ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
# IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE
# ARE DISCLAIMED. IN NO EVENT SHALL THE AUTHOR OR CONTRIBUTORS BE LIABLE
# FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL
# DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS
# OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION)
# HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT
# LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY
# OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF
# SUCH DAMAGE.
set -e
check()
{
	local tempf="$1" loc="$2" c1="$3" c2="$4" c3="$5"
	shift 5
	printf -- '\n==== Checking the result for the %s locale\n\n' "$loc"
	env LC_CTYPE="$loc" "$@" > "$tempf"
	# Yes, there are dozens of ways to make this more generic. I know.
	if ! grep -Fxe "Does it match '?': $c1" -- "$tempf"; then
		echo 'Failed the "?" check' 1>&2
		exit 1
	fi
	if ! grep -Fxe "Does it match '??': $c2" -- "$tempf"; then
		echo 'Failed the "??" check' 1>&2
		exit 1
	fi
	if ! grep -Fxe "Does it match '???': $c3" -- "$tempf"; then
		echo 'Failed the "???" check' 1>&2
		exit 1
	fi
}
if [ "$#" -eq 0 ]; then
	echo 'Usage: run-test.sh command [args...]' 1>&2
	echo '' 1>&2
	echo 'Examples: run-test.sh ./fnmess' 1>&2
	echo '          run-test.sh python3 fnmess.py' 1>&2
	echo '' 1>&2
	exit 1
fi
if [ -z "$FNMESS_TEST_U8LOC" ]; then
	echo 'Looking for an UTF-8-capable locale'
	u8loc="$(locale -a | grep -Eie '\.utf-?8([^a-zA-Z0-9_-]|$)' | head -n1)"
	if [ -z "$u8loc" ]; then
		echo "No UTF-8-capable locale found" 1>&2
		exit 1
	fi
else
	u8loc="$FNMESS_TEST_U8LOC"
fi
echo "Using '$u8loc' as a multibyte locale"
if [ -z "$FNMESS_TEST_SINGLOC" ]; then
	echo 'Looking for an ISO-8859-1 or ISO-8859-15 locale'
	singloc="$(locale -a | grep -Eie '\.iso-?8859-?(1|15)([^a-zA-Z0-9_-]|$)' | head -n1)"
	if [ -z "$singloc" ]; then
		echo "No ISO-8859-1 or ISO-8859-15 locale found" 1>&2
		exit 1
	fi
else
	singloc="$FNMESS_TEST_SINGLOC"
fi
echo "Using '$singloc' as a single-byte locale"
tempf="$(mktemp)"
trap "rm -f -- '$tempf'" EXIT INT HUP QUIT TERM
echo "Using '$tempf' as a temporary file"
printf -- '\n==== Running in the %s locale, expected: no, yes, no\n\n' "$singloc"
env LC_CTYPE="$singloc" "$@"
check "$tempf" "$singloc" 'no' 'yes' 'no' "$@"
printf -- '\n==== Running in the %s locale, expected: yes, no, no\n' "$u8loc"
env LC_CTYPE="$u8loc" "$@"
check "$tempf" "$u8loc" 'yes' 'no' 'no' "$@"
printf -- '\n==== Seems fine!\n\n'
#!/usr/bin/python3
#
# Copyright (c) 2022 Peter Pentchev <roam@ringlet.net>
# All rights reserved.
#
# Redistribution and use in source and binary forms, with or without
# modification, are permitted provided that the following conditions
# are met:
# 1. Redistributions of source code must retain the above copyright
# notice, this list of conditions and the following disclaimer.
# 2. Redistributions in binary form must reproduce the above copyright
# notice, this list of conditions and the following disclaimer in the
# documentation and/or other materials provided with the distribution.
#
# THIS SOFTWARE IS PROVIDED BY THE AUTHOR AND CONTRIBUTORS ``AS IS'' AND
# ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
# IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE
# ARE DISCLAIMED. IN NO EVENT SHALL THE AUTHOR OR CONTRIBUTORS BE LIABLE
# FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL
# DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS
# OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION)
# HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT
# LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY
# OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF
# SUCH DAMAGE.
"""Check whether Python's fnmatch is bug-for-bug compatible with libc."""
import fnmatch
import locale
def check(value: str, pattern: str) -> None:
    """Check whether the value matches the pattern."""
    res = "yes" if fnmatch.fnmatch(value, pattern) else "no"
    print(f"Does it match '{pattern}': {res}")


def main() -> None:
    """Does the Python fnmatch() function also have that bug?"""
    encoding = locale.nl_langinfo(locale.CODESET)
    print(f"Using {encoding} as the LC_CTYPE character encoding")
    bstr = b"\xC3\xB1"
    cstr = bstr.decode(encoding)
    print(f"The character string now has a length of {len(cstr)}")
    check(cstr, "?")
    check(cstr, "??")
    check(cstr, "???")


if __name__ == "__main__":
    main()