Bug#600310: wrong collation(?) order for et_EE.UTF-8 causes regexps to fail matching randomly

To: Michael Tokarev <mjt@tls.msk.ru>, 600310@bugs.debian.org
Subject: Bug#600310: wrong collation(?) order for et_EE.UTF-8 causes regexps to fail matching randomly
From: Bastian Blank <waldi@debian.org>
Date: Sat, 16 Oct 2010 12:05:01 +0200
Message-id: <20101016100501.GA25594@wavehammer.waldi.eu.org>
Reply-to: Bastian Blank <waldi@debian.org>, 600310@bugs.debian.org
In-reply-to: <20101015214742.17728.79430.reportbug@gandalf.local>
References: <20101015214742.17728.79430.reportbug@gandalf.local>

On Sat, Oct 16, 2010 at 01:47:42AM +0400, Michael Tokarev wrote:
> There's a bug in et_EE.UTF-8 locale definition causing some latin
> chars to be treated as non-letters.  These are at least in range
> t..y inclusive, i.e. [t-y].  Like this:

Are you sure that the letters t to y are valid in the alphabet used by
the Estonian language?

Wikipedia shows the following alphabet:
a, b, c, d, e, f, g, h, i, j, k, l, m, n, o, p, q, r, s, š, z, ž, t, u,
v, w, õ, ä, ö, ü, x, y

So a-z matches only part of the alphabet: the letters t to y sort
_after_ z, so they fall outside the range.
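
As a rough illustration of why the range misses "t", here is a small
sketch that treats a range as "everything between the two endpoints in
collation order". It uses only the lowercase alphabet quoted above; the
names ESTONIAN_ORDER, POSITION and in_range are made up for this
example, and it is a simplified model, not the glibc collation table or
regex implementation.

```python
# Lowercase alphabet in the order quoted above (simplified model only).
ESTONIAN_ORDER = [
    "a", "b", "c", "d", "e", "f", "g", "h", "i", "j", "k", "l", "m",
    "n", "o", "p", "q", "r", "s", "š", "z", "ž", "t", "u", "v", "w",
    "õ", "ä", "ö", "ü", "x", "y",
]

# Map each letter to its position in the collation order.
POSITION = {ch: i for i, ch in enumerate(ESTONIAN_ORDER)}


def in_range(ch, lo, hi):
    """True if ch sorts between lo and hi (inclusive) in this order."""
    return POSITION[lo] <= POSITION[ch] <= POSITION[hi]


if __name__ == "__main__":
    for ch in "stxyz":
        print(ch, in_range(ch, "a", "z"))
```

In this model "s" and "z" are inside a-z, while "t", "x" and "y" are
not, because they are positioned after "z".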

> I.e., the latin letter "t" does not match [a-z] regexp.

As expected. Use a character class such as [[:lower:]] if you want all
lowercase letters.

> This is a critical issue unfortunately, because it causes various
> regexes to fail to match, breaking random components.  In the actual
> problem case the issue was that many cron jobs were not running on the
> system for a mysterious reason, and the problem was because cron uses
> a regexp to filter invalid cronjob names, which is /^[a-zA-Z0-9_-]+$/.

If you want ASCII rules, force LC_ALL=C or at least LC_COLLATE=C. So
cron is the culprit.
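
A minimal sketch of what forcing C collation looks like from inside a
process, using Python's standard locale module. The helper name
in_collation_range is hypothetical, and this only illustrates the idea
of comparing range endpoints under the active LC_COLLATE; it is not how
cron or any particular regex library is implemented.

```python
import locale


def in_collation_range(ch, lo, hi):
    """True if lo <= ch <= hi under the current LC_COLLATE setting."""
    return locale.strcoll(lo, ch) <= 0 and locale.strcoll(ch, hi) <= 0


# Force plain code-point collation for this process only; roughly the
# in-process counterpart of starting the program with LC_COLLATE=C.
locale.setlocale(locale.LC_COLLATE, "C")

if __name__ == "__main__":
    print(in_collation_range("t", "a", "z"))
```

With the C locale in effect, "t" sorts between "a" and "z", so the
check succeeds regardless of the locale the user logged in with.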

Bastian

-- 
Landru! Guide us!
		-- A Beta 3-oid, "The Return of the Archons", stardate 3157.4
