Bug#600310: wrong collation(?) order for et_EE.UTF-8 causes regexps to fail matching randomly
On Sat, Oct 16, 2010 at 01:47:42AM +0400, Michael Tokarev wrote:
> There's a bug in et_EE.UTF-8 locale definition causing some latin
> chars to be treated as non-letters. These are at least in range
> t..y inclusive, i.e. [t-y]. Like this:
Are you sure that the letters t to y are valid in the alphabet used by the
Estonian language?
Wikipedia shows the following alphabet:
a, b, c, d, e, f, g, h, i, j, k, l, m, n, o, p, q, r, s, š, z, ž, t, u,
v, w, õ, ä, ö, ü, x, y
So a-z matches only half the alphabet. t to y are _after_ z.
> I.e., the latin letter "t" does not match [a-z] regexp.
As expected. Use character classes such as [[:lower:]] if you want all lowercase letters.
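A small C sketch of the difference, using the POSIX regcomp()/regexec() API. The helper name match_in_locale is hypothetical. POSIX defines range expressions such as [a-z] only in the POSIX ("C") locale and leaves them unspecified elsewhere. A class such as [[:lower:]] is defined by LC_CTYPE in every locale.

```c
#include <locale.h>
#include <regex.h>

/* Hypothetical helper: returns 1 if `text` fully matches the POSIX extended
 * regex `pattern` under `locale`, 0 if it does not, and -1 if the locale is
 * not installed or the pattern fails to compile.
 * Note: setlocale() changes process-wide state; fine for a demo only. */
static int match_in_locale(const char *locale, const char *pattern,
                           const char *text)
{
    if (setlocale(LC_ALL, locale) == NULL)
        return -1;                      /* locale not available here */

    regex_t re;
    /* Range expressions like [a-z] are interpreted according to the
     * current locale; [[:lower:]] asks for "lowercase" via LC_CTYPE. */
    if (regcomp(&re, pattern, REG_EXTENDED | REG_NOSUB) != 0)
        return -1;

    int rc = regexec(&re, text, 0, NULL, 0);
    regfree(&re);
    return rc == 0 ? 1 : 0;
}
```

For example, match_in_locale("et_EE.UTF-8", "^[a-z]+$", "t") may return 0 with a C library that orders t..y after z, as described in this report. match_in_locale("et_EE.UTF-8", "^[[:lower:]]+$", "t") is expected to return 1. The exact range behaviour depends on the C library version.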
> This is a critical issue unfortunately, because it makes various regexes
> fail to match, breaking random components. In the actual problem case
> the issue was that many cron jobs were not running on the system for a
> mysterious reason, and the problem was because cron uses a regexp to
> filter invalid cronjob names, which is /^[a-zA-Z0-9_-]+$/.
If you want ASCII rules, force LC_ALL=C or at least LC_COLLATE=C. So
cron is the culprit.
Bastian
--
Landru! Guide us!
-- A Beta 3-oid, "The Return of the Archons", stardate 3157.4