
Bug#600310: wrong collation(?) order for et_EE.UTF-8 causes regexps to fail matching randomly



On Sat, Oct 16, 2010 at 01:47:42AM +0400, Michael Tokarev wrote:
> There's a bug in et_EE.UTF-8 locale definition causing some latin
> chars to be treated as non-letters.  These are at least in range
> t..y inclusive, i.e. [t-y].  Like this:

Are you sure that the letters t to y are valid in the alphabet used by the
Estonian language?

Wikipedia shows the following alphabet:
a, b, c, d, e, f, g, h, i, j, k, l, m, n, o, p, q, r, s, š, z, ž, t, u,
v, w, õ, ä, ö, ü, x, y

So a-z matches only part of the alphabet: t to y sort _after_ z.

> I.e., the latin letter "t" does not match [a-z] regexp.

As expected. Use a character class such as [[:lower:]] if you want all
lowercase letters.
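To illustrate, here is a rough sketch in Python that treats a range as
"everything that sorts between the two endpoints" in the alphabet order
quoted above. The names ESTONIAN_ORDER and in_range are made up for this
example, and the list is only a model of the ordering, not the actual glibc
et_EE collation table. str.islower() stands in for a character class; it is
Unicode-based rather than locale-based.

```python
# Estonian alphabet order as quoted above (from Wikipedia). This is a model
# of the ordering only, not the real et_EE locale definition.
ESTONIAN_ORDER = [
    "a", "b", "c", "d", "e", "f", "g", "h", "i", "j", "k", "l", "m", "n",
    "o", "p", "q", "r", "s", "š", "z", "ž", "t", "u", "v", "w", "õ", "ä",
    "ö", "ü", "x", "y",
]
RANK = {ch: i for i, ch in enumerate(ESTONIAN_ORDER)}


def in_range(ch, lo, hi, rank=RANK):
    """Return True if ch sorts between lo and hi (inclusive) in this order."""
    return ch in rank and rank[lo] <= rank[ch] <= rank[hi]


# Letters that a range a..z misses when the range follows this order.
outside = [ch for ch in ESTONIAN_ORDER if not in_range(ch, "a", "z")]
print(outside)

# A class-style test does not depend on the ordering at all.
print(all(ch.islower() for ch in ESTONIAN_ORDER))
```

In this model "t" falls outside a..z while "s" falls inside, which matches
the behaviour described in the report.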

> This is a critical issue unfortunately, because it makes various regexes
> fail to match, breaking random components.  In the actual problem case
> the issue was that many cron jobs were not running on the system for a
> mysterious reason, and the problem was because cron uses a regexp to
> filter invalid cronjob names, which is /^[a-zA-Z0-9_-]+$/.

If you want ASCII rules, force LC_ALL=C or at least LC_COLLATE=C. So
cron is the culprit: it relies on a range without forcing the locale.
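As a minimal sketch of what forcing the C collation does, the Python
snippet below sets LC_COLLATE to "C" inside the process and then compares
characters with locale.strcoll. The helper name in_c_range is invented for
this example; it is not how cron or the regex library does it, only an
illustration that under C collation the comparison follows plain character
codes.

```python
import locale

# Force C collation for this process, regardless of LANG/LC_* in the
# environment. The "C" locale is always available.
locale.setlocale(locale.LC_COLLATE, "C")


def in_c_range(ch, lo, hi):
    """Return True if lo <= ch <= hi under the current (C) collation."""
    return locale.strcoll(lo, ch) <= 0 and locale.strcoll(ch, hi) <= 0


print(in_c_range("t", "a", "z"))  # "t" lies between "a" and "z" by code
print(in_c_range("_", "a", "z"))  # "_" has a lower code than "a"
```

For a program you do not control, the equivalent is to start it with
LC_ALL=C (or LC_COLLATE=C) set in its environment.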

Bastian

-- 
Landru! Guide us!
		-- A Beta 3-oid, "The Return of the Archons", stardate 3157.4


