
Bug#968437: xindy-rules: Incorrect Norwegian sorting of č and š



Package: xindy-rules
Version: 2.5.1.20160104-5
Severity: important
Tags: patch upstream

Dear xindy-rules maintainers,

I ran into this problem when using dblatex and xindy to typeset a book:
the index ended up in the wrong order.  This is a Norwegian book with
some North Saami words in the body and index.  Every Saami word starting
with č or š is incorrectly sorted as if it started with a symbol, while
such words should be sorted with c and s, respectively.

Setting severity to important, as there is no known workaround and the
problem is fatal when trying to create a print-ready book using xindy.

I had a look at the code, but do not really know how this is supposed to
work.  I suspect the correct fix is the untested patch below.  Am I on
the right track here?  I verified the relative order of č and ç used in
the patch by comparing it with the nb_NO locale.
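
A locale comparison of this kind can be sketched roughly as follows.
This is only an illustration, assuming Python 3 and an installed
nb_NO.UTF-8 locale; the helper name and the sample words are made up for
the example and are not taken from the book.

```python
import locale


def sort_with_locale(words, locale_name="nb_NO.UTF-8"):
    """Sort words by the collation rules of locale_name.

    Returns None if the locale is not installed on this system.
    """
    # Remember the current collation setting so it can be restored.
    previous = locale.setlocale(locale.LC_COLLATE)
    try:
        locale.setlocale(locale.LC_COLLATE, locale_name)
    except locale.Error:
        return None
    try:
        return sorted(words, key=locale.strxfrm)
    finally:
        locale.setlocale(locale.LC_COLLATE, previous)


# Illustrative sample words only (assumption, not from the book).
words = ["dag", "čáhci", "sol", "cello", "šaddu", "çay", "bok"]
result = sort_with_locale(words)
if result is None:
    print("nb_NO.UTF-8 locale is not available")
else:
    print(result)
```

The result depends on the locale data shipped by the system, so it shows
what the C library does rather than what xindy should do.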

diff --git a/make-rules/alphabets/norwegian/utf8.pl.in b/make-rules/alphabets/norwegian/utf8.pl.in
index 902b07b..9b30a88 100644
--- a/make-rules/alphabets/norwegian/utf8.pl.in
+++ b/make-rules/alphabets/norwegian/utf8.pl.in
@@ -11,10 +11,9 @@ $alphabet = [
                    [], # a with ogonek (polish)
 ['B',  ['b','B']],
                    [], # b with hook (hausa)
-['C',  ['c','C'],['ç','Ç']],
+['C',  ['c','C'],['č','Č'],['ç','Ç']],
                    [], # ch (spanish/traditional)
                    [], # cs (hungarian)
-                   [], # c with caron (many)
                    [], # c with acute (croatian, lower sorbian, polish)
                    [], # c with circumflex (esperanto)
                    [], # c with cedilla (albanian, kurdish, turkish)
@@ -85,10 +84,9 @@ $alphabet = [
                    [], # r with caron (czech, slovak/large, upper sorbian)
                    [], # r with acute (lower sorbian)
                    [], # r with cedilla/comma (latvian)
-['S',  ['s','S']],
+['S',  ['s','S'], ['š', 'Š']],
                    [], # sh (albanian)
                    [], # sz (hungarian)
-                   [], # s with caron (many)
                    [], # s with acute (lower sorbian, polish)
                    [], # s with circumflex (esperanto)
                    [], # s with comma below (romanian)
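
The intent of the patch, as far as I understand the table, can be shown
with a toy model.  This is a sketch of my reading only, not of how xindy
actually processes the file: the Python structure, the heading_for
helper and the "Symbols" group name are all hypothetical.

```python
# Toy model only: each entry is (index heading, letters filed under it).
BEFORE = [("C", "cCçÇ"), ("S", "sS")]
AFTER = [("C", "cCčČçÇ"), ("S", "sSšŠ")]


def heading_for(word, alphabet):
    """Return the heading whose letter list contains the word's first letter."""
    for heading, letters in alphabet:
        if word[0] in letters:
            return heading
    # Hypothetical name for the catch-all group of unknown characters.
    return "Symbols"


for word in ["čáhci", "šaddu"]:
    print(word, heading_for(word, BEFORE), "->", heading_for(word, AFTER))
```

In this model both sample words fall into the catch-all group before the
change and under C and S after it, which is the behaviour the patch is
meant to produce.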

-- 
Happy hacking
Petter Reinholdtsen

