
Bug#968437: xindy-rules: Incorrect Norwegian sorting of č and š



Hi Bruno

Norbert from the TeX Live (and Debian TeX Team) here.

(please keep Cc)

Here in Debian we got a bug report about using xindy with Norwegian and
Saami words:

On Sat, 15 Aug 2020, Petter Reinholdtsen wrote:
> I ran into this problem when using dblatex and xindy to typeset a book,
> where the index ended up with the wrong sorting order.  This is a
> Norwegian book with some North Saami words in the body and index.  Every
> Saami word starting with č or š is incorrectly sorted as starting with
> a symbol, when it should be sorted with c or s, respectively.
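
As a rough illustration of the ordering described in the report, here is a
small Python sketch. It is not how xindy works internally: the FOLD table,
both key functions, and the sample word list are assumptions made up for
this example.

```python
# Hypothetical illustration only; xindy's real rules live in utf8.pl.in.
# Assumed folding: caron letters sort together with their base letter.
FOLD = {"č": "c", "š": "s"}

def naive_key(word):
    # Mimics the reported behaviour: a first character outside a-z is
    # treated as a symbol, and symbols are placed before all letters.
    first = word[0].lower()
    group = 1 if "a" <= first <= "z" else 0
    return (group, word.lower())

def folded_key(word):
    # Mimics the requested behaviour: č sorts with c, š sorts with s.
    return "".join(FOLD.get(ch, ch) for ch in word.lower())

words = ["sol", "čáhci", "bil", "šaddu", "dag"]
print(sorted(words, key=naive_key))   # č and š words end up in front
print(sorted(words, key=folded_key))  # č and š words end up under C and S
```

With the folded key, the č and š entries land among the C and S entries
instead of being grouped with the symbols.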


The reporter also provided a patch to fix this; see below.

Could you please comment on whether this is valid or not? If yes, I
would also update the xindy rules in TeX Live.

Thanks a lot for your time

Norbert

On Sat, 15 Aug 2020, Petter Reinholdtsen wrote:
> I tested a bit further, and can confirm that this patch solves the
> problem:
> 
> --- xindy-2.5.1.20160104.orig/make-rules/alphabets/norwegian/utf8.pl.in	2020-08-15 11:41:26.000000000 +0200
> +++ xindy-2.5.1.20160104/make-rules/alphabets/norwegian/utf8.pl.in	2020-08-15 11:46:27.976538044 +0200
> @@ -11,10 +11,9 @@
>                     [], # a with ogonek (polish)
>  ['B',  ['b','B']],
>                     [], # b with hook (hausa)
> -['C',  ['c','C'],['ç','Ç']],
> +['C',  ['c','C'],@u{['č','Č'],}['ç','Ç']],
>                     [], # ch (spanish/traditional)
>                     [], # cs (hungarian)
> -                   [], # c with caron (many)
>                     [], # c with acute (croatian, lower sorbian, polish)
>                     [], # c with circumflex (esperanto)
>                     [], # c with cedilla (albanian, kurdish, turkish)
> @@ -85,10 +84,9 @@
>                     [], # r with caron (czech, slovak/large, upper sorbian)
>                     [], # r with acute (lower sorbian)
>                     [], # r with cedilla/comma (latvian)
> -['S',  ['s','S']],
> +['S',  ['s','S'], @u{['š', 'Š']}],
>                     [], # sh (albanian)
>                     [], # sz (hungarian)
> -                   [], # s with caron (many)
>                     [], # s with acute (lower sorbian, polish)
>                     [], # s with circumflex (esperanto)
>                     [], # s with comma below (romanian)
> 
> I had to wrap using @u{} as the caron characters are missing in
> ISO-8859-1 (aka latin1).


--
PREINING Norbert                              https://www.preining.info
Accelia Inc. + IFMGA ProGuide + TU Wien + JAIST + TeX Live + Debian Dev
GPG: 0x860CDC13   fp: F7D8 A928 26E3 16A1 9FA0 ACF0 6CAC A448 860C DC13

