
Bug#989814: kodi: Kodi UI breaks with tr_TR.UTF-8 charset.



Hello Vasyl,

Sorry for my late reply. Your first reply to this bug didn’t generate a notification, so I missed your feedback on Sunday. Actually, there’s an edge case in the Turkish language that isn’t present in most other languages, and our late arrival on the localization scene allowed this edge case to go unhandled for a very long time.

In English and most other languages, i.upper() = I and I.lower() = i. In Turkish, however, the mappings are as follows:

i.upper() = İ
İ.lower() = i
ı.upper() = I
I.lower() = ı

Hence, we have four letters: ı, I, i, İ. When a piece of code doesn’t expect İ or ı, or changes the case of these letters and ends up with their English counterparts, things can go wrong, sometimes subtly and sometimes horribly. This is most probably what’s happening here.
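
To make the four-letter mapping concrete, here is a minimal Python sketch. The helper names tr_upper and tr_lower are made up for illustration and are not part of Kodi or glibc. Python's built-in str.upper() and str.lower() ignore the locale, so the Turkish rules are applied by hand before falling back to the default mapping:

```python
# Hypothetical helpers, for illustration only.
# U+0130 is the dotted capital I, U+0131 is the dotless small i.
DOTTED_CAPITAL_I = "\u0130"
DOTLESS_SMALL_I = "\u0131"

# Turkish-specific pairs: i <-> dotted capital, dotless small <-> I.
_TR_UPPER = str.maketrans({"i": DOTTED_CAPITAL_I, DOTLESS_SMALL_I: "I"})
_TR_LOWER = str.maketrans({"I": DOTLESS_SMALL_I, DOTTED_CAPITAL_I: "i"})


def tr_upper(text: str) -> str:
    """Uppercase text using the Turkish rules for the four i-like letters."""
    return text.translate(_TR_UPPER).upper()


def tr_lower(text: str) -> str:
    """Lowercase text using the Turkish rules for the four i-like letters."""
    return text.translate(_TR_LOWER).lower()
```

With these helpers, tr_upper("i") gives the dotted capital İ and tr_lower("I") gives the dotless ı, which is exactly the table above.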

Similarly, when the locale is set to Turkish and a piece of code gets İ instead of I when it calls i.upper(), things break again.
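
As a rough illustration of that failure mode, the Python sketch below compares a locale-sensitive and a locale-independent lookup. Everything in it is an assumption made for the example: upper_like_turkish_locale only imitates what an uppercasing call would return under a Turkish locale, and KNOWN_TAGS is an invented set of ASCII identifiers, not something taken from Kodi's sources.

```python
# Invented ASCII identifiers that some parser expects in upper case.
KNOWN_TAGS = {"TITLE", "WIDTH", "VISIBLE"}

# Imitates Turkish-locale uppercasing: i -> U+0130, U+0131 -> I.
_TR_UPPER = str.maketrans({"i": "\u0130", "\u0131": "I"})


def upper_like_turkish_locale(text: str) -> str:
    """Assumption: stands in for a toupper()-style call under tr_TR."""
    return text.translate(_TR_UPPER).upper()


def is_known_tag_locale_sensitive(name: str) -> bool:
    # Breaks for any identifier containing "i": the result has U+0130 in it.
    return upper_like_turkish_locale(name) in KNOWN_TAGS


def is_known_tag_locale_independent(name: str) -> bool:
    # Python's str.upper() does not consult the locale, so "i" becomes "I".
    return name.upper() in KNOWN_TAGS
```

Here is_known_tag_locale_sensitive("title") is False while is_known_tag_locale_independent("title") is True. Internal identifiers usually want the locale-independent form, and only user-visible text should follow the Turkish rules.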

We also have the letters ü/Ü, ğ/Ğ, ş/Ş, ç/Ç and ö/Ö. However, some of these are present in other languages too, and their lower/upper conversions are not tricky.

Coding Horror has a nice write-up about this issue, which probably sums it up better than I can: https://blog.codinghorror.com/whats-wrong-with-turkey/

I’ll provide similar information to the official bugzilla today.

Please don’t hesitate to contact me if you have any questions.

Best regards,

Hakan


> On 24 Jun 2021, at 23:10, Vasyl Gello <vasek.gello@gmail.com> wrote:
> 
> Package: kodi
> Followup-For: Bug #989814
> X-Debbugs-Cc: aurel32@debian.org, mattia@debian.org, hakan@bayindir.org
> 
> Control: reassign -1 glibc
> Control: retitle -1 Broken CTYPE in Turkish and Azeri UTF-8 locales
> Control: severity -1 serious
> 
> Dear colleagues,
> 
> I was able to track down the issue reported in Kodi and unfortunately
> it is a glibc / locales issue.
> 
> Specifically, all functions relying on LC_CTYPE set to Turkish or Azeri
> UTF-8 locales (like 'tolower()' for example) produce incorrect results.
> 
> I linked the reproducer in Kodi upstream Github issue:
> https://github.com/xbmc/xbmc/issues/19883#issuecomment-867911187
> 
> I upgraded the severity of the bug because the flawed behavior of locale
> functions definitely needs further investigation.
> 
> Hakan, maybe you can shed some light on what can be wrong with lower / upper
> conversions in Turkish?
> 
> 
> Sincerely,
> Vasyl
> 
> -- System Information:
> Debian Release: 11.0
>  APT prefers unstable-debug
>  APT policy: (500, 'unstable-debug'), (500, 'buildd-unstable'), (500, 'unstable')
> Architecture: amd64 (x86_64)
> 
> Kernel: Linux 5.10.0-7-amd64 (SMP w/6 CPU threads)
> Kernel taint flags: TAINT_OOT_MODULE, TAINT_UNSIGNED_MODULE
> Locale: LANG=C, LC_CTYPE=C (charmap=UTF-8) (ignored: LC_ALL set to en_US.UTF-8), LANGUAGE not set
> Shell: /bin/sh linked to /bin/dash
> Init: unable to detect
> 
> Versions of packages kodi depends on:
> pn  kodi-bin   <none>
> pn  kodi-data  <none>
> 
> Versions of packages kodi recommends:
> pn  kodi-repository-kodi | kodi-repository  <none>
> pn  kodi-visualization-spectrum             <none>
> 
> kodi suggests no packages.

