
Bug#104085: marked as done (QTextCodec::codecForLocale() is bogus)



Your message dated Sun, 7 Jan 2007 15:46:11 +0100
with message-id <200701071546.17661.debian@pusling.com>
and subject line Closing old woody bugs
has caused the attached Bug report to be marked as done.

This means that you claim that the problem has been dealt with.
If this is not the case it is now your responsibility to reopen the
Bug report if necessary, and/or fix the problem forthwith.

(NB: If you are a system administrator and have no idea what I am
talking about this indicates a serious mail system misconfiguration
somewhere.  Please contact me immediately.)

Debian bug tracking system administrator
(administrator, Debian Bugs database)

--- Begin Message ---
Package: libqt2
Version: 2:2.3.1-3
Severity: normal
Tags: patch

Currently the logic for QTextCodec::codecForLocale() on non-Win32
systems is as follows:

  use $LANG environment variable
  if $LANG is not set and we're using X11, use $LC_CTYPE
  if the environment variable used doesn't have .CHARSET part, try to
    use the result of setlocale(LC_CTYPE, NULL)
  if there is a .CHARSET part use it, otherwise use hardcoded charset
    determined from locale name
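
For illustration, the ".CHARSET part" test in the steps above can be
sketched roughly as follows. This is not the Qt code: the helper name
charsetFromLocaleName is hypothetical, and the sketch assumes XPG-style
locale names of the form language[_territory][.codeset][@modifier].

```cpp
#include <string>

// Hypothetical helper, not part of Qt: return the charset part of an
// XPG-style locale name (language[_territory][.codeset][@modifier]),
// or an empty string if the name carries no charset.
std::string charsetFromLocaleName(const std::string &name)
{
    std::string::size_type dot = name.find('.');
    if (dot == std::string::npos)
        return std::string();               // e.g. "lt_LT", "C", "de_DE@euro"

    // Drop a trailing "@modifier" so that "en_US.ISO8859-15@euro"
    // yields "ISO8859-15" rather than "ISO8859-15@euro".
    std::string::size_type at = name.find('@', dot + 1);
    if (at == std::string::npos)
        return name.substr(dot + 1);
    return name.substr(dot + 1, at - dot - 1);
}
```

With this sketch, "lt_LT.ISO-8859-13" gives "ISO-8859-13", while a bare
"lt_LT" gives an empty string, which is the case where the hardcoded
lists come into play.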

However this code has the following problems:
- it ignores the overriding effects of $LC_ALL
- it gives $LANG precedence over $LC_CTYPE, whereas in the C and X11
  libraries the locale charset is determined by the LC_CTYPE category
- it relies on environment variables in an attempt to duplicate the job
  of the C library, and fails to do that correctly (see the previous two
  points)
- hardcoded charset lists may become incorrect; e.g. the lt_LT locale
  has usually used ISO-8859-13 instead of ISO-8859-4 for at least a
  couple of years now.

Precedence of $LANG over $LC_CTYPE is the worst problem for me
personally, since I do not want a localized user interface and thus keep
LANG=C with LC_CTYPE=lt_LT.

As a side effect, this incorrect charset detection prevents me from
entering any Lithuanian (ISO-8859-13) text in all Qt/KDE applications,
but that IMHO is a separate bug (X11 keysym -> Unicode translation
should not depend on the user's locale); I'll file a bug report for it
later.

To fix these problems I would suggest using the following heuristic for
locale charset detection: use nl_langinfo(CODESET) if provided by the C
library; and if not, try to use the locale name assigned to LC_CTYPE
category (by trying either setlocale(LC_CTYPE, 0), or the first nonempty
value of $LC_ALL, $LC_CTYPE, and $LANG).
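
As a rough, standalone sketch of that heuristic (separate from the
attached patch): the helper names localeCodeset and localeNameFromEnv
are made up for illustration, and the sketch assumes a system that
provides <langinfo.h>.

```cpp
#include <cstdlib>
#include <string>
#include <langinfo.h>   // nl_langinfo(), CODESET; assumed to be available

// Hypothetical helper: ask the C library for the charset of the current
// LC_CTYPE locale. The result is only meaningful if the program has
// already selected the user's locale, e.g. with setlocale(LC_CTYPE, "").
std::string localeCodeset()
{
    const char *codeset = nl_langinfo(CODESET);
    return codeset ? std::string(codeset) : std::string();
}

// Hypothetical fallback: the first nonempty value of $LC_ALL, $LC_CTYPE
// and $LANG, i.e. the order in which the C library resolves the
// LC_CTYPE category. Returns an empty string if none of them is set.
std::string localeNameFromEnv()
{
    const char *const vars[] = { "LC_ALL", "LC_CTYPE", "LANG" };
    for (int i = 0; i < 3; ++i) {
        const char *value = std::getenv(vars[i]);
        if (value && *value)
            return std::string(value);
    }
    return std::string();
}
```

With LANG=C and LC_CTYPE=lt_LT (my setup), localeNameFromEnv() returns
"lt_LT" rather than "C"; setting LC_ALL to a nonempty value overrides
both.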

I've attached a patch to src/tools/qtextcodec.cpp that is
- incomplete (I don't know how to add the test for _HAVE_NL_LANGINFO_
  into configure script)
- untested
- but should illustrate what exactly I have in mind, and (I hope) be
  useful after minor changes (adding the test for _HAVE_NL_LANGINFO_,
  and fixing any bugs).

My patch also contains a fix for lt and lt_LT locales -- they should use
ISO-8859-13, not ISO-8859-4.

-- System Information
Debian Release: testing/unstable
Architecture: i386
Kernel: Linux mg 2.2.19 #1 Tue Apr 17 01:04:24 EET 2001 i686
Locale: LANG=C, LC_CTYPE=lt_LT

Versions of packages libqt2 depends on:
ii  libc6                  2.2.3-6           GNU C Library: Shared libraries an
ii  libjpeg62              6b-1.3            The Independent JPEG Group's JPEG 
ii  libmng1                1.0.1-2           Multiple-image Network Graphics li
ii  libpng2                1.0.11-1          PNG library - runtime             
ii  libstdc++2.10-glibc2.2 1:2.95.4-0.010703 The GNU stdc++ library            
ii  xlibs                  4.0.3-4           X Window System client libraries  
ii  zlib1g                 1:1.1.3-15        compression library - runtime     

Marius Gedminas
-- 
If you can't understand it, it is intuitively obvious.
--- qtextcodec.cpp.orig	Mon Jul  9 17:10:35 2001
+++ qtextcodec.cpp	Mon Jul  9 17:41:03 2001
@@ -59,6 +59,10 @@
 #include <ctype.h>
 #include <locale.h>
 
+#ifdef _HAVE_NL_LANGINFO_
+#include <nl_types.h>
+#include <langinfo.h>
+#endif
 
 static QList<QTextCodec> * all = 0;
 static bool destroying_is_ok; // starts out as 0
@@ -454,7 +458,7 @@
     "eo", 0 };
 
 static const char * const iso8859_4locales[] = {
-    "ee", "ee_EE", "lt", "lt_LT", "lv", "lv_LV", 0 };
+    "ee", "ee_EE", "lv", "lv_LV", 0 };
 
 static const char * const iso8859_5locales[] = {
     "mk", "mk_MK",
@@ -478,6 +482,9 @@
 static const char * const iso8859_9locales[] = {
     "tr", "tr_TR", "turkish", 0 };
 
+static const char * const iso8859_13locales[] = {
+    "lt", "lt_LT", 0 };
+
 static const char * const iso8859_15locales[] = {
     "fr", "fi", "french", "finnish", "et", "et_EE", 0 };
 
@@ -561,77 +568,111 @@
     // Very poorly defined and followed standards causes lots of code
     // to try to get all the cases...
 
-    char * lang = qstrdup( getenv("LANG") );
-
-#ifdef _WS_X11_
-    // use LC_CTYPE if LANG is not available, as X11 uses it too, and we otherwise get inconsitencies between
-    // XmbLookupString and the Qt input mapper
-    if ( !lang || lang[0] == 0 )
-	lang = qstrdup( getenv("LC_CTYPE") );
+#ifdef _HAVE_NL_LANGINFO_
+    // Best choice, but not very portable -- nl_langinfo is XPG3, not POSIX
+    localeMapper = codecForName( nl_langinfo(CODESET) );
 #endif
 
-    char * p = lang ? strchr( lang, '.' ) : 0;
-    if ( !p || *p != '.' ) {
-        // Some versions of setlocale return encoding, others not.
-        char *ctype = qstrdup( setlocale( LC_CTYPE, 0 ) );
-        // Some Linux distributions have broken locales which will return
-        // "C" for LC_CTYPE
-        if ( qstrcmp( ctype, "C" ) == 0 ) {
-            delete [] ctype;
-        } else {
-            if ( lang )
-                delete [] lang;
-            lang = ctype;
-            p = lang ? strchr( lang, '.' ) : 0;
-        }
-    }
+    if ( !localeMapper ) {
+	// Try to determine locale codeset from locale name assigned to
+	// LC_CTYPE category.
+
+	// First part is getting that locale name.  First try setlocale() which
+	// definitely knows it, but since we cannot fully trust it, get ready
+	// to fall back to environment variables.
+        char * ctype = qstrdup( setlocale( LC_CTYPE, 0 ) );
+
+	// Get the first nonempty value from $LC_ALL, $LC_CTYPE, and $LANG
+	// environment variables.
+	char * lang = qstrdup( getenv("LC_ALL") );
+	if ( !lang || lang[0] == 0 ) {
+	    delete [] lang;
+	    lang = qstrdup( getenv("LC_CTYPE") );
+	}
+	if ( !lang || lang[0] == 0 ) {
+	    delete [] lang;
+	    lang = qstrdup( getenv("LANG") );
+	}
+
+	// Now try these in order:
+	// 1. CODESET from ctype if it contains a .CODESET part (e.g. en_US.ISO8859-15)
+	// 2. CODESET from lang if it contains a .CODESET part
+	// 3. ctype (maybe the locale is named "ISO-8859-1" or something)
+	// 4. lang (ditto)
+	// 5. guess locale from ctype unless ctype is "C"
+	// 6. guess locale from lang
+
+	// 1. CODESET from ctype if it contains a .CODESET part (e.g. en_US.ISO8859-15)
+	// FIXME: this might fail with locales like en_US.ISO8859-15@euro
+	char * codeset = ctype ? strchr( ctype, '.' ) : 0;
+	if ( codeset && *codeset == '.' )
+            localeMapper = codecForName( codeset + 1 );
+
+	// 2. CODESET from lang if it contains a .CODESET part
+	codeset = lang ? strchr( lang, '.' ) : 0;
+	if ( !localeMapper && codeset && *codeset == '.' )
+            localeMapper = codecForName( codeset + 1 );
+
+	// 3. ctype (maybe the locale is named "ISO-8859-1" or something)
+	if ( !localeMapper && ctype && *ctype != 0 )
+            localeMapper = codecForName( ctype );
+
+	// 4. lang (ditto)
+	if ( !localeMapper && lang && *lang != 0 )
+            localeMapper = codecForName( lang );
+
+	// 5. guess locale from ctype unless ctype is "C"
+	// 6. guess locale from lang
+	char * try_by_name = ctype;
+	if ( !ctype || *ctype == 0 || qstrcmp( ctype, "C" ) == 0 )
+            try_by_name = lang;
+
+	// Now do the guessing.
+	if ( !localeMapper && try_by_name && *try_by_name != 0 ) {
+	    if ( try_locale_list( iso8859_2locales, try_by_name ) )
+		localeMapper = codecForName( "ISO 8859-2" );
+	    else if ( try_locale_list( iso8859_3locales, try_by_name ) )
+		localeMapper = codecForName( "ISO 8859-3" );
+	    else if ( try_locale_list( iso8859_4locales, try_by_name ) )
+		localeMapper = codecForName( "ISO 8859-4" );
+	    else if ( try_locale_list( iso8859_5locales, try_by_name ) )
+		localeMapper = codecForName( "ISO 8859-5" );
+	    else if ( try_locale_list( iso8859_6locales, try_by_name ) )
+		localeMapper = codecForName( "ISO 8859-6-I" );
+	    else if ( try_locale_list( iso8859_7locales, try_by_name ) )
+		localeMapper = codecForName( "ISO 8859-7" );
+	    else if ( try_locale_list( iso8859_8locales, try_by_name ) )
+		localeMapper = codecForName( "ISO 8859-8-I" );
+	    else if ( try_locale_list( iso8859_9locales, try_by_name ) )
+		localeMapper = codecForName( "ISO 8859-9" );
+	    else if ( try_locale_list( iso8859_13locales, try_by_name ) )
+		localeMapper = codecForName( "ISO 8859-13" );
+	    else if ( try_locale_list( iso8859_15locales, try_by_name ) )
+		localeMapper = codecForName( "ISO 8859-15" );
+	    else if ( try_locale_list( tis_620locales, try_by_name ) )
+		localeMapper = codecForName( "ISO 8859-11" );
+	    else if ( try_locale_list( koi8_ulocales, try_by_name ) )
+		localeMapper = codecForName( "KOI8-U" );
+	    else if ( try_locale_list( cp_1251locales, try_by_name ) )
+		localeMapper = codecForName( "CP 1251" );
+	    else if ( try_locale_list( pt_154locales, try_by_name ) )
+		localeMapper = codecForName( "PT 154" );
+	    else if ( try_locale_list( probably_koi8_rlocales, try_by_name ) )
+		localeMapper = ru_RU_hack( try_by_name );
+	}
+
+	// If everything failed, we default to 8859-1
+	// We could perhaps default to 8859-15.
+	if ( !localeMapper )
+	    localeMapper = codecForName( "ISO 8859-1" );
 
-    if( p && *p == '.' ) {
-        // if there is an encoding and we don't know it, we return 0
-        // User knows what they are doing.  Codecs will believe them.
-        localeMapper = codecForName( lang );
-        if ( !localeMapper ) {
-            // Use or codec disagree.
-            localeMapper = codecForName( p+1 );
-        }
-	if ( localeMapper && localeMapper->mibEnum() == 11 )
-	    localeMapper = codecForName( "ISO 8859-8-I" );
-    }
-    if ( !localeMapper || !(p && *p == '.') ) {
-        // if there is none, we default to 8859-1
-        // We could perhaps default to 8859-15.
-        if ( try_locale_list( iso8859_2locales, lang ) )
-            localeMapper = codecForName( "ISO 8859-2" );
-        else if ( try_locale_list( iso8859_3locales, lang ) )
-            localeMapper = codecForName( "ISO 8859-3" );
-        else if ( try_locale_list( iso8859_4locales, lang ) )
-            localeMapper = codecForName( "ISO 8859-4" );
-        else if ( try_locale_list( iso8859_5locales, lang ) )
-            localeMapper = codecForName( "ISO 8859-5" );
-        else if ( try_locale_list( iso8859_6locales, lang ) )
-            localeMapper = codecForName( "ISO 8859-6-I" );
-        else if ( try_locale_list( iso8859_7locales, lang ) )
-            localeMapper = codecForName( "ISO 8859-7" );
-        else if ( try_locale_list( iso8859_8locales, lang ) )
-            localeMapper = codecForName( "ISO 8859-8-I" );
-        else if ( try_locale_list( iso8859_9locales, lang ) )
-            localeMapper = codecForName( "ISO 8859-9" );
-        else if ( try_locale_list( iso8859_15locales, lang ) )
-            localeMapper = codecForName( "ISO 8859-15" );
-        else if ( try_locale_list( tis_620locales, lang ) )
-            localeMapper = codecForName( "ISO 8859-11" );
-        else if ( try_locale_list( koi8_ulocales, lang ) )
-            localeMapper = codecForName( "KOI8-U" );
-        else if ( try_locale_list( cp_1251locales, lang ) )
-            localeMapper = codecForName( "CP 1251" );
-        else if ( try_locale_list( pt_154locales, lang ) )
-            localeMapper = codecForName( "PT 154" );
-         else if ( try_locale_list( probably_koi8_rlocales, lang ) )
-            localeMapper = ru_RU_hack( lang );
-        else if (!lang || !(localeMapper = codecForName(lang) ))
-            localeMapper = codecForName( "ISO 8859-1" );
+	delete [] lang;
+	delete [] ctype;
     }
-    delete[] lang;
+
+    if ( localeMapper && localeMapper->mibEnum() == 11 )
+	localeMapper = codecForName( "ISO 8859-8-I" );
+
 #endif
 
     return localeMapper;

--- End Message ---
--- Begin Message ---
This is an old Debian woody bug that was fixed long ago, and Debian woody
is no longer supported, so I am closing this bug.
kdelibs3 is gone.
qt2 is gone (and qt3 uses different methods for gl, for fonts and for other
things)


/Sune
-- 
Genius, I'm not able to save the graphic mother board from the options within 
Photoshop 6.8, how does it work?

The point is that you neither can ever boot with a software, nor should digit 
on the controller to the mailer of a button to a line to reinstall a DirectX 
system.



--- End Message ---
