
Bug#253404: Set root's locale to POSIX on the installed system for "complex" (CJK, Indic, RTL...) languages



Bug #253404 concerns a fairly well-known issue on newly installed
systems:

When a system is installed in a language such as Chinese, Japanese,
Korean or Arabic, the text installer works fine because it runs in a
framebuffer environment. Similarly, base-config ran fine because it
was running inside bterm.

However, the installed system then uses the installation locale as its
general locale, by setting LANG and LANGUAGE in /etc/environment.

The consequence is mojibake at the console of the installed system,
because the regular Linux console cannot display these languages.

Two solutions have been proposed in #253404 and #263575:

1) set the general locale to POSIX for these languages
2) set root's environment to the POSIX locale for these languages

In #263575, Kenshi proposed a solution involving patching base-files,
which was rejected by the base-files maintainer.

With the attached patch, I propose another solution, probably not
perfect, but mostly aimed at triggering more discussion:

Create a new language category ("category 3") for languages that can
use the text installer but cannot be displayed in a regular Linux
console.

The current category 3 (used for languages that require the graphical
installer) then becomes category 4.
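
To check which languages end up in which category, the second field of
each languagelist entry can be read directly. The following is only a
rough sketch: languages_at_level is a hypothetical helper that is not
part of localechooser, and it assumes the semicolon-separated layout
described in the languagelist header.

```shell
#!/bin/sh
# Sketch only: languages_at_level is a hypothetical helper, not part of
# localechooser. It assumes the field layout from the languagelist header:
# Language;supported_environments;locale;fallbacklocale;langcode;...

# Usage: languages_at_level LEVEL FILE
# Prints the name of every active (non-commented) language whose
# second field equals LEVEL.
languages_at_level() {
	awk -F';' -v lvl="$1" '
		/^#/ || NF < 3 { next }   # skip comments and malformed lines
		$2 == lvl { print $1 }    # field 2 is the category/level
	' "$2"
}
```

For example, "languages_at_level 3 languagelist" run against the patched
file should list the entries moved to the new category.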

Then, in localechooser's post-base-installer, the script populates
/target/root/.profile with "LANG=C", "LANGUAGE=C" and a comment
explaining why this is done, for languages in categories 3 and 4.
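
As a possible variation for discussion, the level lookup could match the
language code exactly against the langcode field (field 5) rather than
grepping for a substring, since a substring such as "ro" also occurs in
the commented-out "ga_IE@euro" entry. This is a sketch under that
assumption, not what the attached patch does; level_for_locale and
neutralize_root_locale are hypothetical names, and the list and profile
paths are passed as arguments so it can be tried outside the installer.

```shell
#!/bin/sh
# Sketch only: both functions are hypothetical and not part of the patch.
# Assumes languagelist fields: name;level;locale;fallback;langcode;...

# Usage: level_for_locale LOCALE LISTFILE
# Prints the level of the first active entry whose langcode matches.
level_for_locale() {
	lang="${1%%[_.@]*}"           # "ja_JP.UTF-8" -> "ja", "ar" -> "ar"
	awk -F';' -v lang="$lang" '
		/^#/ { next }             # ignore commented-out entries
		$5 == lang { print $2; exit }
	' "$2"
}

# Usage: neutralize_root_locale LOCALE LISTFILE PROFILE
# Appends the same lines as the patch when the level is 3 or 4.
neutralize_root_locale() {
	level="$(level_for_locale "$1" "$2")"
	case "$level" in
	3|4)
		{
			echo "# Installed by Debian Installer:"
			echo "#  no localization for root because $1"
			echo "#  cannot be properly displayed at the Linux console"
			echo "LANG=C"
			echo "LANGUAGE=C"
		} >> "$3"
		;;
	esac
}
```

In the installer this would be called along the lines of
neutralize_root_locale "$LOCALE" /usr/share/localechooser/languagelist
/target/root/.profile, using the same paths as the patch.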

Please comment.


diff -Nru localechooser.ori/languagelist localechooser.new/languagelist
--- localechooser.ori/languagelist	2006-02-24 18:16:30.092310481 +0100
+++ localechooser.new/languagelist	2006-02-24 18:25:25.527893364 +0100
@@ -2,10 +2,10 @@
 # This is the complete list of languages (locales) to choose from.
 # Language;supported_environments;locale;fallbacklocale;langcode;countrycode;langlist;console-data
 Albanian;2;sq_AL;sq_AL.UTF-8;sq;AL;sq_AL:sq:en_GB:en;kbd=lat0-sun16(utf8)
-Arabic;2;ar;ar_EG.UTF-8;ar;EG;ar_EG:en_GB:en;
+Arabic;3;ar;ar_EG.UTF-8;ar;EG;ar_EG:en_GB:en;
 Basque;1;eu;eu_ES.UTF-8;eu;ES;eu_ES:eu:en_GB:en;kbd=lat0-sun16(utf8)
 Belarusian;2;be_BY;be_BY.UTF-8;be;BY;be_BY:be:en_GB:en;cyr=uni,16,utf-8,by(ctrl_shift_toggle)
-Bengali;3;bn;bn_BD;bn;BD;bn_BD:bn:en_IN:en_GB:en;
+Bengali;4;bn;bn_BD;bn;BD;bn_BD:bn:en_IN:en_GB:en;
 Bosnian;2;bs_BA;bs_BA.UTF-8;bs;BA;bs_BA:bs:en_GB:en;kbd=Lat2-Terminus16(utf8)
 Bulgarian;2;bg_BG;bg_BG;bg;BG;bg_BG:bg:en_GB:en;kbd=ruscii_8x16(cp1251)
 # For C locale, set language to 'en' to make sure questions are "translated"
@@ -15,8 +15,8 @@
 # Special case for Chinese as the two flavours share the same ISO 639 code
 # Both will trigger countrychooser. Each will be the backup for the other
 # one
-Chinese (Simplified);2;zh;zh_CN;zh;CN;zh_CN:zh:en_US:en;
-Chinese (Traditional);2;zh;zh_TW;zh;TW;zh_TW:zh:en_US:en;
+Chinese (Simplified);3;zh;zh_CN;zh;CN;zh_CN:zh:en_US:en;
+Chinese (Traditional);3;zh;zh_TW;zh;TW;zh_TW:zh:en_US:en;
 Croatian;2;hr_HR;hr_HR.UTF-8;hr;HR;hr_HR:hr:en_GB:en;kbd=lat2-sun16(utf8)
 Czech;2;cs_CZ;cs_CZ.UTF-8;cs;CZ;cs_CZ:cs:en_GB:en;kbd=lat2-sun16(utf8)
 Danish;1;da_DK;da_DK.UTF-8;da;DK;da_DK:da:en_GB:en;kbd=lat0-sun16(utf8)
@@ -30,46 +30,46 @@
 Finnish;1;fi_FI;fi_FI.UTF-8;fi;FI;fi_FI:fi:en_GB:en;kbd=lat0-sun16(utf8)
 French;1;fr;fr_FR.UTF-8;fr;FR;fr_FR:fr:en_GB:en;kbd=lat9u-16(utf8)
 Galician;1;gl_ES;gl_ES.UTF-8;gl;ES;gl_ES:gl:en_GB:en;kbd=lat0-sun16(utf8)
-#X Georgian;3;ka_GE;ka_GE;ka;GE;ka_GE:ka:en_GB:en;kbd=ka8x16thin(utf8)
+#X Georgian;4;ka_GE;ka_GE;ka;GE;ka_GE:ka:en_GB:en;kbd=ka8x16thin(utf8)
 German;1;de;de_DE.UTF-8;de;DE;de_DE:de:en_GB:en;kbd=lat0-sun16(utf8)
 Greek;2;el_GR;el_GR.UTF-8;el;GR;el_GR:el:en_GB:en;kbd=iso07.f16(utf8)
-#X Gujarati;3;gu_IN;gu_IN;gu;IN;gu_IN:gu:en_IN:en_GB:en;
-Hebrew;2;he_IL;he_IL.UTF-8;he;IL;he_IL:he:en_GB:en;kbd=LatArCyrHeb-16(utf8)
-Hindi;3;hi_IN;hi_IN;hi;IN;hi_IN:hi:en_IN:en_GB:en;
+#X Gujarati;4;gu_IN;gu_IN;gu;IN;gu_IN:gu:en_IN:en_GB:en;
+Hebrew;3;he_IL;he_IL.UTF-8;he;IL;he_IL:he:en_GB:en;kbd=LatArCyrHeb-16(utf8)
+Hindi;4;hi_IN;hi_IN;hi;IN;hi_IN:hi:en_IN:en_GB:en;
 Hungarian;2;hu_HU;hu_HU.UTF-8;hu;HU;hu_HU:hu:en_GB:en;kbd=lat2-sun16(utf8)
 Icelandic;1;is_IS;is_IS.UTF-8;is;IS;is_IS:is:en_GB:en;kbd=lat9u-16(utf8)
 Indonesian;1;id_ID;id_ID.UTF-8;id;ID;id_ID:id:en_GB:en;kbd=lat0-sun16(utf8)
 #X Irish;1;ga_IE@euro;ga_IE@euro;ga;IE;ga_IE;ga:en_IE:en_GB:en;kbd=lat0-sun16(iso15)
 Italian;1;it;it_IT.UTF-8;it;IT;it_IT:it:en_GB:en;kbd=lat0-sun16(utf8)
-Japanese;2;ja_JP;ja_JP.UTF-8;ja;JP;ja_JP:ja:en_GB:en;
-#X Kannada;3;kn_IN;kn_IN;kn;IN;kn_IN:kn:en_IN:en_GB:en;
+Japanese;3;ja_JP;ja_JP.UTF-8;ja;JP;ja_JP:ja:en_GB:en;
+#X Kannada;4;kn_IN;kn_IN;kn;IN;kn_IN:kn:en_IN:en_GB:en;
 Kazakh;2;kk_KZ;kk_KZ;kk;KZ;kk_KZ;kz:en_GB:en;kbd=ruscii_8x16(koi8-r)
-#X Khmer;3;km_KH;km_KH;km;KH;km_KH:km:en_GB:en;
-Korean;2;ko_KR.UTF-8;ko_KR.UTF-8;ko;KR;ko_KR:ko:en_US:en;
+#X Khmer;4;km_KH;km_KH;km;KH;km_KH:km:en_GB:en;
+Korean;3;ko_KR.UTF-8;ko_KR.UTF-8;ko;KR;ko_KR:ko:en_US:en;
 Kurdish;2;ku_TR.UTF-8;ku_TR;ku;TR;ku_TR:ku:en_GB:en;kbd=Lat15-Terminus16(utf8)
-#X Lao;3;lo_LA;lo_LA;lo;LA;lo_LA:lo:en_GB:en;
+#X Lao;4;lo_LA;lo_LA;lo;LA;lo_LA:lo:en_GB:en;
 Latvian;2;lv_LV;lv_LV.UTF-8;lv;LV;lv_LV:lv:en_GB:en;kbd=lat7-14(utf8)
 Lithuanian;2;lt_LT;lt_LT.UTF-8;lt;LT;lt_LT:lt:en_GB:en;kbd=LatArCyrHeb-16(utf8)
 Malagasy;1;mg_MG;mg_MG.UTF-8;mg;MG;mg_MG:fr_FR:fr:en_GB:en;kbd=lat0-sun16(utf8)
-#X Malayalam;3;ml_IN;ml_IN;ml;IN;ml_IN:ml:en_IN:en_GB:en;
+#X Malayalam;4;ml_IN;ml_IN;ml;IN;ml_IN:ml:en_IN:en_GB:en;
 Macedonian;2;mk_MK;mk_MK.UTF-8;mk;MK;mk_MK:mk:en_GB:en;kbd=iso05.f16(utf8)
-#X Nepali;3;ne_NP;ne_NP;ne;NP;ne_NP:ne:en_GB:en;
+#X Nepali;4;ne_NP;ne_NP;ne;NP;ne_NP:ne:en_GB:en;
 # The Sami translation is really incomplete. We however keep Sami on request
 # of Skolelinux as a kind of reward to them..:-). They need to be able to 
 # choose Sami as an option so that the Sami locale is set as default
 Northern Sami;1;se_NO;se_NO.UTF-8;se;NO;se_NO:nb_NO:nb:no_NO:no:nn_NO:nn:da:sv:en_GB:en;kbd=lat0-sun(utf8)
 Norwegian Bokmaal;1;nb_NO;nb_NO.UTF-8;nb;NO;nb_NO:nb:no_NO:no:nn_NO:nn:da:sv:en_GB:en;kbd=lat0-sun16(utf8)
 Norwegian Nynorsk;1;nn_NO;nn_NO.UTF-8;nn;NO;nn_NO:nn:no_NO:no:nb_NO:nb:da:sv:en_GB:en;kbd=lat0-sun16(utf8)
-Persian;2;fa_IR;fa_IR;fa;IR;fa_IR:en_GB:en;kbd=iso06.f16(utf8)
+Persian;3;fa_IR;fa_IR;fa;IR;fa_IR:en_GB:en;kbd=iso06.f16(utf8)
 Polish;2;pl_PL;pl_PL.UTF-8;pl;PL;pl_PL:pl:en_GB:en;kbd=lat2-sun16(utf8)
 Portuguese;1;pt;pt_PT.UTF-8;pt;PT;pt_PT:pt:pt_BR:en_GB:en;kbd=lat0-sun16(utf8)
 Portuguese (Brazil);1;pt_BR;pt_BR.UTF-8;pt;BR;pt_BR:pt:pt_PT;kbd=lat1-16(utf8)
-Punjabi (Gurmukhi);3;pa_IN;pa_IN;pa;IN;pa_IN:en_GB:en
+Punjabi (Gurmukhi);4;pa_IN;pa_IN;pa;IN;pa_IN:en_GB:en
 Romanian;2;ro_RO;ro_RO.UTF-8;ro;RO;ro_RO:ro:en_GB:en;kbd=Lat2-Terminus16(utf8)
 # The following may be used only when the needed changes will happen in base-config
 # Russian;2;ru;ru_RU.KOI8-R;ru;RU;ru_RU:ru:en_GB:en;kbd=ruscii_8x16(koi8-r)
 Russian;2;ru;ru_RU.KOI8-R;ru;RU;ru_RU:ru:en_GB:en;cyr=uni,16,koi8-r,ru_ms(ctrl_shift_toggle)
-#X Sanskrit;3;sa_IN;sa_IN;sa;IN;sa_IN:sa:en_IN:en_GB:en;
+#X Sanskrit;4;sa_IN;sa_IN;sa;IN;sa_IN:sa:en_IN:en_GB:en;
 # Serbian commented for consistency: too incomplete
 #X Serbian;2;sr_YU.UTF-8@cyrillic;sr_YU.UTF-8@cyrillic;sr;CS;sr_CS:sr_YU:sr:bs:en_GB:en;kbd=iso05.f16(utf8)
 Slovak;2;sk_SK;sk_SK.UTF-8;sk;SK;sk_SK:sk:en_GB:en;kbd=lat2-sun16(utf8)
@@ -77,10 +77,10 @@
 Spanish;1;es;es_ES.UTF-8;es;ES;es_ES:es:en_GB:en;kbd=lat0-sun16(utf8)
 Swedish;1;sv;sv_SE.UTF-8;sv;SE;sv_SE:sv:en_GB:en;kbd=lat0-sun16(utf8)
 Tagalog;1;tl;tl_PH.UTF-8;tl;PH;tl_PH:tl:en_US:en;kbd=lat0-sun16(utf8)
-# Tamil;3;ta_IN;ta_IN;ta;IN;ta_IN:ta:en_IN:en_GB:en;
+# Tamil;4;ta_IN;ta_IN;ta;IN;ta_IN:ta:en_IN:en_GB:en;
 Turkish;2;tr_TR.UTF-8;tr_TR;tr;TR;tr_TR:tr:en_GB:en;kbd=Lat15-Terminus16(utf8)
 Ukrainian;2;uk_UA;uk_UA.UTF-8;uk;UA;uk_UA:uk:en_GB:en;kbd=ruscii_8x16(utf8)
-Vietnamese;2;vi_VN;vi_VN.UTF-8;vi;VN;vi_VN:vi:en_GB:en;
+Vietnamese;3;vi_VN;vi_VN.UTF-8;vi;VN;vi_VN:vi:en_GB:en;
 Welsh;2;cy_GB;cy_GB.UTF-8;cy;GB;cy_GB:en_GB:en;kbd=iso14.f16(utf8)
 Wolof;2;wo_SN;wo_SN;wo;SN;wo_SN:wo:en_GB:en;
 Xhosa;2;xh_ZA;xh_ZA.UTF-8;xh;ZA;xh_ZA:en_GB:en;kbd=lat0-sun16(utf8)
diff -Nru localechooser.ori/localechooser localechooser.new/localechooser
--- localechooser.ori/localechooser	2006-02-24 18:16:30.067315542 +0100
+++ localechooser.new/localechooser	2006-02-24 18:24:04.117378085 +0100
@@ -82,8 +82,7 @@
 				# More than one match; try matching the
 				# country as well.
 				countrypart="${1#*_}"
-				countrypart="${countrypart%%[@.]*}"
-				countryline="$(echo "$line" | grep ";$countrypart\$" || true)"
+				countryline="$(echo "$line" | grep ";$countrypart\$")"
 				if [ "$countryline" ]; then
 					echo "${countryline%%;*}"
 					return
@@ -199,8 +198,8 @@
 #
 # No framebuffer and text interface      -->level 0 (only ASCII)
 # No framebuffer and other interface     -->level 1 (only Latin1)
-# Framebuffer and non graphical interface-->level 2 (no combining langs)
-# Framebuffer and graphical interface    -->level 3 (all langs)
+# Framebuffer and non graphical interface-->level 2 and 3 (no combining langs)
+# Framebuffer and graphical interface    -->level 4 (all langs)
 # Depending on these values, we use different templates with a different
 # list of languages. These lists are built at the package build-time
 # from the contents of the second field of languagelist entries
diff -Nru localechooser.ori/mktemplates.language localechooser.new/mktemplates.language
--- localechooser.ori/mktemplates.language	2006-02-24 18:16:29.218487411 +0100
+++ localechooser.new/mktemplates.language	2006-02-24 18:24:04.120377478 +0100
@@ -79,8 +79,8 @@
 
 my @languagenames_ascii   = get_language_names($list,0);
 my @languagenames_latin   = get_language_names($list,1);
-my @languagenames_fb   = get_language_names($list,2);
-my @languagenames_all   = get_language_names($list,3);
+my @languagenames_fb   = get_language_names($list,3);
+my @languagenames_all   = get_language_names($list,4);
 my %translationmap  = get_translations($listl10n);
 
 sub order_trans {
diff -Nru localechooser.ori/post-base-installer localechooser.new/post-base-installer
--- localechooser.ori/post-base-installer	2006-02-24 18:16:30.063316351 +0100
+++ localechooser.new/post-base-installer	2006-02-24 18:24:04.122377073 +0100
@@ -46,6 +46,28 @@
 if [ "$LANGLIST" ]; then
 	echo "LANGUAGE=\"$LANGLIST\"" >> $DESTFILE
 fi
+
+# For languages that have no chance to be displayed at the Linux console
+# let's set root's environment with a non localized environment
+ROOTPROFILE="/target/root/.profile"
+# We must map the language to its "level" from languagelist
+LANGUAGECODE=`echo $LOCALE|cut -f1 -d_`
+# For language with multiple entries such as pt/pt_BR or zh_CN/zh_TW
+# we don't really care about the entry we will match as the level will always 
+# be the same 
+LEVEL=`cat /usr/share/localechooser/languagelist |\
+	cut -f 2-3 -d\; | \
+	grep "$LANGUAGECODE" | \
+	head -1 | \
+	cut -f1 -d\;`
+if [ "$LEVEL" = "3" ] || [ "$LEVEL" = "4" ] ; then
+	echo "# Installed by Debian Installer:" >>$ROOTPROFILE
+	echo "#  no localization for root because $LOCALE" >>$ROOTPROFILE
+	echo "#  cannot be properly displayed at the Linux console" >>$ROOTPROFILE
+	echo "LANG=C" >>$ROOTPROFILE
+	echo "LANGUAGE=C" >>$ROOTPROFILE
+fi
+
 # If the locale isn't already valid, append it to locale.gen
 gen=
 if log-output -t localechooser --pass-stdout \
diff -Nru localechooser.ori/README localechooser.new/README
--- localechooser.ori/README	2006-02-24 18:16:30.020325056 +0100
+++ localechooser.new/README	2006-02-24 18:29:39.218522958 +0100
@@ -251,7 +251,9 @@
 		     1=OK in Latin-1 only environment (Linux console
                        without framebuffer, some serial consoles)
                      2=OK with the framebuffer console
-                     3=OK only with the graphical installer
+                     3=OK with the framebuffer console but not
+                       possible to display in the regular Linux console
+                     4=OK only with the graphical installer
    -locale          =default locale
                      xx for languages with multiple countries
                      i.e. languages for which a valid locale may exist
