Here's a useful initial analysis by Arne of input methods and default core font selections:

-------- Original Message --------
Subject: Re: [i18n] Input Method and Fonts improvements for Gutsy
Date: Fri, 10 Aug 2007 18:35:34 +0800
From: Arne Goetje <arne@ubuntu.com>
To: ubuntu-devel@lists.ubuntu.com, Michael Vogt <michael.vogt@ubuntu.com>, Martin Pitt <martin.pitt@ubuntu.com>
References: <46B7572B.8010206@canonical.com> <20070807143037.GE12904@alcor.net>

Matt Zimmerman wrote:
> On Tue, Aug 07, 2007 at 01:15:23AM +0800, Arne Goetje wrote:
>> 1. Input Method (SCIM):
>> Both Live CD and default installation come with the SCIM package
>> installed, however it is not properly set up, so that the user actually
>> cannot use it.
>
> This was working at one point; Michael Vogt was involved with it. CCing
> him.
>
>> SCIM depends on some environment variables and the SCIM daemon started in
>> the background. There is a nice tool, called im-switch, which takes care
>> of this.
>
> im-switch is installed by language-support packages corresponding to
> languages which require it. The trouble, of course, is that none of these
> are installed on the live CD due to space constraints.
>
> So we may need to find a way to get scim installed, but selectively enabled
> depending on the language, or perhaps rethink the way we handle
> language-support.

OK, I did some more tests...

SCIM does work if the user right-clicks in the application window and
selects "Input Method -> SCIM". This works in all UTF-8 locales. But as
this step is not obvious to the general novice user, I recommend setting
the environment variable(s) GTK_IM_MODULE=scim (and QT_IM_MODULE=scim).
That way SCIM works as expected.

For the Live CD, this approach is enough, but for the default installation
the scim-bridge-* packages should also be installed.
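To try the environment-variable approach on a single application before changing any session defaults, the two variables can be injected into one child process. This is a minimal Python sketch, assuming the SCIM daemon is already running; the helper names `scim_environment` and `launch_with_scim` are hypothetical, and a real setup would export the variables for the whole session (for example via im-switch) rather than per process.

```python
import os
import subprocess
from typing import List, Mapping, Optional

# The two variables named in the thread. They tell GTK and Qt applications
# to load SCIM as their input-method module.
SCIM_IM_VARS = {
    "GTK_IM_MODULE": "scim",
    "QT_IM_MODULE": "scim",
}


def scim_environment(base: Optional[Mapping[str, str]] = None) -> dict:
    """Return a copy of `base` (default: os.environ) with the SCIM variables set.

    The input mapping is never modified.
    """
    env = dict(os.environ if base is None else base)
    env.update(SCIM_IM_VARS)
    return env


def launch_with_scim(argv: List[str]) -> subprocess.Popen:
    """Hypothetical test helper: start one program with SCIM selected.

    It does not start the SCIM daemon; that must already be running.
    """
    return subprocess.Popen(argv, env=scim_environment())
```

For example, `launch_with_scim(["gedit"])` would start gedit with both variables set while leaving the calling process's own environment unchanged.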
I'll have to dig a bit further into how the scim-bridge packages need to
be configured, but they are supposed to solve a few problems with
third-party applications (Acrobat Reader, Skype, etc.).

>> I highly recommend that we put the following packages and their
>> dependencies into the Live CD and the default installation to make it
>> more useful:
>> * scim-anthy or scim-prime: Japanese input methods. scim-prime is a
>> dictionary-based IM, which has a great advantage over anthy, although
>> both are widely used in Japan.
>> * scim-chewing: Traditional Chinese phonetic IM, widely used in Taiwan
>> * scim-pinyin: Simplified and Traditional Chinese Pinyin IM, widely
>> used in China and by foreigners in Taiwan. ;)
>> * scim-hangul: As the name says - Korean.
>> * scim-tables-zh: additional table-based IMs for Simplified and
>> Traditional Chinese, many of them popular in China, Hong Kong and
>> Taiwan.
>> * scim-thai: well, Thai. :)
>> * scim-m17n: bridge to the m17n library, which adds a lot of additional
>> IMs, including Latin-based ones for the European languages with
>> diacritics. (not everyone likes to fiddle with XKB settings. ;) )
>
> As with im-switch, these modules are installed by the relevant
> language-support packages. It would be useful for you to review their
> dependencies and establish whether they are correct. We can then make
> decisions on language support simply by selecting the relevant
> language-support package, which will conveniently keep track of which
> packages are relevant for which languages.

Well, if I need to input Chinese and Japanese on an English system, I
don't want to install a few dozen files from the language packs,
especially if the translations are useless for me anyway. ;)

Installing all the above-mentioned modules with their dependencies on the
Live CD needs about 48 MB of additional space. (I selected scim-anthy here
over scim-prime.)
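Arne's point is that the IM modules a user needs follow the languages they type, not the UI language or the installed language packs. The Python sketch below models that choice under stated assumptions: the language keys and the function `scim_packages_for` are hypothetical, the mapping simply restates the module list above, and the exclusion set encodes the two packages that the thread further down flags as broken (scim-uim and scim-chinese).

```python
from typing import Dict, Iterable, List

# Hypothetical mapping: "language the user wants to type" -> SCIM modules,
# restating the recommendations in the thread (scim-anthy chosen over scim-prime).
SCIM_MODULES_BY_LANGUAGE: Dict[str, List[str]] = {
    "ja": ["scim-anthy"],
    "zh_TW": ["scim-chewing", "scim-pinyin", "scim-tables-zh"],
    "zh_CN": ["scim-pinyin", "scim-tables-zh"],
    "zh_HK": ["scim-tables-zh"],
    "ko": ["scim-hangul"],
    "th": ["scim-thai"],
    "latin": ["scim-m17n"],  # European languages with diacritics
}

# Packages the thread says must NOT be installed; kept as a guard in case
# the mapping above is ever extended with them.
BROKEN_PACKAGES = {"scim-uim", "scim-chinese"}


def scim_packages_for(languages: Iterable[str]) -> List[str]:
    """Return the sorted SCIM packages needed to type in `languages`.

    Unknown language keys are ignored; the base scim package is always included.
    """
    wanted = {"scim"}
    for lang in languages:
        wanted.update(SCIM_MODULES_BY_LANGUAGE.get(lang, []))
    return sorted(wanted - BROKEN_PACKAGES)
```

With this mapping, an English-locale user who types Chinese and Japanese would get `scim_packages_for(["zh_TW", "ja"])`, that is scim plus scim-anthy, scim-chewing, scim-pinyin and scim-tables-zh, without pulling in any translation packs.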
If we remove some font packages and create a core-fonts package instead,
we can save about 30 MB or more (see below).

>> The following packages may NOT be installed:
>> * scim-uim: BROKEN, will trash the SCIM setup tool. Don't install it.
>> * scim-chinese: old version of scim-pinyin, not compatible with the
>> current scim package; breaks dependency handling.
>
> scim-uim seems to be installed with Edubuntu only. What is the trouble with
> it? Can it be fixed? If not, should it be removed entirely?

scim-uim is not actively maintained. When this package is installed, the
SCIM setup tool (GUI) always crashes with a segfault. Removing the package
solves the issue.

> Likewise for scim-chinese. We don't seem to be using it, so if it isn't
> needed, it should probably be removed to reduce confusion.

scim-chinese is the old version of scim-pinyin. The package was renamed
with the SCIM API change between 1.2.x and 1.4.0. scim-chinese does not
work with the current scim version and actually conflicts with it.
Therefore it should be removed.

>> 3. Fonts:
>> b) Font packages:
>> Option 1: We craft a separate package, just for the Live CD, and put
>> selected fonts from the other font packages together, just for this
>> single purpose.
>> Caveat: might conflict with the other font packages (duplicate font
>> files), should probably not be used on the default installation on the
>> users' hard disks.
>
> This is an interesting idea, as it would allow us to continue to provide
> legible fonts for many languages without creating so much confusion with a
> huge number of default fonts.

I have spent some time comparing the fonts installed by default on the
Live CD with additional fonts available in the repositories. Currently the
/usr/share/fonts/truetype/ directory uses about 94 MB of space.

Below is a list of fonts I consider necessary as core fonts to display all
kinds of scripts. I made the selection with screen readability and complex
font requirements in mind.
------------------------------------------------------------------------------
Font Name               Package               Scripts             Size (bytes)
------------------------------------------------------------------------------
DejaVu Sans             ttf-dejavu            Multiple                  519412
DejaVu Sans Bold        ttf-dejavu            Multiple                  493320
DejaVu Sans Mono        ttf-dejavu            Multiple                  289712
DejaVu Sans Mono Bold   ttf-dejavu            Multiple                  278376
DejaVu Serif            ttf-dejavu            Multiple                  213360
DejaVu Serif Bold       ttf-dejavu            Multiple                  204988
MgOpenCanonica          ttf-mgopen            Greek                     281580
MgOpenCanonica Bold     ttf-mgopen            Greek                     284968
MgOpenModerna           ttf-mgopen            Greek                      60404
MgOpenModerna Bold      ttf-mgopen            Greek                      57592
Abyssinica SIL          ttf-sil-abyssinica    Ethiopic (Amharic)        619012
Ezra SIL                ttf-sil-ezra          Hebrew                    153392
PakType Tehreer         ttf-paktype           Arabic, Farsi, Urdu       308756
Scheherazade            ttf-scheherazade      Arabic, Farsi, Urdu       260392
Lohit Bengali           ttf-bengali-fonts     Bengali                   138536
Chandas                 ttf-devanagari-fonts  Devanagari               2584956
Lohit Gujarati          ttf-gujarati-fonts    Gujarati                   79168
Lohit Kannada           ttf-kannada-fonts     Kannada                   186364
AnjaliOldLipi           ttf-malayalam-fonts   Malayalam                 433556
Lohit Oriya             ttf-oriya-fonts       Oriya                      93140
Saab                    ttf-punjabi-fonts     Punjabi                   114092
Lohit Tamil             ttf-tamil-fonts       Tamil                      64760
Pothana2000             ttf-telugu-fonts      Telugu                    194268
Padauk                  ttf-sil-padauk        Myanmar                   146104
Padauk Bold             ttf-sil-padauk        Myanmar                   148632
Khmer OS System         ttf-khmeros           Khmer                     265624
PhetsarathOT            ttf-lao               Lao                        92828
Loma                    ttf-thai-tlwg         Thai                       37140
Loma-Bold               ttf-thai-tlwg         Thai                       37964
AR PL ShanHeiSun Uni    ttf-arphic-uming      Han                     20890468
UnBatang                ttf-unfonts           Hangul                   3678974
UnBatangBold            ttf-unfonts           Hangul                   4070868
UnDotum                 ttf-unfonts           Hangul                   2209390
UnDotumBold             ttf-unfonts           Hangul                   2808360
Sazanami Mincho         ttf-sazanami-mincho   Japanese                10554196
Sazanami Gothic         ttf-sazanami-gothic   Japanese                 7690324
SIL Yi                  ttf-sil-yi            Yi                        463336
TibetanMachineUniAlpha  ttf-tmuni             Tibetan, Dzongkha        1355768
------------------------------------------------------------------------------
Total                                                                 62364080
------------------------------------------------------------------------------

* The file sizes for DejaVu and AR PL ShanHeiSun Uni are those from the
  current packages; newer versions will differ.
* DejaVu should be upgraded to 2.18 to include Georgian script.
* PakType Tehreer and Scheherazade contain almost the same glyphs and
  face, and I think only one of them is needed. They are supposed to
  replace the ttf-arabeyes fonts, because those lack Farsi and Urdu
  support.
* The question is whether we need to keep the Bold versions... dropping
  them could save some additional space.
* The UnFonts fonts are supposed to replace the Baekmuk fonts.
* The Sazanami fonts are supposed to replace the Kochi fonts.
* All these fonts are supposed to be used instead of the DejaVu fonts for
  their individual script coverage, because their complex script support
  and/or shapes are better than DejaVu's.
* These fonts are supposed to be taken out of their packages and put
  together into a new core-fonts package. Installing their original
  packages would waste a lot of space.

>> b) CJK fonts:
>> This topic really is... erm... difficult.
>> For the Arphic fonts (and probably also a Heiti (sans-serif, like DejaVu
>> Sans) and Yuanti (rounded, like Kochi Gothic) font) I have the following
>> in mind:
>> The problem is that many characters share the same codepoint in
>> Unicode, but have a different shape (number of strokes and stroke order)
>> in the different CJK regions (China, Hong Kong / Macao, Taiwan, Japan,
>> Korea). This is one of the main reasons why users in these regions
>> prefer different fonts.
>> My approach would be to put all character shape variants into a single
>> TTC (TrueType Collection) and use a different glyph ID to Unicode
>> codepoint mapping for each "virtual font".
>> Instead of having 5 separate TTF files, each about 25 MB in size, we
>> would end up with only one TTC file (about 30 MB in size), which
>> produces 5 "virtual fonts". Saves a lot of space.
>> ;)
>>
>> (If you need more details about this technology, I can elaborate about
>> it in a follow-up mail)
>
> This is a key problem, and an interesting proposed solution. Would this
> require any changes outside of the fonts themselves?

No. TTC already works with GTK2 and Qt 4 >= 4.3. OpenOffice.org is
supposed to work; at least it does on SuSE Linux... The Debian package
seems to have a bug: it cannot use TTC correctly. However, Qt 3, GTK1 and
other legacy software cannot use TTC.

Cheers
Arne
--
ubuntu-devel mailing list
ubuntu-devel@lists.ubuntu.com
Modify settings or unsubscribe at: https://lists.ubuntu.com/mailman/listinfo/ubuntu-devel
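The space figures in Arne's core-font table can be cross-checked in a few lines of Python. This sketch copies the per-file sizes from that table; it assumes that "about 94 MB" for /usr/share/fonts/truetype/ means decimal megabytes, and it also totals the Bold faces to size the open question of whether they are needed. The names `CORE_FONTS`, `total_bytes` and `bold_bytes` are illustrative only.

```python
from typing import Dict

# Per-file sizes in bytes, copied from the core-font table.
CORE_FONTS: Dict[str, int] = {
    "DejaVu Sans": 519412, "DejaVu Sans Bold": 493320,
    "DejaVu Sans Mono": 289712, "DejaVu Sans Mono Bold": 278376,
    "DejaVu Serif": 213360, "DejaVu Serif Bold": 204988,
    "MgOpenCanonica": 281580, "MgOpenCanonica Bold": 284968,
    "MgOpenModerna": 60404, "MgOpenModerna Bold": 57592,
    "Abyssinica SIL": 619012, "Ezra SIL": 153392,
    "PakType Tehreer": 308756, "Scheherazade": 260392,
    "Lohit Bengali": 138536, "Chandas": 2584956,
    "Lohit Gujarati": 79168, "Lohit Kannada": 186364,
    "AnjaliOldLipi": 433556, "Lohit Oriya": 93140,
    "Saab": 114092, "Lohit Tamil": 64760,
    "Pothana2000": 194268, "Padauk": 146104, "Padauk Bold": 148632,
    "Khmer OS System": 265624, "PhetsarathOT": 92828,
    "Loma": 37140, "Loma-Bold": 37964,
    "AR PL ShanHeiSun Uni": 20890468,
    "UnBatang": 3678974, "UnBatangBold": 4070868,
    "UnDotum": 2209390, "UnDotumBold": 2808360,
    "Sazanami Mincho": 10554196, "Sazanami Gothic": 7690324,
    "SIL Yi": 463336, "TibetanMachineUniAlpha": 1355768,
}

# Assumption: "about 94 MB" is read as 94 decimal megabytes.
CURRENT_TRUETYPE_BYTES = 94_000_000


def total_bytes(fonts: Dict[str, int]) -> int:
    """Sum of all font file sizes."""
    return sum(fonts.values())


def bold_bytes(fonts: Dict[str, int]) -> int:
    """Sum of the Bold faces, identified by "Bold" appearing in the name."""
    return sum(size for name, size in fonts.items() if "Bold" in name)


if __name__ == "__main__":
    total = total_bytes(CORE_FONTS)
    print(f"core fonts total:        {total / 1e6:.1f} MB")
    print(f"saving vs. current dir:  {(CURRENT_TRUETYPE_BYTES - total) / 1e6:.1f} MB")
    print(f"Bold faces (droppable?): {bold_bytes(CORE_FONTS) / 1e6:.1f} MB")
```

Run as a script, it reports about 62.4 MB for the core fonts, which matches the table's total of 62364080 bytes, and a saving of about 31.6 MB against the current 94 MB, in line with the "about 30 MB or more" estimate. The Bold faces account for about 8.4 MB of the total.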