From the debian-devel thread, I want to explore the details of how
Debian can achieve a sane translation handling configuration that also
includes splitting out the gconv and zoneinfo data into loosely
aggregated sets. (gconv from glibc is several MB of libraries that are
not all needed at the same time. zoneinfo data comes from tzdata and is
similarly mostly "unused" in a typical installation. Emdebian needs to
reduce the download and installed size of both of these data sets by
about 70%.)

This is based on the tdeb proposal:
http://wiki.debian.org/i18n/TranslationDebs
but is focused on embedded usage where file space issues are critical -
e.g. Emdebian would have problems with the current proposal because
localised manpages should not be installed in Emdebian (which has no
manpages).

Emdebian currently splits every translation into a dedicated package -
if a source package has more than one .mo file per LC_MESSAGES/
directory, all are included in the same package:

$ dpkg -c /opt/emdebian/trunk/a/apt/trunk/apt-locale-de_0.7.8em1_all.deb
drwxr-xr-x root/root         0 2007-10-31 17:45 ./
drwxr-xr-x root/root         0 2007-10-31 17:45 ./usr/
drwxr-xr-x root/root         0 2007-10-31 17:45 ./usr/share/
drwxr-xr-x root/root         0 2007-10-31 17:45 ./usr/share/locale/
drwxr-xr-x root/root         0 2007-10-31 17:45 ./usr/share/locale/de/
drwxr-xr-x root/root         0 2007-10-31 17:45 ./usr/share/locale/de/LC_MESSAGES/
-rwxr-xr-x root/root     31950 2007-10-31 17:45 ./usr/share/locale/de/LC_MESSAGES/apt.mo
-rwxr-xr-x root/root      6022 2007-10-31 17:45 ./usr/share/locale/de/LC_MESSAGES/libapt-inst1.1.mo
-rwxr-xr-x root/root     24556 2007-10-31 17:45 ./usr/share/locale/de/LC_MESSAGES/libapt-pkg4.6.mo

This is done with a tool called 'emlocale' which roughly equates to the
(unwritten) dpkg-gentdebsrc from the tdeb proposal.
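
To make the "one package per locale directory" rule concrete, here is a
rough sketch in C of how a .mo path could be mapped to a per-locale
package name such as apt-locale-de. This is not how emlocale itself is
implemented; the function names are hypothetical. The handling of
locales like pt_BR (lower-casing and replacing '_' with '-') is an
assumption of mine, made only because Debian package names cannot
contain upper case letters or underscores.

```c
#include <ctype.h>
#include <stdio.h>
#include <string.h>

/* Extract the locale directory name (e.g. "de") from a path of the form
 * ".../usr/share/locale/<locale>/LC_MESSAGES/<domain>.mo".
 * Returns 0 on success, -1 if the path does not match that layout or
 * the output buffer is too small. */
int locale_from_mo_path(const char *path, char *out, size_t outlen)
{
    static const char marker[] = "/usr/share/locale/";
    static const char suffix[] = "/LC_MESSAGES/";
    const char *start = strstr(path, marker);
    const char *end;
    size_t len;

    if (start == NULL)
        return -1;
    start += sizeof(marker) - 1;
    end = strstr(start, suffix);
    if (end == NULL || end == start)
        return -1;
    len = (size_t)(end - start);
    /* The locale must be a single path component. */
    if (memchr(start, '/', len) != NULL || len >= outlen)
        return -1;
    memcpy(out, start, len);
    out[len] = '\0';
    return 0;
}

/* Build "<source>-locale-<locale>". ASSUMPTION: characters that are not
 * valid in a package name are mapped by lower-casing letters and
 * replacing anything else with '-'. Returns 0 on success, -1 if the
 * result does not fit in the output buffer. */
int locale_package_name(const char *source, const char *locale,
                        char *out, size_t outlen)
{
    int n = snprintf(out, outlen, "%s-locale-", source);
    size_t pos, i;

    if (n < 0 || (size_t)n >= outlen)
        return -1;
    pos = (size_t)n;
    for (i = 0; locale[i] != '\0'; i++) {
        unsigned char c = (unsigned char)locale[i];
        if (pos + 1 >= outlen)
            return -1;
        out[pos++] = isalnum(c) ? (char)tolower(c) : '-';
    }
    out[pos] = '\0';
    return 0;
}
```

Every .mo file that yields the same locale for a given source package
would go into the same package, which is why apt.mo,
libapt-inst1.1.mo and libapt-pkg4.6.mo all end up in apt-locale-de.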

Emdebian also needs a way of splitting the gconv files out of glibc so
that only the necessary gconv files are packaged and installed -
depending on the configuration specified in emdebian-tools and
depending on user setup. Similarly with the zoneinfo files from tzdata.

Together, the translations, the gconv support and the tzdata support
need to form a set of packages that can be omitted from certain builds,
added in their entirety for other builds and offered in various
combinations for users who need them.

The scalable way of doing this is a secondary archive structure that is
not part of the main dpkg or apt cache data. The archive would need to
be partitioned so that each device would simply add a source for the
support that the user needs, e.g. one source per continent containing
packages for all gconv and zoneinfo support and translations. Adding
support for additional locales would mean adding new source lists -
this is needed to allow embedded devices to have a small cache of this
secondary data. (Otherwise there is little or no advantage over simply
including all these new packages in the main apt cache because apt will
simply collate them into one big list anyway.)

The tricky part is deciding which translation goes where because
Emdebian does need to limit the size of this secondary apt/dpkg cache
yet there is no absolute mapping between geography and languages
spoken. A possible method is to stick to the geographical split; if
users in North America want languages from Europe, that source can
simply be added. At least that way, that particular user does not have
cache data or locale packages for Oceania, Asia or Africa, which cuts
the size of the cache data by 75%.

Note that, unlike Raphael's suggestion on the tdeb page, Emdebian DOES
need one package for one translation. Wasted space is *not* an option
when that space is wasted again and again for each package installed.
Collating the gconv and tzdata is acceptable because it is only
installed once.
Collating the translations is not. Emdebian is primarily concerned with
file sizes - package sizes and cache data sizes - because storage space
is very expensive for Emdebian, unlike Debian itself.

The user would be asked which timezone to use (as now) as well as which
language to use (as now). At a later date, the user could choose to add
new timezone and new language support. In Emdebian, the option would
also exist to have no timezone, no locale and no translation support
(e.g. for devices that do not produce user output). In effect,
dpkg-reconfigure locales would simply involve installing the necessary
support prior to configuring it.

There is already support for separate repositories for package
description translations. What Emdebian needs is a development of the
tdebs proposal that includes splitting out the gconv and tzdata files
alongside the translations themselves so that selecting a locale and
timezone installs the necessary packages prior to configuration, rather
than forcing all users to have all data on all systems, whether
configured or not.

Depending on how this is done, the gconv data and the tzdata zoneinfo
data might not need to be in the translation repository itself, just
not installed by default.

I'm looking for ideas and help setting this up. I have the time and
inclination to get this sorted out and it is long overdue. If Emdebian
is to get off the ground, this is just one of those issues that *must*
be solved.

So apologies for the really long mail, but here's a summary of how
Emdebian needs to handle tdebs:

1. Users must be able to download and install pt without pt_BR.
2. Users must be able to fall back to pt if pt_BR is the preference but
   absent for a specific package.
3. No translation files are installed without explicit user
   intervention or device configuration.
4. gconv and zoneinfo data split into continental groups.
5. Nothing except the .mo file(s) in tdebs for Emdebian (and I'd rather
   not have to rebuild *all* your tdebs to do that).
6. Whatever processes need to be run on the *user device* to achieve
   all this can only use C or C++. Perl is *not* part of Emdebian.
   There is no Python support or any other interpreted language
   support.

Note that simply filtering out the localised manpages etc. is not ideal
because the larger tdeb still has to be downloaded and unpacked and
there simply might not be room to do that. My target device is likely
to have <7MB free at any time.

-- 
Neil Williams
=============
http://www.data-freedom.org/
http://www.nosoftwarepatents.com/
http://www.linux.codehelp.co.uk/
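
As a rough illustration of requirements 1 and 2 in the summary (and
within the C-only limit of requirement 6), the choice of which
translation package to fetch could look something like the sketch
below. Everything here is hypothetical: the function names, the
callback that asks whether a translation exists for a package, and the
simplification that the fallback just strips everything from the first
'_', '.' or '@'. demo_available() is only a stand-in for a real archive
query.

```c
#include <stddef.h>
#include <string.h>

/* Reduce "ll_CC.codeset@modifier" to the bare language "ll".
 * Returns 0 if a shorter form was written to out, 1 if the locale was
 * already a bare language (out holds a copy), -1 on error. */
int locale_language_only(const char *locale, char *out, size_t outlen)
{
    size_t len = strcspn(locale, "_.@");

    if (len == 0 || len >= outlen)
        return -1;
    memcpy(out, locale, len);
    out[len] = '\0';
    return locale[len] == '\0' ? 1 : 0;
}

/* Hypothetical callback: non-zero if a translation of `package` exists
 * for `locale` in whichever sources the device has configured. */
typedef int (*tdeb_available_fn)(const char *package, const char *locale);

/* Pick the locale whose translation package should be installed for
 * `package`: the preferred locale if present, otherwise the bare
 * language, otherwise nothing. Returns 0 and fills `chosen`, or -1 if
 * no translation should be installed. */
int choose_tdeb_locale(const char *package, const char *preferred,
                       tdeb_available_fn available,
                       char *chosen, size_t chosenlen)
{
    char lang[16];
    size_t plen = strlen(preferred);

    if (plen == 0 || plen >= chosenlen)
        return -1;
    if (available(package, preferred)) {
        memcpy(chosen, preferred, plen + 1);
        return 0;
    }
    /* Only try the fallback if there is a shorter form to try. */
    if (locale_language_only(preferred, lang, sizeof lang) != 0)
        return -1;
    if (!available(package, lang) || strlen(lang) >= chosenlen)
        return -1;
    strcpy(chosen, lang);
    return 0;
}

/* Demonstration stub only: pretends that just "pt" and "de"
 * translations exist for every package. */
int demo_available(const char *package, const char *locale)
{
    (void)package;
    return strcmp(locale, "pt") == 0 || strcmp(locale, "de") == 0;
}
```

With the stub above, a preference of pt_BR resolves to pt, and a
preference of fr_FR resolves to nothing at all, so no translation is
installed for that package.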