
locales-all: a proposal



There has occasionally been talk of a "locales-all" package, which would
contain compiled binary forms of all locales. This would use up a bit
more bandwidth and disk space in order to save CPU time: generating the
locales can take a long time, if you need many and don't have a fast
processor.

I've made a preliminary patch to implement locales-all, if only to get
some numbers. The existing locales .deb is about 3.9 MB, resulting in
about 10.5 MB installed size, plus the generated files (probably at
most a few hundred kilobytes). My locales-all .deb is about 14 MB, with
an installed size of 59 MB. Thus, locales-all is about 10 MB bigger to
download, and 48 MB bigger installed.

Generating the three locales needed on my mail/web/etc server, which is
used by several people, takes about 15 seconds. Locales with more
complicated character sets, especially many Asian languages, take
considerably longer to generate.
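
To see how many locales locale-gen would build on a given machine, one
can count the entries in /etc/locale.gen using the same rules as the
script's loop: comments, blank lines and entries without both fields are
skipped. This is a rough sketch. count_locales and estimate_seconds are
hypothetical names, and the estimate assumes time grows linearly at about
5 seconds per locale, extrapolated from the 15 seconds for three locales
above. That assumption will not hold for slower CPUs or complex locales.

```sh
# Count the entries locale-gen would act on, mirroring its main loop:
# comment lines, blank lines and one-field lines are ignored.
count_locales() {
  n=0
  while read locale charset
  do
    case $locale in
      \#*) continue ;;
      "") continue ;;
    esac
    [ -n "$charset" ] || continue
    n=$((n + 1))
  done < "$1"
  echo "$n"
}

# Rough estimate only: assumes ~5 seconds per locale (15 s for 3 locales
# on the server described above); real times vary with CPU and locale.
estimate_seconds() {
  echo $(( $(count_locales "$1") * 5 ))
}
```

For example, `estimate_seconds /etc/locale.gen` prints the estimated
number of seconds for the locales currently listed there.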

Is the time saved on installation significant enough to be worth making
builds slower and mirrors that much bigger? I don't know. I guess it
depends on how much time you spend waiting for locale-gen to run. I
don't mind waiting the 15 seconds it takes on my server every time I
upgrade libc6, but I run stable, so it happens very rarely.

When implementing this, my first step was to clean up the locale-gen
script a bit. The current version is somewhat messy: it has unnecessary
line continuation escapes and compresses things into single lines in an
obfuscatory way. Attached are locale-gen.diff and locale-gen.clean, the
former being the diff and the latter the entire script (in case the diff
no longer applies cleanly). The functionality is exactly the same, but I
claim the cleaned-up version is easier to read, and thus easier to
maintain in the future. If nothing else, I would like to see this
included in the package.
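
The least readable part of the loop is the name mangling done with sed.
It can be restated as two small functions. This is a sketch:
display_name and input_name are hypothetical helper names. input_name
uses the script's own expression, which locale-gen only applies when no
file named exactly after the locale exists in $LOCALES. display_name uses
simpler expressions than the script (which relies on \@ escapes). It
should give the same progress-line names for ordinary entries such as
en_US.UTF-8 or de_DE@euro.

```sh
# Name shown in the progress output, e.g.
# "de_DE@euro" + "ISO-8859-15" -> "de_DE.ISO-8859-15@euro"
display_name() {
  locale=$1 charset=$2
  base=$(echo "$locale" | sed 's/[.@].*//')                     # drop ".charset" and "@modifier"
  modifier=$(echo "$locale" | sed -n 's/^[^@]*\(@.*\)$/\1/p')  # keep "@modifier", if any
  echo "$base.$charset$modifier"
}

# Fallback input name for "localedef -i": strip the ".charset" part but
# keep any "@modifier", e.g. "sr_RS.UTF-8@latin" -> "sr_RS@latin".
input_name() {
  echo "$1" | sed 's/\([^.]*\)[^@]*\(.*\)/\1\2/'
}
```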

Also attached is the preliminary patch for locales-all. I started by
enhancing locale-gen further, adding options to override the location
of /etc/locale.gen, to direct output somewhere other than
/usr/lib/locale, and to read the locale source files from a custom
location. This was necessary so that I could easily build the
locale-archive file for the locales-all package, which I then did.
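
The option handling can be tried in isolation. This sketch uses the
POSIX shell builtin getopts instead of the external getopt(1) program
that the patch calls. That is a deliberate swap, not what the patch
does. parse_options and SHOW_HELP are hypothetical names. The defaults
are the ones set in the patched script.

```sh
# Parse locale-gen style options into the script's variables.
# Returns non-zero on an unknown option (getopts prints the error).
parse_options() {
  LOCALEGEN=/etc/locale.gen
  LOCALES=/usr/share/i18n/locales
  OUTPUT=/usr/lib/locale
  VERBOSE=1
  SHOW_HELP=0
  OPTIND=1                # reset so the function can be called repeatedly
  while getopts d:g:hl:q opt
  do
    case $opt in
      d) OUTPUT=$OPTARG ;;
      g) LOCALEGEN=$OPTARG ;;
      l) LOCALES=$OPTARG ;;
      q) VERBOSE=0 ;;
      h) SHOW_HELP=1 ;;   # caller prints help and exits 0
      *) return 1 ;;
    esac
  done
}
```

A build rule like the one in debhelper.mk would then amount to
`parse_options -d debian/locales-all/usr/lib/locale -g SUPPORTED -l localedata`,
leaving only the output, input list and source directory changed.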

This locales-all package replaces the locales package. It might be
better to have locales-all just depend on locales (identical version, of
course), so as not to duplicate the data files into two .debs.

I have tested this lightly. Any comments are welcome. I didn't want to
file a wishlist bug against libc6 yet, before it has been decided that
locales-all is a good idea in the first place.

--- locale-gen.from-svn	2005-09-25 15:53:33.975828184 +0000
+++ locale-gen	2005-09-25 15:53:33.975828184 +0000
@@ -1,22 +1,31 @@
 #!/bin/sh
+#
+# Create or update /usr/lib/locale/locale-archive with locale data for
+# the locales specified in /etc/locale.gen.
+#
+# This script has been written for the Debian project and is under
+# the GNU Lesser General Public License, version 2.
 
 set -e
 
 LOCALEGEN=/etc/locale.gen
 LOCALES=/usr/share/i18n/locales
-if [ -n "$POSIXLY_CORRECT" ]; then
-  unset POSIXLY_CORRECT
-fi
 
+# localedef works differently depending on whether POSIXLY_CORRECT is
+# set or not. We want the behavior when it is not set so we unset it.
+unset POSIXLY_CORRECT
 
+# Let's not do anything if $LOCALEGEN doesn't exist or is empty.
 [ -f $LOCALEGEN -a -s $LOCALEGEN ] || exit 0;
 
 # Remove all old locale dir and locale-archive before generating new
 # locale data.
 rm -rf /usr/lib/locale/* || true
 
+# Make sure new files are created with the right permissions.
 umask 022
 
+# Is an entry in $LOCALEGEN correct: does it contain two fields?
 is_entry_ok() {
   if [ -n "$locale" -a -n "$charset" ] ; then
     true
@@ -27,16 +36,25 @@
 }
 
 echo "Generating locales (this might take a while)..."
-while read locale charset; do \
-	case $locale in \#*) continue;; "") continue;; esac; \
-	is_entry_ok || continue
-	echo -n "  `echo $locale | sed 's/\([^.\@]*\).*/\1/'`"; \
-	echo -n ".$charset"; \
-	echo -n `echo $locale | sed 's/\([^\@]*\)\(\@.*\)*/\2/'`; \
-	echo -n '...'; \
-        if [ -f $LOCALES/$locale ]; then input=$locale; else \
-        input=`echo $locale | sed 's/\([^.]*\)[^@]*\(.*\)/\1\2/'`; fi; \
-	localedef -i $input -c -f $charset -A /usr/share/locale/locale.alias $locale; \
-	echo ' done'; \
+while read locale charset
+do
+  case $locale in
+    \#*) continue ;;
+    "") continue ;;
+  esac
+  is_entry_ok || continue
+  echo -n "  `echo $locale | sed 's/\([^.\@]*\).*/\1/'`"
+  echo -n ".$charset"
+  echo -n `echo $locale | sed 's/\([^\@]*\)\(\@.*\)*/\2/'`
+  echo -n '...'
+  if [ -f $LOCALES/$locale ]
+  then
+    input=$locale
+  else
+    input=`echo $locale | sed 's/\([^.]*\)[^@]*\(.*\)/\1\2/'`
+  fi
+  localedef -i $input -c -f $charset -A /usr/share/locale/locale.alias $locale\
+	|| true
+  echo ' done'
 done < $LOCALEGEN
 echo "Generation complete."

Attachment: locale-gen.clean
Description: application/shellscript

diff -ru glibc-2.3.5-faster/debian/control glibc-2.3.5-locales-all/debian/control
--- glibc-2.3.5-faster/debian/control	2005-09-24 14:12:43.000000000 +0000
+++ glibc-2.3.5-locales-all/debian/control	2005-09-25 11:21:12.000000000 +0000
@@ -37,6 +37,24 @@
  savings over how this package used to be, where all locales were generated
  by default. This created a package that unpacked to an excess of 30 megs.
 
+Package: locales-all
+Architecture: all
+Section: base
+Priority: extra
+Provides: i18ndata, locales
+Depends: ${locale:Depends}, debconf (>= 0.2.26)
+Conflicts: locales, localebin, wg15-locale, i18ndata, locale-ja, locale-ko, locale-vi, locale-zh
+Replaces: locales, localebin, wg15-locale, libc6-bin, i18ndata, glibc2, locale-ja, locale-ko, locale-vi, locale-zh
+Description: GNU C Library: National Language (locale) data [support] (full)
+ Machine-readable data files, shared objects and programs used by the
+ C library for localization (l10n) and internationalization (i18n) support.
+ .
+ This package contains the libc.mo i18n files, plus all compiled (ready-to-use)
+ versions of all locale definitions. It is big (tens of megabytes), but
+ installing it can be faster than generating many locales on machines that
+ need to support users from many backgrounds. See the package called "locales"
+ for a smaller version that creates only the versions that are required.
+
 Package: nscd
 Architecture: alpha amd64 arm i386 m68k mips mipsel powerpc sparc ia64 hppa s390 sh3 sh4 sh3eb sh4eb freebsd-i386
 Section: admin
diff -ru glibc-2.3.5-faster/debian/control.in/main glibc-2.3.5-locales-all/debian/control.in/main
--- glibc-2.3.5-faster/debian/control.in/main	2005-09-24 11:32:10.000000000 +0000
+++ glibc-2.3.5-locales-all/debian/control.in/main	2005-09-25 11:20:46.000000000 +0000
@@ -37,6 +37,24 @@
  savings over how this package used to be, where all locales were generated
  by default. This created a package that unpacked to an excess of 30 megs.
 
+Package: locales-all
+Architecture: all
+Section: base
+Priority: extra
+Provides: i18ndata, locales
+Depends: ${locale:Depends}, debconf (>= 0.2.26)
+Conflicts: locales, localebin, wg15-locale, i18ndata, locale-ja, locale-ko, locale-vi, locale-zh
+Replaces: locales, localebin, wg15-locale, libc6-bin, i18ndata, glibc2, locale-ja, locale-ko, locale-vi, locale-zh
+Description: GNU C Library: National Language (locale) data [support] (full)
+ Machine-readable data files, shared objects and programs used by the
+ C library for localization (l10n) and internationalization (i18n) support.
+ .
+ This package contains the libc.mo i18n files, plus all compiled (ready-to-use)
+ versions of all locale definitions. It is big (tens of megabytes), but
+ installing it can be faster than generating many locales on machines that
+ need to support users from many backgrounds. See the package called "locales"
+ for a smaller version that creates only the versions that are required.
+
 Package: nscd
 Architecture: @threads_archs@
 Section: admin
diff -ru glibc-2.3.5-faster/debian/local/manpages/locale-gen.8 glibc-2.3.5-locales-all/debian/local/manpages/locale-gen.8
--- glibc-2.3.5-faster/debian/local/manpages/locale-gen.8	2005-09-24 11:32:10.000000000 +0000
+++ glibc-2.3.5-locales-all/debian/local/manpages/locale-gen.8	2005-09-25 11:35:11.000000000 +0000
@@ -1,101 +1,48 @@
-.\" This -*- nroff -*- file has been generated from
-.\" DocBook SGML with docbook-to-man on Debian GNU/Linux.
-...\"
-...\"	transcript compatibility for postscript use.
-...\"
-...\"	synopsis:  .P! <file.ps>
-...\"
-.de P!
-\\&.
-.fl			\" force out current output buffer
-\\!%PB
-\\!/showpage{}def
-...\" the following is from Ken Flowers -- it prevents dictionary overflows
-\\!/tempdict 200 dict def tempdict begin
-.fl			\" prolog
-.sy cat \\$1\" bring in postscript file
-...\" the following line matches the tempdict above
-\\!end % tempdict %
-\\!PE
-\\!.
-.sp \\$2u	\" move below the image
-..
-.de pF
-.ie     \\*(f1 .ds f1 \\n(.f
-.el .ie \\*(f2 .ds f2 \\n(.f
-.el .ie \\*(f3 .ds f3 \\n(.f
-.el .ie \\*(f4 .ds f4 \\n(.f
-.el .tm ? font overflow
-.ft \\$1
-..
-.de fP
-.ie     !\\*(f4 \{\
-.	ft \\*(f4
-.	ds f4\"
-'	br \}
-.el .ie !\\*(f3 \{\
-.	ft \\*(f3
-.	ds f3\"
-'	br \}
-.el .ie !\\*(f2 \{\
-.	ft \\*(f2
-.	ds f2\"
-'	br \}
-.el .ie !\\*(f1 \{\
-.	ft \\*(f1
-.	ds f1\"
-'	br \}
-.el .tm ? font underflow
-..
-.ds f1\"
-.ds f2\"
-.ds f3\"
-.ds f4\"
-'\" t 
-.ta 8n 16n 24n 32n 40n 48n 56n 64n 72n  
-.TH "LOCALE-GEN" "8" 
-.SH "NAME" 
-locale-gen \(em generates localisation files from templates 
-.SH "SYNOPSIS" 
-.PP 
-\fBlocale-gen\fP 
-.SH "DESCRIPTION" 
-.PP 
-This manual page documents briefly the 
-\fBlocale-gen\fP command. 
-.PP 
-By default, the locale package which provides the base support for 
-localisation of libc-based programs does not contain usable localisation 
-files for every supported language. This limitation has became necessary 
-because of the substantial size of such files and the large number of 
-languages supported by libc. As a result, Debian uses a special 
-mechanism where we prepare the actual localisation files on the target 
-host and distribute only the templates for them. 
-.PP 
-\fBlocale-gen\fP is a program that reads the file 
-\fB/etc/locale.gen\fP and invokes 
-\fBlocaledef\fP for the chosen localisation profiles. 
-Run \fBlocale-gen\fP after you have modified the \fB/etc/locale.gen\fP file. 
- 
- 
-.SH "FILES" 
-.PP 
-\fB/etc/locale.gen\fP 
-.PP 
-The main configuration file, which has a simple format: every 
-line that is not empty and does not begin with a # is treated as a 
-locale definition that is to be built. 
- 
-.SH "SEE ALSO" 
-.PP 
-localedef (1), locale (1), locale.gen (5). 
-.SH "AUTHOR" 
-.PP 
-This manual page was written by Eduard Bloch <blade@debian.org> for 
-the \fBDebian GNU/Linux\fP system (but may be used by others).  Permission is 
-granted to copy, distribute and/or modify this document under 
-the terms of the GNU Free Documentation 
-License, Version 1.1 or any later version published by the Free 
-Software Foundation; with no Invariant Sections, no Front-Cover 
-Texts and no Back-Cover Texts. 
-...\" created by instant / docbook-to-man, Sat 02 Mar 2002, 16:43 
+.TH LOCALE-GEN 8 "Sep 24, 2005"
+.SH NAME
+locale-gen \- generate locale data files
+.SH SYNOPSIS
+.B locale-gen
+.RB [ \-hq ]
+.RB [ \-d
+.IR dirname ]
+.RB [ \-g
+.IR filename ]
+.RB [ \-l
+.IR dirname ]
+.SH DESCRIPTION
+The
+.B locale-gen
+program generates binary locale data files from textual description files.
+The binary files are fast to load, but can take a lot of space.
+For this reason, the Debian 
+.B locales
+package does not contain any binary locale data files, only the textual ones.
+.SH OPTIONS
+.TP
+.B \-h
+Print a short help text.
+.TP
+.B \-q
+Quiet operation: do not report progress.
+.TP
+.BI \-d " dirname"
+Place output files into
+.I dirname
+instead of 
+.IR /usr/lib/locale .
+.TP
+.BI \-g " filename"
+Read list of locales that are to be generated from
+.I filename
+instead of 
+.IR /etc/locale.gen .
+.TP
+.BI \-l " dirname"
+Look for textual locale specifications in \fIdirname\fP instead of
+.IR  /usr/share/i18n/locales .
+.SH "SEE ALSO"
+.BR localedef "(1), " locale "(5), " locale "(7), " locale (1)
+.SH AUTHOR
+The program and this manual page have been written and modified by
+a number of people for the Debian project.
diff -ru glibc-2.3.5-faster/debian/local/usr_sbin/locale-gen glibc-2.3.5-locales-all/debian/local/usr_sbin/locale-gen
--- glibc-2.3.5-faster/debian/local/usr_sbin/locale-gen	2005-09-24 11:32:10.000000000 +0000
+++ glibc-2.3.5-locales-all/debian/local/usr_sbin/locale-gen	2005-09-25 12:40:33.621309664 +0000
@@ -1,22 +1,81 @@
 #!/bin/sh
+#
+# Create or update /usr/lib/locale/locale-archive with locale data for
+# the locales specified in /etc/locale.gen.
+#
+# This script has been written for the Debian project and is under
+# the GNU Lesser General Public License, version 2.
 
 set -e
 
+# Variables that may be modified by options.
 LOCALEGEN=/etc/locale.gen
 LOCALES=/usr/share/i18n/locales
-if [ -n "$POSIXLY_CORRECT" ]; then
-  unset POSIXLY_CORRECT
-fi
+OUTPUT=/usr/lib/locale
+VERBOSE=1
 
+# localedef works differently depending on whether POSIXLY_CORRECT is
+# set or not. We want the behavior when it is not set so we unset it.
+unset POSIXLY_CORRECT
+
+verbose() {
+  if [ "$VERBOSE" = 1 ]
+  then
+    echo "$@"
+  fi
+}
 
-[ -f $LOCALEGEN -a -s $LOCALEGEN ] || exit 0;
+print_help() {
+  cat <<EOF 1>&2
+Usage: $0 [options]
+Generate locale binary files (by default, /usr/lib/locale/locale-archive
+or other files in that directory) for the locales listed in /etc/locale.gen.
+
+Options are:
+
+  -h		This help.
+  -q		Quiet execution: don't report progress.
+
+  -d dir	Use 'dir' instead of $OUTPUT for output.
+  -g filename	Use 'filename' for input instead of $LOCALEGEN.
+  -l dir	Look for locale specifications in 'dir' instead of
+		$LOCALES.
 
-# Remove all old locale dir and locale-archive before generating new
-# locale data.
-rm -rf /usr/lib/locale/* || true
+EOF
+}
 
+# Parse the command line.
+TEMP=$(getopt d:g:hl:q "$@")
+if [ $? != 0 ]
+then
+    exit 1
+fi
+eval set -- "$TEMP"
+
+while true
+do
+  case "$1" in
+  --) shift 1; break ;;
+  -h) shift 1; print_help; exit 0 ;;
+  -q) shift 1; VERBOSE=0 ;;
+  -d) OUTPUT="$2"; shift 2 ;;
+  -g) LOCALEGEN="$2"; shift 2 ;;
+  -l) LOCALES="$2"; shift 2 ;;
+  *) echo "Unknown parameter $1" 1>&2; exit 1 ;;
+  esac
+done
+
+# Let's not do anything if $LOCALEGEN doesn't exist or is empty.
+if [ ! -f "$LOCALEGEN" -o ! -s "$LOCALEGEN" ]
+then
+  verbose "$LOCALEGEN does not exist or is empty, so there's nothing to do."
+  exit 0;
+fi
+
+# Make sure new files are created with the right permissions.
 umask 022
 
+# Is an entry in $LOCALEGEN correct: does it contain two fields?
 is_entry_ok() {
   if [ -n "$locale" -a -n "$charset" ] ; then
     true
@@ -26,17 +85,42 @@
   fi
 }
 
-echo "Generating locales (this might take a while)..."
-while read locale charset; do \
-	case $locale in \#*) continue;; "") continue;; esac; \
-	is_entry_ok || continue
-	echo -n "  `echo $locale | sed 's/\([^.\@]*\).*/\1/'`"; \
-	echo -n ".$charset"; \
-	echo -n `echo $locale | sed 's/\([^\@]*\)\(\@.*\)*/\2/'`; \
-	echo -n '...'; \
-        if [ -f $LOCALES/$locale ]; then input=$locale; else \
-        input=`echo $locale | sed 's/\([^.]*\)[^@]*\(.*\)/\1\2/'`; fi; \
-	localedef -i $input -c -f $charset -A /usr/share/locale/locale.alias $locale || :; \
-	echo ' done'; \
+# Create a temporary directory for storing the output while it is being
+# generated. This way, the system continues to be fully operational until
+# the (very brief) moment when we move the output to its final location.
+TEMP=$(mktemp -d)
+if [ "$?" != 0 ]
+then
+  exit 1
+fi
+mkdir -p "$TEMP/usr/lib/locale"
+
+verbose "Generating locales (this might take a while)..."
+while read locale charset
+do
+  case $locale in
+    \#*) continue ;;
+    "") continue ;;
+  esac
+  is_entry_ok || continue
+  verbose -n "  `echo $locale | sed 's/\([^.\@]*\).*/\1/'`"
+  verbose -n ".$charset"
+  verbose -n `echo $locale | sed 's/\([^\@]*\)\(\@.*\)*/\2/'`
+  verbose -n '...'
+  if [ -f $LOCALES/$locale ]
+  then
+    input=$locale
+  else
+    input=`echo $locale | sed 's/\([^.]*\)[^@]*\(.*\)/\1\2/'`
+  fi
+  localedef --prefix "$TEMP" -i $input -c -f $charset \
+	    -A /usr/share/locale/locale.alias $locale \
+	|| true
+  verbose ' done'
 done < $LOCALEGEN
-echo "Generation complete."
+verbose "Generation complete."
+
+# Move files from the temporary directory to the final location.
+mkdir -p "$OUTPUT" && rm -rf "$OUTPUT"/*
+mv "$TEMP"/usr/lib/locale/* "$OUTPUT"
+rm -rf "$TEMP"
diff -ru glibc-2.3.5-faster/debian/rules glibc-2.3.5-locales-all/debian/rules
--- glibc-2.3.5-faster/debian/rules	2005-09-24 13:45:28.000000000 +0000
+++ glibc-2.3.5-locales-all/debian/rules	2005-09-25 12:33:14.000000000 +0000
@@ -118,7 +118,7 @@
 curpass = $(filter-out %_,$(subst _,_ ,$@))
 
 DEB_ARCH_REGULAR_PACKAGES = $(libc) # $(libc)-dev $(libc)-dbg $(libc)-prof $(libc)-pic
-DEB_INDEP_REGULAR_PACKAGES = glibc-doc locales
+DEB_INDEP_REGULAR_PACKAGES = glibc-doc locales locales-all
 DEB_UDEB_PACKAGES = # $(libc)-udeb libnss-dns-udeb libnss-files-udeb
 
 # Generic kernel version check
diff -ru glibc-2.3.5-faster/debian/rules.d/debhelper.mk glibc-2.3.5-locales-all/debian/rules.d/debhelper.mk
--- glibc-2.3.5-faster/debian/rules.d/debhelper.mk	2005-09-24 11:32:10.000000000 +0000
+++ glibc-2.3.5-locales-all/debian/rules.d/debhelper.mk	2005-09-25 13:06:28.000000000 +0000
@@ -40,6 +40,13 @@
 	install --mode=0644 $(DEB_SRCDIR)/localedata/ChangeLog debian/$(curpass)/usr/share/doc/$(curpass)/changelog
 endef
 
+define locales-all_extra_debhelper_pkg_install
+	sh debian/local/usr_sbin/locale-gen \
+		-d debian/$(curpass)/usr/lib/locale \
+		-g debian/tmp-libc/usr/share/i18n/SUPPORTED \
+		-l build-tree/glibc-*/localedata
+endef
+
 define glibc-doc_extra_debhelper_pkg_install
 	install --mode=0644 $(DEB_SRCDIR)/ChangeLog debian/$(curpass)/usr/share/doc/$(curpass)/changelog
 	install --mode=0644 $(DEB_SRCDIR)/linuxthreads/FAQ.html debian/$(curpass)/usr/share/doc/$(curpass)/FAQ.linuxthreads.html
