
Re: Re: Google Summer of Code 2009: Debian's Shortlist

> On 2009-04-11, Filipus Klutiero <chealer@gmail.com> wrote:
> > Obey Arthur Liu wrote:
> >> === And the details: ===
> >
> > [...]
> > These descriptions are very short. Assuming these are the abstracts,
> > that's not the students' fault. The abstracts were shortened this year
> > to 500 characters. I struggled to shorten mine to fit this. At this
> > length, it's probably impossible to fit a decent summary of most
> > projects. It would normally make sense to use abstracts for this use
> > case. Maybe Google should be asked to change the limit. Otherwise I'd
> > like to see a custom description which describes a little further. I
> > currently can't comment on all projects presented.
> > That said, this shortlist remains useful, and I thank you for this great
> > jump in transparency.
> Mind to tell us what your proposed project was?  For more transparency?
> Kind regards,
> Philipp Kern

Here is my application, with only the personal section removed. I will not 
submit this idea again. Students should feel free to use this application as 
they wish. The project needs a student with a good understanding of Debian 
package management. Most importantly, a mentor familiar with APT should be 
found. Such mentors have been very scarce over the last 3 years.

= Project Title =
Improved package management of language packs

= Origins =
There are currently two methods to distribute localized data:
* Bundling localized data for all languages with the application package. The 
main issue with this approach is the size of the package.
* Architecture-independent packages associated with the application packages 
providing localized data. There is typically one package per language. Since 
they 'enable' the language translation for that software when installed they 
are called language packages or language packs. The main issue with this 
approach is that the language package for an application is not installed 
automatically.

The first method is suboptimal while the second is less usable.

For more information, see 

= Project =
The intention is to optimize distribution of localized data by improving the 
usability of the second method, which should encourage its use and diminish 
the usage of the first method.

Concretely, the first goal is that installation of language packs happens 
automatically. For example, French people should get openoffice.org-l10n-fr 
installed automatically when openoffice.org is installed.

The second goal is to control the effects of growing the number of packages. 
The growth of the Packages file and the number of packages returned by searches 
should be avoided or limited.

= Benefits to Debian =

The Debian groups which would benefit from this project are users and mirror 
providers. Administrators of non-English systems are particularly targeted.

== Direct benefits (improvements to language packs handling) ==
The first direct benefit to users is that administrators will no longer need to 
specifically select the language packages they want to install in order to make 
translations available.
The second direct benefit is that the size of Packages and the number of 
packages matching a search should be reduced. Note that the actual extent of 
this second benefit will depend on exactly how the effects of the growing 
number of packages are controlled.

== Indirect benefits (avoiding packages bundling l10n data) ==
The indirect benefit of this project is that increasing the interest in 
language packs should reduce the number of application packages bundling 
localized data. Concretely, the issues of this method will be avoided:

 * Localized data increases (for all architectures) the binary package size.
  * On multi-architecture mirrors, architecture-specific packages increase disk 
usage and bandwidth usage for synchronizations.
  * Increases bandwidth usage for users and uploading mirrors.
  * Increases disk space usage for users. localepurge, considered a hack, 
exists to diminish this issue.
  * Time for installs is increased due to getting and unpacking a larger .deb.
 * Localized data is in the same binary package and therefore has to be built 
from the same source package as the application.
  * Localized data cannot be handled by different maintainers.
  * Translation updates cannot be made independently from the application 
binary package and could cause a regression in the application package. It is 
risky to do translation updates during a freeze.
  * A translation update means that the application binary package needs to be 
rebuilt. This causes larger updates (mostly more bandwidth usage) and 
increased buildd usage, so maintainers tend to wait for a new software release 
before providing the translation updates. The delay for translators' work to 
reach users tends to increase (e.g. debconf updates sitting in the BTS).

Work from maintainers will be needed to obtain these indirect benefits. 
Nevertheless, I expect the indirect benefits to be greater than the direct 
benefits.

= Deliverables =

* APT installing language packs for given language(s) automatically
* Means to control the effects of growing the number of packages
* Depending on developer feedback, improved development tools for building 
language packs
* Advice to developers about when language packs should be used and tips to do 

= Project Details =

Installing language packages automatically should only require changes to APT 
and Policy.
Means to control the effects of growing the number of packages need more 
discussion. The current proposition to have new components would require 
changes to APT and archive maintenance tools. Changes to APT front-ends and 
tools may be another way.

== Implementation of "APT installing language packs for given language(s) automatically" ==
Currently the main idea to implement this is based on a desired language(s) 
setting. It has basically 2 steps:
* Map the application package to the language package(s) using the desired 
language(s) setting.
* Install the language package(s) when installing the application package.
I do not have a clear idea of how to implement the first step for now, mainly 
due to the "dialects". For example, foo-l10n-fr-ca should be used if a system 
has "fr_CA" as a desired language setting, but should also be used for a "fr" 
desired language setting if neither foo-l10n-fr nor foo-l10n-fr-fr exists.
The second step should be easy. The desired language(s) setting should be an 
apt configuration option.
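
The dialect fallback described above could be sketched roughly as follows. This is only an illustration in Python, not APT code: the function names, the <app>-l10n-<lang>[-<region>] naming scheme and the exact fallback order (including picking the alphabetically first dialect as a last resort) are assumptions made for the example.

```python
def normalize(lang):
    """Split a locale such as 'fr_CA.UTF-8' into ('fr', 'ca'); 'fr' gives ('fr', None)."""
    lang = lang.split(".")[0].split("@")[0]      # drop encoding and modifier
    base, _, region = lang.replace("-", "_").partition("_")
    return base.lower(), (region.lower() or None)


def pick_language_pack(app, lang, available):
    """Return the language pack to install for `app` and `lang`, or None.

    `available` is the set of package names known to the package manager.
    The naming scheme and the fallback order are assumptions of this sketch.
    """
    base, region = normalize(lang)
    prefix = f"{app}-l10n-{base}"
    candidates = []
    if region:
        candidates.append(f"{prefix}-{region}")  # exact dialect, e.g. foo-l10n-fr-ca
    candidates.append(prefix)                    # generic language, e.g. foo-l10n-fr
    candidates.append(f"{prefix}-{base}")        # "home" dialect, e.g. foo-l10n-fr-fr
    for name in candidates:
        if name in available:
            return name
    # Last resort: any other dialect of the language, chosen deterministically.
    dialects = sorted(p for p in available if p.startswith(prefix + "-"))
    return dialects[0] if dialects else None


available = {"foo-l10n-fr-ca", "foo-l10n-de"}
print(pick_language_pack("foo", "fr_CA", available))
print(pick_language_pack("foo", "fr", available))
```

With only foo-l10n-fr-ca available, both a "fr_CA" and a plain "fr" setting resolve to it, which is the behaviour wanted in the example above. The open question remains what the right order is when several dialects exist.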
For now I think it will be possible to map the application package to the 
language pack simply using package names, but it would be possible and perhaps 
cleaner to use new control fields (e.g. Provides-l10n: iceweasel, 
L10n-language: zh).
So this could be done with a change to policy mentioning how language packages 
should be named or documenting the new control fields. Changes to APT will of 
course be needed. Installing the new apt version for the first time should 
preset the desired language(s) setting to, for example, 
debian-installer/language. Changes to archive maintenance tools may be needed for new 
control fields.
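
To give an idea of the control-field variant, here is a rough Python sketch that builds an (application, language) index from Packages-style stanzas. The Provides-l10n and L10n-language fields are only the hypothetical fields suggested above and do not exist in Debian today. The sample stanzas, the iceweasel-l10n-zh package name and the function names are likewise made up for illustration.

```python
SAMPLE = """\
Package: iceweasel-l10n-zh
Architecture: all
Provides-l10n: iceweasel
L10n-language: zh

Package: iceweasel
Architecture: amd64
"""


def parse_stanzas(text):
    """Yield one dict of lower-cased field names per blank-line-separated stanza."""
    for block in text.strip().split("\n\n"):
        fields = {}
        for line in block.splitlines():
            key, sep, value = line.partition(":")
            # Continuation lines (starting with whitespace) are ignored here.
            if sep and not line[0].isspace():
                fields[key.strip().lower()] = value.strip()
        if fields:
            yield fields


def build_l10n_index(text):
    """Map (application, language) to the language pack declaring both fields."""
    index = {}
    for fields in parse_stanzas(text):
        app = fields.get("provides-l10n")       # hypothetical field
        lang = fields.get("l10n-language")      # hypothetical field
        package = fields.get("package")
        if app and lang and package:
            index[(app, lang)] = package
    return index


print(build_l10n_index(SAMPLE))
```

Compared to name-based mapping, this would not constrain how language packs are named, at the cost of new fields that APT and probably the archive tools would have to understand.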

= Project Schedule =
I can work on this project during the entire summer. I expect the main part of 
this project to take about 7 weeks.

Week 1
Determine the implementation and request comments.
Week 2
Integrate feedback, perfect the proposition and review the project schedule 
for the remaining time.
Week 3 and 4
Modify APT and Policy to allow automatic installation of language packs.
Week 5 to 7
Provide means to control the effects of growing the number of packages. For now 
I believe this should consist of modifications to APT and archive maintenance 
tools.

The remaining time should be plenty to deal with unexpected issues, bugs or 
over-optimistic schedule items. If these do not take all the time, I will 
improve developer tools. I may also produce patches for packages to start 
adapting their language packs to the new specification. At this point, if it 
was not done before, I will write the documentation for developers. If there 
is still time, I may produce patches to modularize application packages 
bundling l10n data.

The first 2 weeks will also be used to familiarize myself with software that 
will require changes, that is at least APT and APT front-ends.

= Summer commitments =
During summer 2009, I will either graduate or work on this project, depending 
on whether this offer is accepted. Otherwise, I have no commitment or plan for 
the summer.

= Plans for Debian =
I must confess I'm already involved with Debian, since at least 2005. I mainly 
provide support and work on quality and a bit on documentation. The idea of 
maintaining packages has been tempting, but never quite enough to get started. 
Since the end of 2006, this temptation has been tempered by serious issues with the 
BTS that make it impractical for me to start maintaining any package (long 
story, I'm still working on fixing this). I do not have real plans for Debian 
after the summer (at least, none I'd expect my schedule to allow). It is 
nevertheless possible, depending on my schedule, the progress on my BTS 
issues and how I enjoy working on APT, that I give in to the temptation of 
working on APT or an APT front-end (I use Synaptic, but I'd prefer a good 
Qt/KDE front-end, which is not yet in sight).

= About this document =
This application is a small update of the one sent in 2007 and 2008. Nothing has 
been done in this area since 2007, so the project is almost identical. In March 
2009, Neil Williams submitted a DEP draft about "Tdeb"-s, which targets some 
of the issues covered by this project.

== Credits ==
Javier Fernández-Sanguino created 
http://wiki.debian.org/i18n/TranslationDataDistribution, from which some of 
this proposal's content comes. Aigars Mahinovs and Eddy Petrișor wrote 
http://wiki.debian.org/i18n/TranslationDebs which inspired the implementation 
suggested/drafted here.

Thanks to Steve McIntyre and Erich Schubert for respectively backing up and 
sending me back the 2007 version of this application, which I had lost.
