
Bug#175064: Debian policy documents should be UTF-8 encoded



Hi,

<disclaimer>I prefer to migrate to UTF-8.</disclaimer> 

But simply changing the source file to UTF-8 raises some issues that you
may want to consider in advance.  (Since debian-policy is an English-only
manual, the risks are small.)

I think it may be best to make this encoding change at the same time as
this documentation is moved to the docbook-xml format.  A working
conversion script already exists; see below for details.

On Thu, Jan 02, 2003 at 05:20:26PM -0500, Colin Walters wrote:
> On Thu, 2003-01-02 at 16:28, Josip Rodin wrote:
> 
> > I'm not seeing that with the copy of policy.txt.gz which I generated myself.
> > Looks like debiandoc2text on Manoj's system used a different, Latin1 locale
> > and substituted Š for &copy;; on my Latin2 system it did no such (foolish) thing.
> > For the record, Š is a large latin letter S with a hacek/caron. :)
> > 
> > We should probably restrict the build process with LANG=C or something like
> > that.
> 
> Right.  I think it should be sufficient to just add LANG=C before the
> debiandoc2X invocations.
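The Š/&copy; mix-up above comes down to one byte read through two
different code tables.  A minimal Python sketch (illustration only, not
part of the debiandoc tools): the copyright sign is the single byte 0xA9
in Latin-1 (ISO-8859-1), and the same byte decoded as Latin-2
(ISO-8859-2) is a capital S with caron.

```python
import unicodedata

# Illustration only: the copyright sign is the single byte 0xA9 in Latin-1.
raw = "\u00a9".encode("iso-8859-1")

# Reading the byte back with the Latin-1 table gives the copyright sign...
as_latin1 = raw.decode("iso-8859-1")
# ...while reading it with the Latin-2 table gives a capital S with caron.
as_latin2 = raw.decode("iso-8859-2")

# prints: b'\xa9' COPYRIGHT SIGN / LATIN CAPITAL LETTER S WITH CARON
print(raw, unicodedata.name(as_latin1), "/", unicodedata.name(as_latin2))
```

The bytes in the generated file are identical in both cases; only the
table used to interpret them differs.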

As currently designed, debiandoc2X basically assumes a legacy encoding
in the source file.  Just adding LANG=C before the debiandoc2X
invocations does not do much: as I understand it, the script effectively
requires LANG=C anyway, sets the locale from its "-l" command-line
option, and invokes the back-end processing commands with that locale.
(I may be wrong here.)

As I understand it, you can use a UTF-8 source file to create plain-text
and HTML files in UTF-8 (I guess the HTML generation needs a slight
modification so that its header declares the generated output as
UTF-8).  But it will break PS and PDF generation pretty badly.  (I have
been there, when my Italian translator committed a UTF-8 file to the
document I manage.)
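For the HTML case, the header change could be as small as rewriting the
charset declaration.  A rough sketch in Python, assuming a
post-processing pass over the generated pages; declare_utf8 is a
hypothetical helper, not something debiandoc2html provides, and a regex
is only adequate for the simple, generated markup assumed here.

```python
import re

# Hypothetical post-processing step; not a debiandoc2html option.
META = '<meta http-equiv="Content-Type" content="text/html; charset=UTF-8">'


def declare_utf8(html: str) -> str:
    """Make a generated page declare UTF-8 as its character encoding."""
    # Rewrite an existing declaration such as charset=ISO-8859-1, keeping an
    # opening quote (if any) so the attribute stays well-formed.
    html, count = re.subn(r"""charset\s*=\s*(["']?)[\w.:-]+""",
                          r"charset=\1UTF-8", html, flags=re.IGNORECASE)
    if count:
        return html
    # No declaration yet: insert one right after the opening <head> tag.
    return re.sub(r"<head\b[^>]*>", lambda m: m.group(0) + META, html,
                  count=1, flags=re.IGNORECASE)
```

This only fixes the label; the page body must of course already be UTF-8
for the declaration to be truthful.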

I think Ardo currently uses Latin-1 as the encoding for the back ends.
Changing this would break many documentation build processes.
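Until the back ends change, a build could at least refuse UTF-8 input
before it reaches a Latin-1 back end, which would have caught my
translator's file.  A heuristic sketch in Python; guess_source_encoding
and check_for_latin1_backend are hypothetical helpers.  The check relies
on non-ASCII Latin-1 text rarely being valid UTF-8, so unusual input can
be misclassified.

```python
def guess_source_encoding(data: bytes) -> str:
    """Heuristically classify source bytes as 'ascii', 'utf-8' or 'latin-1'."""
    try:
        data.decode("ascii")
        return "ascii"  # pure ASCII reads the same under any of these encodings
    except UnicodeDecodeError:
        pass
    try:
        data.decode("utf-8")
        return "utf-8"  # non-ASCII bytes that form valid UTF-8 sequences
    except UnicodeDecodeError:
        # Any byte string decodes as Latin-1, so this is only a fallback guess.
        return "latin-1"


def check_for_latin1_backend(name: str, data: bytes) -> None:
    """Hypothetical build guard: stop before a UTF-8 file reaches a Latin-1 back end."""
    if guess_source_encoding(data) == "utf-8":
        raise SystemExit(f"{name}: looks like UTF-8, but the back end expects Latin-1")
```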

In the debian-doc project, we have created a debiandoc-sgml to
docbook-xml converter.  It is now in very usable shape.  If someone
writes a nice build script and environment for docbook-xml and spends
some time hand-tuning the converted files (including converting them to
UTF-8), the transition should be smoother.  After all, SGML
traditionally used legacy encodings as the default for source files,
while XML's default source encoding is UTF-8.
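To make the encoding point concrete: converting a Latin-1 source is a
straight re-encode (the same job iconv -f ISO-8859-1 -t UTF-8 does), and
an XML parser then reads the result without any encoding declaration.  A
sketch using Python's standard xml.etree.ElementTree; the sample markup
is made up for illustration.

```python
import xml.etree.ElementTree as ET


def latin1_to_utf8(data: bytes) -> bytes:
    """Re-encode Latin-1 (ISO-8859-1) bytes as UTF-8."""
    return data.decode("iso-8859-1").encode("utf-8")


# Made-up sample markup, stored the legacy way: the copyright sign is byte 0xA9.
legacy = "<p>Copyright \u00a9 example</p>".encode("iso-8859-1")
converted = latin1_to_utf8(legacy)  # the sign becomes the two bytes 0xC2 0xA9

# No XML declaration: the parser assumes UTF-8, so the converted file just works.
print(ET.fromstring(converted).text == "Copyright \u00a9 example")

# The unconverted bytes are rejected unless the encoding is declared explicitly.
try:
    ET.fromstring(legacy)
except ET.ParseError as err:
    print("undeclared Latin-1 rejected:", err)

declared = b'<?xml version="1.0" encoding="ISO-8859-1"?>' + legacy
print(ET.fromstring(declared).text == "Copyright \u00a9 example")
```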

URL for conversion script (by Phillipe):
 http://cvs.debian.org/ddp/utils/debiandoc-to-docbook/?cvsroot=debian-doc

Just my thoughts.

Osamu
-- 
~\^o^/~~~ ~\^.^/~~~ ~\^*^/~~~ ~\^_^/~~~ ~\^+^/~~~ ~\^:^/~~~ ~\^v^/~~~ +++++
        Osamu Aoki <osamu@debian.org>   Cupertino CA USA, GPG-key: A8061F32
 .''`.  Debian Reference: post-installation user's guide for non-developers
 : :' : http://qref.sf.net and http://people.debian.org/~osamu
 `. `'  "Our Priorities are Our Users and Free Software" --- Social Contract



