Bug#175064: Debian policy documents should be UTF-8 encoded
Hi,
<disclaimer>I prefer to migrate to UTF-8.</disclaimer>
But simply changing the source file to UTF-8 raises some issues which
you may want to consider in advance. (debian-policy being an
English-only manual, the risks are small.)
I think it may be best if this encoding change is done at the same
time as this documentation is moved to docbook-xml format. A
functional conversion script already exists. See below for details.
On Thu, Jan 02, 2003 at 05:20:26PM -0500, Colin Walters wrote:
> On Thu, 2003-01-02 at 16:28, Josip Rodin wrote:
>
> > I'm not seeing that with the copy of policy.txt.gz which I generated myself.
> > Looks like debiandoc2text on Manoj's system used a different, Latin1 locale
> > and replaced Š for © on my Latin2 system it did no such (foolish) thing.
> > For the record, Š is a large latin letter S with a hacek/caron. :)
> >
> > We should probably restrict the build process with LANG=C or something like
> > that.
>
> Right. I think it should be sufficient to just add LANG=C before the
> debiandoc2X invocations.
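
To make the quoted symptom concrete, here is a minimal Python sketch
(not part of the policy build, just an illustration) of how one and the
same byte reads differently under the two legacy encodings. The byte
value 0xA9 is my assumption about what the generated file contained.

```python
# One byte, two legacy interpretations.
raw = b"\xa9"  # assumed byte for the copyright sign in the generated text

as_latin1 = raw.decode("iso-8859-1")  # COPYRIGHT SIGN
as_latin2 = raw.decode("iso-8859-2")  # LATIN CAPITAL LETTER S WITH CARON

print(repr(as_latin1), repr(as_latin2))
```

The file itself does not change; only the encoding assumed by whoever
reads it does.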
Debiandoc2X, as it is designed now, basically assumes legacy encodings
in the source file. Just adding LANG=C before the debiandoc2X
invocations does not do much, since LANG=C is required anyway, and the
script sets the locale from the "-l" command line option and invokes
the back-end processing commands with that locale, as I understand it.
(I may be wrong here.)
As I understand it, you can use a UTF-8 source file to create plain
text and HTML files in UTF-8 (I guess HTML generation needs to be
slightly modified to indicate in its header that the generated output
is UTF-8). But it will break PS and PDF generation pretty badly. (I
have been there, when my Italian translator committed a UTF-8 file to
the document I manage.)
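
As a rough idea of the HTML header change I mean, here is a small
Python sketch that adds a charset declaration to generated HTML. The
helper name declare_utf8 is hypothetical and this is not how
debiandoc2html works today; it only shows the kind of declaration the
output would need.

```python
import re

META = '<meta http-equiv="Content-Type" content="text/html; charset=utf-8">'

def declare_utf8(html: str) -> str:
    """Insert a UTF-8 charset declaration after <head> if none exists."""
    if re.search(r"charset\s*=", html, re.IGNORECASE):
        return html  # some charset is already declared; leave it alone
    head = re.search(r"<head(\s[^>]*)?>", html, re.IGNORECASE)
    if head is None:
        return html  # no <head> element to attach the declaration to
    return html[:head.end()] + "\n" + META + html[head.end():]

page = "<html><head><title>Policy</title></head><body></body></html>"
fixed = declare_utf8(page)
```

Nothing comparable helps the PS and PDF back-ends, which is where the
real breakage is.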
I think Ardo uses Latin-1 as the encoding for the back-end at this
moment. Changing this will break many documentation build processes.
In the debian-doc project, we have created a debiandoc-sgml to
docbook-xml converter. It is now in very usable shape. If someone makes
a nice build script and environment for docbook-xml and spends some
time hand-tuning the converted files (including converting them to
UTF-8), it should be a smoother transition. After all, SGML has used
legacy encodings as the default for source files, while XML's default
source encoding is UTF-8.
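
The XML default can be seen with a short Python sketch. It assumes the
existing source is Latin-1 (my guess, see above); latin1_to_utf8 is a
hypothetical helper, not part of the conversion script. Without an
encoding declaration, an XML parser treats the input as UTF-8, so raw
Latin-1 bytes are rejected until they are transcoded.

```python
import xml.etree.ElementTree as ET

def latin1_to_utf8(data: bytes) -> bytes:
    """Transcode Latin-1 bytes to UTF-8 (assumes the input is Latin-1)."""
    return data.decode("iso-8859-1").encode("utf-8")

# A fragment with a Latin-1 copyright sign and no encoding declaration.
legacy = b"<p>\xa9 2003</p>"

try:
    ET.fromstring(legacy)
    legacy_parses = True
except ET.ParseError:
    # 0xA9 on its own is not a valid UTF-8 sequence.
    legacy_parses = False

converted = latin1_to_utf8(legacy)
text = ET.fromstring(converted).text
```

So the hand tuning after conversion would need to include a transcoding
step of this kind, or an explicit encoding declaration in each file.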
URL for conversion script (by Phillipe):
http://cvs.debian.org/ddp/utils/debiandoc-to-docbook/?cvsroot=debian-doc
Just my thoughts.
Osamu
--
~\^o^/~~~ ~\^.^/~~~ ~\^*^/~~~ ~\^_^/~~~ ~\^+^/~~~ ~\^:^/~~~ ~\^v^/~~~ +++++
Osamu Aoki <osamu@debian.org> Cupertino CA USA, GPG-key: A8061F32
.''`. Debian Reference: post-installation user's guide for non-developers
: :' : http://qref.sf.net and http://people.debian.org/~osamu
`. `' "Our Priorities are Our Users and Free Software" --- Social Contract