Re: A small question
On Fri, Jul 02, 1999 at 11:52:56AM -0400, Adam Di Carlo wrote:
> >You see, the construct \|...\| can be easily caught since it's a special thing
> >(`\' in input will be escaped with \ giving \\ in output). Well, in case of
> >SDATA-entities, I see how to make use of them.
>
> I don't see why \|...\| is any easier to catch than ©. They are both unique!
> Furthermore, if we can get the charset of the debiandoc char stream
> sorted out, you can hook up *standard*, already written tools to go
> from one char set to another.
Hmm... It looks like I did not make myself clear. I stated that the output
stream is in an unknown character set (that is, CDATA is just copied to the
output). This means that the 8-bit code 169 stands for an unknown symbol: if we
knew that this is iso-8859-1, then it's (C); if it's koi8-r, it's '_|'. If we
find a way to make sure that the output is in UCS-2, UCS-4, UTF* or another
encoding that permits symbols from many different languages, then yes,
processing \|...\| is as easy as ©. But we have a stream of 8-bit characters of
unknown charset, so we have no choice but to create some external logic (like
"everything that starts with \ has a special meaning") to distinguish what we
need.
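To make the ambiguity concrete, here is a minimal Python sketch. It is not part of debiandoc; Python's standard codecs merely stand in for the two candidate charsets named above. It decodes the single byte 169 both ways:

```python
# The same 8-bit value, interpreted under two different charsets.
raw = bytes([169])

as_latin1 = raw.decode("iso-8859-1")  # the copyright sign, U+00A9
as_koi8r = raw.decode("koi8_r")       # a box-drawing character, U+2558

print(repr(as_latin1), hex(ord(as_latin1)))
print(repr(as_koi8r), hex(ord(as_koi8r)))

# The byte alone cannot tell a reader which of the two symbols was meant.
assert as_latin1 != as_koi8r
```

The byte only becomes a definite symbol once the charset is known, which is exactly the information the output stream does not carry.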
> >I am sorry to say that the sgml-data package, freshly downloaded and unpacked
> >in a separate directory, has ISO* files that define SDATA-entities.
>
> Yes indeed. This inconsistency seems to be a bug.
OK. Should I file it?
> >Well, and now returning to `stock' SGML entities. copy, and certain other
> >entities (like nbsp, for example) are from ISOnum, while in sgml-data package
> >they are defined in both of them (and they are different, BTW).
>
> Some overlap may be ok. ISO defines it -- not Debian!
I beg your pardon? How could this be? Unfortunately, I do not have a copy of
the Unicode standard, but I doubt that a <emphasis>standard</emphasis> could
define the same thing in two or more ways: this is not even an ambiguity.
Yes, I agree that we could have two sets of entities: one defining Unicode
codes and one defining system data. But I believe that in the current situation
we have a severe problem: the first included set wins. That's really bad.
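The "first included set wins" behaviour can be illustrated with a small sketch. As an assumption, it uses Python's XML parser rather than the SGML tools debiandoc relies on; XML kept the same rule that the first declaration of an entity is binding. The two replacement texts below are made up for the illustration and are not taken from the sgml-data files:

```python
import xml.etree.ElementTree as ET

# Two conflicting declarations of the same entity name, as could happen when
# two entity sets that both define "copy" are included one after the other.
DOC = """<!DOCTYPE doc [
  <!ENTITY copy "&#169;">
  <!ENTITY copy "[copy]">
]>
<doc>&copy;</doc>"""

# The parser silently keeps the first declaration and ignores the second.
resolved = ET.fromstring(DOC).text
print(repr(resolved))
```

Swapping the two declarations changes the result without any warning, which is why the inclusion order of overlapping sets matters so much.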
> >As for working out this problem. There are two possibilities: to make use of
> >SDATA entities in all programs that come with Debian; or to use some Unicode
> >encoding for intermediate/output files.
>
> I opt for unicode. Unless there is a standard that the copyright
> circle 'c' glyph needs to be '[copy ]' and not '[copy ]' nor
> '[COPY ]', that is, unless I am given guidelines by which to
> distinguish the proper notation from the impostor, I am very hesitant
> to do that.
Adam, I opt for whatever permits us to deal with the problem: what we get is
not what we want.
I believe SDATA entities just provide a convenient way of dealing with certain
symbols. Please understand that I do not insist on using SDATA-entities only;
I just want to see the circled c in the text of the Russian documentation as
well as in all other versions.
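As a rough sketch of what such handling could look like once the target charset is known: the helper below is hypothetical and not part of any Debian tool. It assumes markers of the form \|[copy  ]\| with literal backslashes doubled, and the symbol table, the fallback table and the way the marker text is normalized are assumptions made for the example:

```python
import re

# Hypothetical tables: marker name -> Unicode character, and an ASCII fallback.
SYMBOLS = {"copy": "\u00a9"}
FALLBACKS = {"copy": "(C)"}

# Either an escaped literal backslash (\\) or a \|...\| marker.
TOKEN = re.compile(r"\\\\|\\\|([^\\]*)\\\|")


def resolve(text, charset):
    """Replace \\|...\\| markers with the best symbol the charset offers."""

    def substitute(match):
        marker = match.group(1)
        if marker is None:
            return "\\"  # a doubled backslash stands for one literal backslash
        name = marker.strip("[] ")  # assumed normalization of e.g. "[copy  ]"
        char = SYMBOLS.get(name)
        if char is not None:
            try:
                char.encode(charset)  # does the target charset have the glyph?
                return char
            except UnicodeEncodeError:
                pass
        # Unknown markers are left untouched rather than guessed at.
        return FALLBACKS.get(name, match.group(0))

    return TOKEN.sub(substitute, text).encode(charset)


sample = "\\|[copy  ]\\| 1999"
# The copyright sign is byte 0xA9 in ISO-8859-1 and byte 0xBF in KOI8-R.
for charset in ("iso-8859-1", "koi8_r", "ascii"):
    print(charset, resolve(sample, charset))
```

Because literal backslashes are doubled, a \|...\| that the author typed as ordinary text is never mistaken for a marker; the scanner consumes the doubled backslash first.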
--
Mike