Re: A small question

To: Adam Di Carlo <adam@onshore.com>
Cc: debian-sgml@lists.debian.org
Subject: Re: A small question
From: Michael Sobolev <mss@transas.com>
Date: Fri, 2 Jul 1999 20:52:04 +0400
Message-id: <19990702205204.A29614@transas.com>
In-reply-to: <E1105cX-0005t5-00@burrito>; from Adam Di Carlo on Fri, Jul 02, 1999 at 11:52:56AM -0400
References: <19990629234108.A20329@transas.com> <oapv2c8dv8.fsf@burrito.fake> <19990701231448.A32359@transas.com> <E10zmsJ-0003JQ-00@burrito> <19990702003942.A855@transas.com> <E10zp5A-0003V6-00@burrito> <19990702132140.A10799@transas.com> <E1105cX-0005t5-00@burrito>

On Fri, Jul 02, 1999 at 11:52:56AM -0400, Adam Di Carlo wrote:
> >You see, the construct \|...\| can be easily caught since it's a special thing
> >(`\' in input will be escaped with \ giving \\ in output).  Well, in case of
> >SDATA-entities, I see how to make use of them.
> 
> I don't see why \|...\| can be caught any more easily than ©.  They are both unique!
> Furthermore, if we can get the charset of the debiandoc char stream
> sorted out, you can hook up *standard*, already written tools to go
> from one char set to another.
Hmm...  It looks like I just did not make myself clear.  Well, I stated that the
output stream is in an unknown character set (that is, CDATA is just copied to
the output).  This means that the 8-bit code 169 stands for an unknown symbol:
if we knew that this is iso-8859-1, then it's (C); if it's koi8-r, it's '_|'.
If we find a way of making sure that the output is in UCS-2, UCS-4, UTF* or
another encoding that permits having a lot of symbols from different languages,
then yes, processing \|...\| is as easy as ©.  But we have a stream of 8-bit
characters of unknown charset, so we have no choice but to create an external
logic (like: everything that starts with \ has special meaning) for
distinguishing what we need.
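
To make the point above concrete, here is a minimal Python sketch.  It is not
taken from any Debian tool: the helper name split_markers and the sample input
are made up for illustration, and it assumes that marker names never contain a
backslash.  It shows that byte 169 decodes to different characters depending
on the charset, and that the \|...\| convention can be recognised on raw bytes
without knowing the charset at all.

```python
# Byte 169 has no fixed meaning until a charset is chosen.
RAW = bytes([169])
AS_LATIN1 = RAW.decode("iso-8859-1")  # U+00A9, the copyright sign
AS_KOI8R = RAW.decode("koi8_r")       # U+2558, a box-drawing character


def split_markers(stream: bytes):
    """Split a byte stream into ('text', ...) and ('marker', ...) pieces.

    Assumed convention (from the discussion): a literal backslash is
    written as two backslashes, and a special symbol as \\|name\\|.
    """
    pieces = []
    text = bytearray()
    i = 0
    while i < len(stream):
        pair = stream[i:i + 2]
        if pair == b"\\\\":
            # Escaped backslash: keep one literal backslash.
            text += b"\\"
            i += 2
        elif pair == b"\\|":
            end = stream.find(b"\\|", i + 2)
            if end == -1:
                # Unterminated marker: treat the rest as plain text.
                text += stream[i:]
                break
            if text:
                pieces.append(("text", bytes(text)))
                text = bytearray()
            pieces.append(("marker", stream[i + 2:end]))
            i = end + 2
        else:
            # Any other byte is copied through untouched, whatever it means.
            text.append(stream[i])
            i += 1
    if text:
        pieces.append(("text", bytes(text)))
    return pieces


if __name__ == "__main__":
    print(f"U+{ord(AS_LATIN1):04X} vs U+{ord(AS_KOI8R):04X}")
    print(split_markers(b"a\\\\b \\|copy\\| \xa9"))
```

Note that the scanner never decodes the 8-bit bytes; only the ASCII backslash
and bar carry meaning, which is exactly the "external logic" described above.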

> >I am sorry to say that the freshly downloaded and unpacked in a separate
> >directory sgml-data package has ISO* files that define SDATA-entities.
> 
> Yes indeed.  This inconsistency seems to be a bug.
OK.  Should I file it?

> >Well, and now returning to `stock' SGML entities.  copy, and certain other
> >entities (like nbsp, for example) are from ISOnum, while in sgml-data package
> >they are defined in both of them (and they are different, BTW).
> 
> Some overlap may be ok.  ISO defines it -- not Debian!
I beg your pardon?  How could this be?  Well, unfortunately, I do not have a
copy of the Unicode standard.  But I doubt that a <emphasis>standard</emphasis>
could define the same thing in two or more ways: this is not even an ambiguity.
Yes, I agree that we could have two sets of entities: one defining Unicode
codes and one defining system data.  I believe that in the current situation we
have a severe problem: the first included set wins.  That's really bad.
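
The "first included set wins" behaviour follows from the SGML rule that the
first declaration of an entity is the binding one.  A small Python sketch of
that rule is below; the set names and replacement texts are invented for
illustration and are not copied from the sgml-data package.

```python
def build_entity_table(entity_sets):
    """Merge entity sets in inclusion order; the first declaration wins."""
    table = {}
    for set_name, declarations in entity_sets:
        for entity_name, replacement in declarations:
            # Later declarations of the same name are silently ignored.
            if entity_name not in table:
                table[entity_name] = (set_name, replacement)
    return table


# Hypothetical sets: one SDATA-style, one with numeric character references.
SET_SDATA = ("sdata-set", [("copy", "[copy  ]"), ("nbsp", "[nbsp  ]")])
SET_NUMERIC = ("numeric-set", [("copy", "&#169;"), ("nbsp", "&#160;")])

if __name__ == "__main__":
    print(build_entity_table([SET_SDATA, SET_NUMERIC])["copy"])
    print(build_entity_table([SET_NUMERIC, SET_SDATA])["copy"])
```

Swapping the inclusion order changes what "copy" expands to, which is the
problem: the result depends on which set happens to be read first.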

> >As for working out this problem.  There are two possibilities: to make use of
> >SDATA entities in all programs that come with Debian; or to use some Unicode
> >encoding for intermediate/output files.
> 
> I opt for unicode.  Unless there is a standard that the copyright
> circle 'c' glyph needs to be '[copy   ]' and not '[copy ]' nor 
> '[COPY  ]', that is, unless I am given guidelines by which to 
> distinguish the proper notation from the impostor, I am very hesitant
> to do that.
Adam, I opt for whatever permits us to deal with the problem: what we get is
not what we want.

I believe SDATA just provides a convenient way of dealing with certain symbols.
Please understand that I do not insist on using SDATA entities only; no, I just
want to see the circled c in the text of the Russian documentation as well as
in all other versions.
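
What I am asking for could look roughly like the Python sketch below.  It
assumes that the target charset of each output document is known at
formatting time, which is precisely what we lack today; the function name
render_symbol and the "(C)" fallback are my own choices for illustration.

```python
COPYRIGHT = "\u00a9"  # the circled c, as a Unicode character


def render_symbol(char: str, charset: str, fallback: str) -> bytes:
    """Encode one symbol for a target charset, or use an ASCII fallback."""
    try:
        return char.encode(charset)
    except UnicodeEncodeError:
        # The charset has no such symbol: degrade to plain ASCII.
        return fallback.encode("ascii")


if __name__ == "__main__":
    for charset in ("iso-8859-1", "koi8_r", "ascii"):
        print(charset, render_symbol(COPYRIGHT, charset, "(C)"))
```

The same symbol ends up as a different byte in iso-8859-1 and in koi8-r, and
as "(C)" where the charset has no such character at all.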

--
Mike
