
Re: A small question



>That's why I repeat: if we have ISOLat1 characters to output, these should be
>encoded as 2-byte sequences in case of UTF-8.  Thus, the output files we have
>at the moment <emphasis>cannot</emphasis> be interpreted as UTF-8, since they
>are not.

Hmm. Ok, this might be a problem.  I don't know.  It's up to the
application developer (Ardo) to determine what he wants to do.
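
As a small illustration of the quoted point, here is a minimal Python sketch.
The sample string and the helper name is_valid_utf8 are made up for this
example and come from no package discussed here.  It shows that a Latin-1
character above 0x7F takes two bytes in UTF-8, and that the raw Latin-1
bytes do not decode as UTF-8.

```python
# Sample text: a copyright sign (U+00A9) followed by plain ASCII.
text = "\u00a9 1998"

latin1_bytes = text.encode("latin-1")  # one byte per character
utf8_bytes = text.encode("utf-8")      # U+00A9 becomes two bytes


def is_valid_utf8(data: bytes) -> bool:
    """Return True if data decodes cleanly as UTF-8 (hypothetical helper)."""
    try:
        data.decode("utf-8")
    except UnicodeDecodeError:
        return False
    return True


print(latin1_bytes, is_valid_utf8(latin1_bytes))
print(utf8_bytes, is_valid_utf8(utf8_bytes))
```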

>You see, the construct \|...\| can be easily caught since it's a special thing
>(`\' in input will be escaped with \ giving \\ in output).  Well, in case of
>SDATA-entities, I see how to make use of them.

I don't see why &copy; can't be caught just as easily as \|...\|.  They are both unique!
Furthermore, if we can get the charset of the debiandoc char stream
sorted out, you can hook up *standard*, already written tools to go
from one char set to another.
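
To sketch what "hooking up standard tools" could look like once the charset
is known: the function below is only an assumption about how one might do it
in Python.  The name transcode and the default encodings are mine; a
command-line converter such as iconv would serve equally well.

```python
def transcode(data: bytes, source: str = "latin-1", target: str = "utf-8") -> bytes:
    """Re-encode a byte stream from one known charset to another.

    This only works if `source` really is the encoding of `data`,
    which is exactly the thing that has to be sorted out first.
    """
    return data.decode(source).encode(target)


# Usage: a Latin-1 copyright sign (0xA9) becomes the UTF-8 pair 0xC2 0xA9.
converted = transcode(b"\xa9 1998")
print(converted)
```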

>> >One more issue (I just took a more thorough look at the entities supplied by
>> >sgml-data): why do some files provide Unicode equivalents for entities and some
>> >proprietary SDATA?  Is this by design?
>> 
>> There are none that use SDATA AFAIK.  You might be mixing up sgml-data
>> with some other packages which put stuff in /usr/lib/sgml/entities.
>
>I am sorry to say that the sgml-data package, freshly downloaded and unpacked
>in a separate directory, has ISO* files that define SDATA entities.

Yes indeed.  This inconsistency seems to be a bug.

>Well, and now returning to `stock' SGML entities.  copy, and certain other
>entities (like nbsp, for example) are from ISOnum, while in sgml-data package
>they are defined in both of them (and they are different, BTW).

Some overlap may be ok.  ISO defines it -- not Debian!

>As for working out this problem.  There are two possibilities: to make use of
>SDATA entities in all programs that come with Debian; or to use some Unicode
>encoding for intermediate/output files.

I opt for Unicode.  Unless there is a standard that says the copyright
circle 'c' glyph needs to be '[copy   ]' and not '[copy ]' nor
'[COPY  ]', that is, unless I am given guidelines by which to
distinguish the proper notation from the impostor, I am very hesitant
to do that.
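
To make the ambiguity concrete, here is a rough Python sketch of what a
consumer of \|[...]\| markers would have to do.  Everything in it is an
assumption for illustration: the table SDATA_TO_UNICODE, the pattern, and
the choice to ignore padding but keep case are mine, not taken from any
standard, and it does not handle an escaped \\ in front of a marker.

```python
import re

# Hypothetical lookup table; real SDATA replacement text is system-specific.
SDATA_TO_UNICODE = {"copy": "\u00a9", "nbsp": "\u00a0"}

# Matches a marker of the form \|[name   ]\| with optional padding spaces.
SDATA_MARKER = re.compile(r"\\\|\[(\w+)\s*\]\\\|")


def replace_sdata(text: str) -> str:
    """Replace known markers with Unicode characters; leave unknown ones alone."""
    def lookup(match: re.Match) -> str:
        # Names are compared case-sensitively, so 'COPY' is not 'copy'.
        return SDATA_TO_UNICODE.get(match.group(1), match.group(0))

    return SDATA_MARKER.sub(lookup, text)


print(replace_sdata("\\|[copy   ]\\| 1998"))
print(replace_sdata("\\|[COPY  ]\\| 1998"))
```

Whether padding should be ignored and whether case matters are exactly the
decisions that have no standard behind them here.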

I would like someone to tell me what should be done, using the
standards out there to back up their arguments.  I am willing to
provide SDATA encodings but not as the *default* unless they are
defined by some standard and it doesn't break the fundamental
jade/dsssl toolchain.

--
.....Adam Di Carlo....adam@onShore.com.....<URL:http://www.onShore.com/>

