Re: location of UnicodeData.txt
> But they clearly do not want you to modify anything, including
> character name! Character name is a searchable field, which some
> applications may need.
It's an English field. There is a canonical translation into
French, and there should be translations into other languages too.
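For anyone who wants to see what searching on that field looks like: Python's standard `unicodedata` module exposes the Name field from the Unicode Character Database (UnicodeData.txt is its core file). The sketch below is only an illustration. `search_names` is a hypothetical helper, not part of any library, and it does a simple case-insensitive substring match on the English names.

```python
import unicodedata
from typing import List, Optional, Tuple


def search_names(term: str, limit: Optional[int] = None) -> List[Tuple[int, str]]:
    """Return (code point, name) pairs whose character name contains `term`.

    Hypothetical helper: a case-insensitive substring match against the
    English names that Python's unicodedata module exposes.
    """
    term = term.upper()
    hits = []
    for cp in range(0x110000):
        # name() returns the default (None) for unassigned or unnamed code points
        name = unicodedata.name(chr(cp), None)
        if name is not None and term in name:
            hits.append((cp, name))
            if limit is not None and len(hits) >= limit:
                break
    return hits


if __name__ == "__main__":
    for cp, name in search_names("greek small letter alpha"):
        print(f"U+{cp:04X} {name}")
    # The reverse direction: exact name to character.
    print(unicodedata.lookup("LATIN SMALL LETTER E WITH ACUTE"))
```

Because the names are all uppercase ASCII in the English edition, uppercasing the search term is enough for case-insensitive matching here.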
> The only overlap with any previous character coding is the first 127
> characters (ASCII).
Nope. There's massive overlap with previous character codings on
all sorts of levels. The first 256 characters are Latin-1; the
Greek block is a superset of ISO-8859-7 (that is, the characters
are in the same order, but some of the gaps have been filled in),
as are the Cyrillic and Arabic blocks for their respective 8859
standards. All the Indian blocks are weird echoes of ISCII. The
basic CJK block contains the ideographs from the preexisting
Chinese, Japanese and Korean standards, sorted in the order of
traditional dictionaries such as the KangXi.
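The single-byte part of that claim is easy to check with the codecs bundled with Python. The sketch below (the function name `letter_offsets` is made up for illustration) decodes each byte in the upper half (0xA0-0xFF) of a legacy codec, keeps the ones that come out as letters, and collects the difference between the Unicode code point and the byte value. A single offset means the letters were carried over in the same order, just shifted. It only covers Latin-1 and the 8859 Greek, Cyrillic and Arabic codecs; ISCII and the CJK source standards are not checked here.

```python
import unicodedata


def letter_offsets(codec: str) -> set:
    """Collect (code point - byte value) for every letter in the upper
    half (0xA0-0xFF) of a single-byte codec.

    A one-element result means the letters appear in Unicode in the same
    order as in the legacy standard, shifted by a constant offset.
    """
    offsets = set()
    for b in range(0xA0, 0x100):
        try:
            ch = bytes([b]).decode(codec)
        except UnicodeDecodeError:
            continue  # byte value is unassigned in this codec
        if unicodedata.category(ch).startswith("L"):  # letters only
            offsets.add(ord(ch) - b)
    return offsets


if __name__ == "__main__":
    for codec in ("latin-1", "iso8859_7", "iso8859_5", "iso8859_6"):
        print(codec, [hex(o) for o in sorted(letter_offsets(codec))])
```

Punctuation and symbols are skipped on purpose: characters like the pound sign or the section sign in the 8859 parts were unified with their Latin-1 counterparts rather than copied into the script blocks.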
> If a system simply declared a section of data to be
> Unicode data, and made no attempt to comprehend the contents, it
> probably would not need to have access to the contents of Unicode.txt.
Just like if a system simply declared a section of data to be
code compliant with Fortran-2026, and made no attempt to
comprehend it, it wouldn't need access to the contents of that
standard. A text-processing program that needs to display data is
going to need the contents of UnicodeData for BiDi. A proper
cut program should use UnicodeData, so it doesn't separate a
character from a following combining character. A spell program
is going to need the data to know which characters end words.
Anything that handles text in a way more complex than cat will
need access to this data.
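To make the three examples concrete, here is a rough sketch using the properties Python's `unicodedata` module takes from the Unicode Character Database: the bidirectional class, and the general category (to spot combining marks and word characters). The helper names `cut_chars`, `bidi_classes` and `words` are hypothetical. These are deliberate simplifications: real grapheme and word boundaries are defined in UAX #29 and need more properties than shown here, and actual BiDi layout runs the full UAX #9 algorithm on top of these classes.

```python
import unicodedata


def cut_chars(text: str, n: int) -> str:
    """Return the first n base characters of text, keeping each base
    character together with the combining marks (category M*) after it.

    Simplified: full grapheme clusters (UAX #29) need more properties.
    """
    count = 0
    for i, ch in enumerate(text):
        if not unicodedata.category(ch).startswith("M"):
            if count == n:
                return text[:i]  # this base character would start cluster n+1
            count += 1
    return text


def bidi_classes(text: str) -> list:
    """Bidirectional class of each character (input to the BiDi algorithm)."""
    return [unicodedata.bidirectional(ch) for ch in text]


def words(text: str) -> list:
    """Split text into runs of letters, marks and numbers (categories L, M, N).

    Anything else (spaces, punctuation, symbols) ends a word. This naive
    rule splits contractions such as "don't".
    """
    out, cur = [], []
    for ch in text:
        if unicodedata.category(ch)[0] in "LMN":
            cur.append(ch)
        elif cur:
            out.append("".join(cur))
            cur = []
    if cur:
        out.append("".join(cur))
    return out


if __name__ == "__main__":
    print(repr(cut_chars("e\u0301a", 1)))  # keeps the acute with its e
    print(bidi_classes("a\u05d01"))         # Latin, Hebrew, digit
    print(words("cafe\u0301, d\u00e9j\u00e0-vu"))
```

A byte- or code-point-based cut of "e" + COMBINING ACUTE ACCENT would strand the accent; looking up the general category is what lets `cut_chars` avoid that.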