
Re: location of UnicodeData.txt

starner@okstate.edu <starner@okstate.edu>:

> The problem is, every character in Unicode, all 70,000 of them, has a
> distinct set of properties. UnicodeData.txt is basically a listing of
> those properties. If it is a copyrightable work, I see no way for a text
> processing program to conform to Unicode without using a derivative of 
> that copyrighted work. Likewise, I'd bet that file or some derivative of
> it is embedded in both Perl and Python - you can't reasonably handle 
> Unicode characters without it.

This is like the word list (spelling dictionary) discussion from a few
weeks ago.

Intuitively, I would guess you could make a program conform to Unicode
without using a derivative of UnicodeData.txt. Copyright applies to
the expression of the facts, not the facts themselves, so you can
still write your own, original description of how Unicode characters
are handled.

> We could always pony up the $12,000 (or $1200 for an associate membership) 
> and become a member of Unicode and complain about this from the inside. 

An alternative approach would be to set up an alternative organisation
defining an alternative universal character set that is virtually
identical to Unicode, but the documents describing it are new,
original works with free licences. Don't worry too much if there are a
few accidental differences. You could then approach the Unicode
Consortium and say that you are keen to cooperate and keep the two
standards aligned. That might be a good negotiating position because
the Unicode Consortium people really do want there to be a single,
universal character set, I think.

(At one point Unicode and UCS were two separate standards that were
deliberately maintained so as to be identical. I don't know whether
that is still the case, but there might already be an alternative to

