
Re: location of UnicodeData.txt

On Wed, Dec 11, 2002 at 06:21:01PM -0800, Thomas Bushnell, BSG wrote:
> Jim Penny <jpenny@universal-fasteners.com> writes:
> > On Tue, Dec 10, 2002 at 06:52:21PM -0800, Thomas Bushnell, BSG wrote:
> > > Jim Penny <jpenny@universal-fasteners.com> writes:
> > > 
> > > > I would venture to guess that even with a perfect oracle, it would be
> > > > essentially impossible to reverse engineer the Unicode data files, much
> > > > less the ancillary algorithms.  That is, a 32 bit search space with at
> > > > least 36 properties to be discovered per data point is whopping big.
> > > 
> > > That's irrelevant.
> > > 
> > 
> > No, it is not irrelevant.  There were two questions:  1) Does it have
> > enough originality to be copyrightable?  2)  Could it be reverse
> > engineered (thereby avoiding the copyright)?  My answers are 1) yes
> > and 2) no, respectively.
> 1) Doesn't matter, because we can implement Unicode in free programs
>    whether it's copyrightable or not.
> 2) Doesn't matter, same reason.
> Thomas

1)  There is no doubt that Unicode can be implemented in free programs.
The COPYRIGHTED files' license specifically allows this, and places no
constraints on the eventual license of the software being produced.
However, the Unicode Consortium license happens to fail the DFSG.

2)  Copyright law comes into question only in two ways.  There is a
certain faction that claims that UnicodeData.txt and the auxiliary files
cannot be copyrighted, "as a mere collation".  I am stating that this
claim will not pass factual muster, that there is much that is
non-obvious in these tables, and that many choices have been made.  This
is particularly true in CJK.

3)  The second way that copyright law is being questioned is whether it
can be sidestepped by "reverse engineering" the tables.  The above is a
reason for doubting that this is possible.

There is no doubt that any program that claims to implement the Unicode
standard and does something as simple as sorting must either use
UnicodeData.txt directly or use data derived (in the legal sense) from
this file.
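As a small illustration of how even "simple" sorting leans on this data,
here is a minimal sketch of an accent-insensitive sort key.  It uses
Python's standard unicodedata module, whose tables are generated from the
Unicode Character Database (UnicodeData.txt among it).  The function name
accent_insensitive_key is hypothetical, and this is a simplified stand-in,
not the Unicode Collation Algorithm.

```python
import unicodedata


def accent_insensitive_key(s: str) -> str:
    # NFD splits precomposed letters (e.g. "é") into a base letter plus
    # combining marks, using the decomposition mappings from the UCD.
    decomposed = unicodedata.normalize("NFD", s)
    # combining() returns the canonical combining class (also UCD data);
    # accents such as U+0301 have a nonzero class, so they are dropped here.
    base_only = "".join(ch for ch in decomposed if unicodedata.combining(ch) == 0)
    # lower() relies on case mappings that likewise come from Unicode data.
    return base_only.lower()


words = ["zebra", "Éclair", "eagle", "école"]
print(sorted(words, key=accent_insensitive_key))
```

Without the key, a plain code-point sort puts "Éclair" and "école" after
"zebra"; with it, they sort among the other "e" words.  Either way the
behavior is determined by tables that trace back to UnicodeData.txt.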

Now, let us consider five scenarios:

A)  Upstream produces some set of tables that it claims are compatible
with UnicodeData.txt, but not derived from it.

B)  Upstream has a tool that mechanically takes UnicodeData.txt,
extracts data from it, and reformats that data for use in its program.
This tool is run by upstream, and is not rerun at build time.

C)  Upstream has a tool that mechanically takes UnicodeData.txt,
extracts data from it, and reformats that data for use in its program.
This tool is run as part of every build.

D)  Upstream distributes UnicodeData.txt as part of its program.

E)  The UnicodeData.txt file is extracted as part of the installation
program or at runtime.
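To make strategies B and C concrete, here is a minimal sketch of the kind
of mechanical extraction tool they describe.  It is not any particular
upstream's tool; the names extract_tables, emit_lowercase_table and
lower_map are hypothetical.  It assumes the documented UnicodeData.txt
layout of 15 semicolon-separated fields (field 0 = code point, 2 = general
category, 13 = simple lowercase mapping) and First/Last pairs for large
ranges.  Under B its output would be generated once and shipped; under C
it would run on every build.

```python
from typing import Dict, Iterable, Optional, Tuple

# Field positions in the UnicodeData.txt layout (assumed, 15 fields/line).
CODE_POINT, NAME, GENERAL_CATEGORY, LOWERCASE = 0, 1, 2, 13


def extract_tables(lines: Iterable[str]) -> Tuple[Dict[int, str], Dict[int, int]]:
    """Extract the general category and simple lowercase mapping per code point."""
    category: Dict[int, str] = {}
    lower: Dict[int, int] = {}
    range_start: Optional[int] = None
    for raw in lines:
        line = raw.strip()
        if not line:
            continue
        fields = line.split(";")
        cp = int(fields[CODE_POINT], 16)
        name, gc = fields[NAME], fields[GENERAL_CATEGORY]
        # Large blocks such as CJK ideographs appear as a First/Last pair
        # rather than one line per character; expand them here.
        if name.endswith(", First>"):
            range_start = cp
            continue
        if name.endswith(", Last>") and range_start is not None:
            for c in range(range_start, cp + 1):
                category[c] = gc
            range_start = None
            continue
        category[cp] = gc
        if fields[LOWERCASE]:
            lower[cp] = int(fields[LOWERCASE], 16)
    return category, lower


def emit_lowercase_table(lower: Dict[int, int]) -> str:
    """Reformat the extracted mapping as C source a program could compile in."""
    rows = [f"    {{0x{cp:04X}, 0x{lc:04X}}}," for cp, lc in sorted(lower.items())]
    return "static const unsigned int lower_map[][2] = {\n" + "\n".join(rows) + "\n};\n"


# A few lines in UnicodeData.txt format (abbreviated sample, not the full file).
SAMPLE = [
    "0041;LATIN CAPITAL LETTER A;Lu;0;L;;;;;N;;;;0061;",
    "0061;LATIN SMALL LETTER A;Ll;0;L;;;;;N;;;0041;;0041",
    "4E00;<CJK Ideograph, First>;Lo;0;L;;;;;N;;;;;",
    "9FA5;<CJK Ideograph, Last>;Lo;0;L;;;;;N;;;;;",
]

if __name__ == "__main__":
    categories, lowercase = extract_tables(SAMPLE)
    print(emit_lowercase_table(lowercase))
```

The generated table is plainly "derived from" the input file in every
sense; the only difference between B and C is when the tool is run.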

I believe I know of programs currently "part of" Debian that use each
of strategies A-D.

I claim that C and D are clearly situations involving "free packages
which require contrib, non-free packages or packages which are not in
our archive at all for compilation or execution".  I think that if the
question were ever brought before a court of law, B and C would be held
to be equivalent; that is, if under the DFSG "contract" C places a
package in contrib, then B would as well.  This is NOT a question of
copyright law, but only of the DFSG.

Programs using strategy A are probably DFSG-free, assuming that there
is no evidence of fraud.  But given the complexity and magnitude of the
Unicode standard, and given that the totality of facts embodied in
UnicodeData.txt is not available elsewhere in the Unicode standard, if
an interested party ever brought this before a judge, copyright
infringement would be proved.

I know of no programs using strategy E.  The DFSG "contract" would
still place such programs in contrib.

Ironically, strategy E is the safest legally.  The Unicode license
specifically grants the authority to distribute the unchanged
UnicodeData.txt file, and specifically grants the authority to extract
data from it, so the programmer is clearly within the terms of the
license.  The Unicode license DOES NOT explicitly grant the authority
to distribute extracted data.  I suspect that this is an accidental
oversight on the part of the Consortium.  But...

Observations:  if UnicodeData.txt is non-free, then the non-free section
has ongoing relevance to the Debian project.  Also, if we honor the DFSG
"contract", then contrib is going to get a LOT larger.

Jim Penny
