Re: location of UnicodeData.txt
On Sun, Dec 01, 2002 at 11:10:09AM +0100, Bernhard R. Link wrote:
> * Jim Penny <firstname.lastname@example.org> [021130 18:43]:
> > Huh? If I change the text of the standard, I have changed the standard!
> > For example, if I have :
> > 0332;COMBINING LOW LINE;Mn;220;NSM;;;;;N;NON-SPACING UNDERSCORE;;;;
> > and change this to
> > 0332;NON-COMBINING LOW LINE;Mn;220;NSM;;;;;N;SPACING UNDERSCORE;;;;
> > Then the standard has been changed!
> > That is, this file is line after line of character number assignment,
> > followed by character name, (and other information). There is no
> > possible change that does not change the standard!
> > Hint: (from standard writer's viewpoint) - A standard that can be
> > changed by anyone, at anytime, without notice and consultation is not
> > a standard, especially if it is a contentious standard that has some
> > people seriously upset (e.g., Russian and CJK users).
> You seem to understand less and less. If the text is changed, it is no
> longer the standard. (A standard cannot be changed by changing the text,
> as the standard is not a local file, but the unmodified text.)
So, can a standard be DFSG free?
> What the licence of a standard file may reasonably demand is that no
> changed text pretends to be the unmodified standard.
They can demand more than that, a lot more. All of copyright law comes
to bear (if the standard is deemed copyrightable and has been
copyrighted.) In particular, the owner of a copyright has, unless
waived, control over the right to distribute "derivative works".
> > The text of every standard that I know of is modifiable. However, it
> > normally takes the consent of the standards body and is issued under
> > its aegis. Again, Jim Penny's unicode standard has no value, and even
> > debian unicode has very limited appeal.
> You are again talking of the standard. Not the text of the standard.
> A standard body can issue a new standard. And trademark laws and other
> things can force any new "XYZ standard for UVW" to be issued by some
> special entity.
Look at the file! UnicodeData.txt is the core of the standard; it
happens to be an ASCII text file. So what? Every standard is embodied
in text at some point!
You seem to regard standards as some Platonic ideal, completely divorced
from the text which defines them. This may be a valid viewpoint in some
cases; e.g. the original algol-60 report. It is not in other cases,
e.g. the algol-68 report. UnicodeData.txt is a text file which has no
redundancy and no explanatory text. There is simply no portion of this
file that can be modified without making an artifact that differs from
the standard in some substantive way.
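
To make the point concrete, here is a minimal sketch (not anything the
Unicode Consortium ships) that splits one record of the table into its
15 semicolon-separated fields and reports which fields differ between
two versions of a line. The field labels and the helper names
parse_record and changed_fields are my own, chosen for illustration;
only the two sample records come from this thread.

```python
# Labels for the 15 semicolon-separated fields of a record.
# The labels are illustrative; the file itself carries no header row.
FIELD_NAMES = [
    "code_point", "name", "general_category", "combining_class",
    "bidi_class", "decomposition", "decimal_value", "digit_value",
    "numeric_value", "mirrored", "unicode_1_name", "iso_comment",
    "uppercase", "lowercase", "titlecase",
]


def parse_record(line):
    """Split one record into a dict keyed by the labels above."""
    fields = line.rstrip("\n").split(";")
    if len(fields) != len(FIELD_NAMES):
        raise ValueError(
            "expected %d fields, got %d" % (len(FIELD_NAMES), len(fields))
        )
    return dict(zip(FIELD_NAMES, fields))


def changed_fields(original, modified):
    """Return the labels of the fields whose values differ."""
    a = parse_record(original)
    b = parse_record(modified)
    return [name for name in FIELD_NAMES if a[name] != b[name]]


original = "0332;COMBINING LOW LINE;Mn;220;NSM;;;;;N;NON-SPACING UNDERSCORE;;;;"
modified = "0332;NON-COMBINING LOW LINE;Mn;220;NSM;;;;;N;SPACING UNDERSCORE;;;;"

print(changed_fields(original, modified))
```

Every field in a record is either a code point, a name, or a property
value, so any reported difference is a difference in what the table
asserts about that character.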
> > On the other hand, if you wish to create a competitor to the unicode
> > standard, say the debicode standard, I see no moral right that you have
> > to incorporate, without permission, the unicode standard. You should
> > expect to start from scratch!
> > Now, IANAL, but I suspect that any unicode editor that reproduced enough
> > information from the unicode standard to be useful would be considered a
> > derived work. More importantly, I think that it is arguable that this
> > table is, in the terms of the Debian Social Contract, "necessary for
> > the execution" of a full unicode editor. (The language of the Debian
> > Social Contract is even more general and vague than copyright law!)
> It talks about "and to freely use the information supplied in the
> creation of products supporting the UnicodeTM Standard."
> If this does not include making modifications, then jurisdiction is
> more broken than I ever thought. (In my eyes the information should
> not even be copyrightable at all, but this point may be discussed).
The license permits "extraction" of information for "documentation or
programs". This may be completely different from "modification" or
"correction" of information.
> > In either case, the social contract would place the unicode table into
> > non-free; and any editor that depended on the table, or information
> > derived from the table (in a copyright sense) in either non-free or
> > contrib.
> The table itself may be non-free. I doubt any editor will use the file
> itself but use modification suitable for the program.
> > I have no problem with this result. But saying that the unicode
> > character table cannot be distributed by debian, in spite of specific
> > language permitting us to do so, seems a bit extreme.
> If it does not suit for main, then it can not be distributed as part of
> debian. (by definition)
But it can be distributed by debian, not as a part of debian. That is,
it may be put in non-free, and it may be distributed using the debian
mirrors. Note: I did not use the phrase "part of", that is yours.
> > And the
> > consequences of this decision will probably seem extreme to many people.
> > This example just happens to be particularly cogent; there is no doubt
> > it is non-free, there is no doubt it is copyrightable, there is little
> > doubt that it is "necessary for the execution" of a substantial corpus
> > of programs which are otherwise DFSG free. These programs would
> > certainly include unicode editors, and would probably include python,
> > perl and ruby.
> These "no doubt" are all wrong in my eyes.
Which "no doubt"s are wrong?
1) it is non-free.
It fails section 3 of the DFSG, which makes it non-free.
2) it is copyrightable.
There is more than a minimal element of creativity here. The only
overlap with any previous character coding is the first 128
characters (ASCII). The creation of the classification scheme itself
is a work of creativity. Moreover, there are a huge number of
possible character tables. This is only one of many possible
arrangements. Copyright has been asserted, and no jurisdiction has
held that this assertion is invalid, so it must be presumed to be valid.
3) It, or information derived from it, is "necessary for execution" by
a unicode editor.
Copyright grants control of derivative works. The Unicode Consortium
has partially waived this control, by permitting "extraction" of
information to be used in programs. They have said nothing about
licensing of the programs using the information. So, it is possible
to create a program with a DFSG license that either uses the file, or
information extracted from the file. This is clear.
Now, suppose you are writing an editor that is supposed to be capable
of handling any unicode character. Said editor will need to have
access to any attribute of any character. It will thus need to have
access to the UnicodeData.txt file, and probably more. (It will probably
need font/glyph information, as well.) This would appear to make it
"necessary for execution". Note, this is not a copyright term of art,
but is a phrase introduced by the debian social contract. The Debian
social contract says that any work which depends on something in
non-free belongs in contrib, not main.
It is not at all clear that "depending on information extracted from the
file" is different from "depending on the file", for the purpose of
determining whether a work is in main or contrib. Again, this has
nothing to do with copyright law.
The "extracted information" is certainly not DFSG free if the
information in its totality is not. In fact, the Unicode Consortium
license is silent in this regard. The right to distribute files
in the Unicode Character Database is granted. The right to extract
information from these files for documentation or programs is granted.
The right to distribute files consisting of information extracted
from the UCD is nowhere granted, and may be reserved (or, it may be
implicit in the grant of extraction). Until someone forces a court
case, this will be unknown. Consult a lawyer.
4) It is necessary for other unicode aware applications.
This is more doubtful. It would probably depend on the nature of the
application. If a system simply declared a section of data to be
Unicode data, and made no attempt to comprehend the contents, it
probably would not need access to the contents of UnicodeData.txt.
If it implements things like regular expressions over unicode, things
get much more dicey. If it permits things like searching by character
name, then UnicodeData.txt will be needed.
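
As a rough sketch of why searching by name needs the table (or data
extracted from it): the names exist nowhere else, so a program has to
build its index from the records themselves. The helpers
build_name_index and search_by_name are hypothetical names of mine, and
SAMPLE_RECORDS is a three-line excerpt standing in for the full file.

```python
# A tiny excerpt standing in for the full table; a real program would
# read every line of the file instead.
SAMPLE_RECORDS = """\
0041;LATIN CAPITAL LETTER A;Lu;0;L;;;;;N;;;;0061;
0061;LATIN SMALL LETTER A;Ll;0;L;;;;;N;;;0041;;0041
0332;COMBINING LOW LINE;Mn;220;NSM;;;;;N;NON-SPACING UNDERSCORE;;;;
"""


def build_name_index(lines):
    """Map each character name to its code point (as an int)."""
    index = {}
    for line in lines:
        line = line.strip()
        if not line:
            continue
        fields = line.split(";")
        # Field 0 is the hexadecimal code point, field 1 the name.
        index[fields[1]] = int(fields[0], 16)
    return index


def search_by_name(index, fragment):
    """Return sorted code points whose name contains the fragment."""
    fragment = fragment.upper()
    return sorted(cp for name, cp in index.items() if fragment in name)


index = build_name_index(SAMPLE_RECORDS.splitlines())
for code_point in search_by_name(index, "low line"):
    print("U+%04X" % code_point)
```

Whether the index is built at run time from the file or baked into the
program beforehand, the names still come from the table, which is the
"depending on the file" versus "depending on information extracted from
the file" question raised under 3).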
> Bernhard R. Link
> <gEistiO> let's say... I have all the sources in /lost+found/waimea
> <me> gEistiO: [...] Why lost+found?
> <gEistiO> where else was I supposed to put it?