
Re: [aspell-devel] Problems with aspell-en license



On Mon, Oct 21, 2002 at 02:18:15PM -0500, John Goerzen wrote:
> I'm not so sure.  What exactly are the preexisting materials that are used
> to make a wordlist?  It would be possible to make a compilation wordlist,
> but I disagree that a wordlist is inherently a compilation work.
> 
> For example: a dictionary could be a non-compilation work if it's prepared
> from scratch (ie, Webster's 1913 would probably fit here) or it could be a
> compilation work if it includes definitions from several other sources
> (Wordnet, gcide).

You're evading the very point I've been harping on:

Title 17, Section 101:

"A ''compilation'' is a work...arranged in such a way that the resulting
work as a whole constitutes an original work of authorship."

Is "original" completely meaningless?

What is "original" about piping the contents of 5 different dictionaries
through the moral equivalent of "awk {print $1}"?

Does that give five different "original" word lists in your view, or
just one?

Also, under Section 101, wouldn't each of these be "derivative works" of
the source dictionaries?

"A work consisting of editorial revisions, annotations, elaborations, or
other modifications which, as a whole, represent an original work of
authorship, is a ''derivative work''."

If so, how can we tell Mr. Smith's "original" derivative work created by
piping web1913 through awk from Mr. Jones's "original" derivative work
created by piping web1913 through awk?  If one cannot tell the
difference, what does it mean to be "original"?

This is why it is patent nonsense to assert copyright in word lists.

> > Now, let's check Section 102:
> > 
> >       (b) In no case does copyright protection for an original work of
> >     authorship extend to any idea, procedure, process, system, method
> >     of operation, concept, principle, or discovery, regardless of the
> >     form in which it is described, explained, illustrated, or embodied
> >     in such work.
> 
> This is basically excluding patents from copyright law.  I don't see its
> relevance here.

No, it does far more than that.  The copyrightability of a work lies
*strictly* in its expression, not in the underlying ideas, procedures,
processes, systems, methods of operation, etc.  Where your possibilities
for expression are limited, originality may be impossible.

I assert that when I type "awk {print $1}", it's no more or less
original than when anyone else does it.  That is to say, it's not
original at all.  It does not even come close to rising to any
reasonable standard of originality.

> > So, even if a person tries to assert copyright in a word list, one can
> > create an alternative word list of equal or greater utility by simply
> > extracting terms from any of several publicly-licensed dictionaries,
> 
> Well, the "equal or greater utility" has yet to be seen :-)

cat /usr/share/dictd/*.index | awk '{print $1}' | grep -i '^[a-z]*$' | sort -u > wordlist

How original is it?

That took about 1 minute to come up with, including the time I spent
poking into the files in /usr/share/dictd, which I'd never looked at
before.

> > The copyright in those dictionaries -- themselves copyrighted works at
> > some point -- resides in the definitions, not the defined words
> > themselves, at least insofar as anything calling itself a "dictionary"
> > purports to define words that are *actually already in use*.  This
> 
> Most works are probably made up of words that are actually already in use. 
> It seems far-fetched to claim that there is a special exclusion in copyright
> law pertaining to words in dictionaries.

There doesn't need to be one, because the law already requires
"originality", a term which some people appear to be taking to mean "any
degree of effort".

> > By this argument, an arrangement of unoriginal words cannot be
> > copyrightable when it purports to be ordered according to a certain
> > idea, procedure, process, system, method of operation, concept,
> > principle, or discovery.
> 
> I believe that the section in question states that the idea, procedure, etc.
> itself is not copyrightable, but the specific text used to describe it is.

Not necessarily.  See above.

> > If there is any process by which we can create a superset of the
> > supposedly-copyrighted word list and subsequently prune it down to a
> > duplicate, then that word list is provably non-original.
> 
> This too seems weak to me -- could you not create a process to form the text
> of a novel by using the original to pick out ordered words from a
> comprehensive master word list?

For any work of a substantial length -- say, more than a dozen words or
so -- the only way to do so that wouldn't be computationally intractable
would be to generate a simple encoding mechanism for the words used in
the work.  In other words, we wouldn't be generating a sequence and then
pruning out the undesirable elements, but building up the sequence from
scratch using the work in question as a reference.

For example, if I assign a unique number to each word in my "wordlist"
file above, I can create the following mapping:

% egrep -n '^(the|quick|brown|fox|jump|over|the|lazy|dog)$' wordlist
145903:brown
156328:dog
162102:fox
170933:jump
172270:lazy
180960:over
187443:quick
199269:the

("jumps" is not in the wordlist, so 'scuse my grammar in the following
example.)

I thus encode:

the quick brown fox jump over the lazy dog

as:

199269 187443 145903 162102 170933 180960 199269 172270 156328

These expressions are equivalent; each can be easily transformed into
the other.  I cannot escape the arm of copyright law by encrypting my
W4R3Z before distributing them to my fellow D00DZ, and nothing in my
argument would imply that I could.
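
A minimal sketch of that round trip, for anyone who wants to try it.
The eight-word "sample-wordlist" file and the encode/decode function
names are stand-ins made up for illustration, so the numbers will not
match the ones from my real wordlist above:

```shell
#!/bin/sh
# Hypothetical stand-in for the "wordlist" file: sorted, one word per line.
# Line numbers here will NOT match the ones quoted in the message.
printf '%s\n' brown dog fox jump lazy over quick the > sample-wordlist

# encode LIST: replace each word on stdin with its line number in LIST.
encode() {
    tr -s ' ' '\n' |
    awk 'NR == FNR { num[$1] = FNR; next }   # first file: word to line number
         { printf "%s%s", sep, num[$1]; sep = " " }
         END { print "" }' "$1" -
}

# decode LIST: replace each line number on stdin with the word on that line.
decode() {
    tr -s ' ' '\n' |
    awk 'NR == FNR { word[FNR] = $1; next }  # first file: line number to word
         { printf "%s%s", sep, word[$1]; sep = " " }
         END { print "" }' "$1" -
}

# Encode the sentence, then feed the numbers straight back through decode.
echo 'the quick brown fox jump over the lazy dog' | encode sample-wordlist
echo 'the quick brown fox jump over the lazy dog' |
    encode sample-wordlist | decode sample-wordlist
```

The second command prints the sentence back unchanged: the numbers
carry exactly the same expression as the words do.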

Feel free to use an iterative process to generate all possible sequences
of nine words from the 207,451 in my wordlist, and keep in mind that
actual English contains many more words than this.
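
To get a feel for the size of that search space, here is a rough
back-of-the-envelope sketch.  It assumes ordered sequences with
repetition allowed, i.e. 207451^9 of them, and the function name
seq_exponent is made up for illustration.  The count overflows shell
integers, so it is reported as a power of ten:

```shell
#!/bin/sh
# seq_exponent WORDS LENGTH: print log10(WORDS^LENGTH), i.e. the power
# of ten of the number of ordered LENGTH-word sequences (repetition
# allowed) that can be drawn from a list of WORDS words.
seq_exponent() {
    awk -v n="$1" -v k="$2" \
        'BEGIN { printf "%.1f\n", k * log(n) / log(10) }'
}

# Nine-word sequences from a 207,451-word list.
echo "about 10^$(seq_exponent 207451 9) sequences"
```

That is a number with 48 digits, for a sentence of only nine words.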

Anyway, the very large vocabulary of the English language conceivably
makes it easier to pack more originality into a smaller space.  English
has far fewer opportunities for "collision" in creative expression than
a language with a vocabulary of only a few hundred words.

Coincidence, automated generation, and independent innovation are all
evidence that a given expression is unoriginal.  This is a point that
seems to be completely lost on most pundits today.  "First past the
post" applies to patent law, not copyright law.  If I independently come
up with the same original expression as you, then, in theory, copyright
law does not allow us to sue each other for copyright infringement.

From a public policy perspective, then, it makes sense to not set the
bar for originality too low.  To do so will stifle free expression, clog
the courts, and effect redistribution of wealth to tort lawyers.

Asserting copyright in alphabetically-sorted lists of common English
words -- or common words in any language -- is patently absurd, and
anyone who does so is either a scoundrel or a fool.

> I'd like to say that I hope that it is the case that these are
> un-copyrightable, but as of yet your arguments based on law don't seem
> convincing.

I don't think you are reading my argument very carefully.

-- 
G. Branden Robinson                |     There's nothing an agnostic can't
Debian GNU/Linux                   |     do if he doesn't know whether he
branden@debian.org                 |     believes in it or not.
http://people.debian.org/~branden/ |     -- Graham Chapman
