> Henning Makholm <henning@makholm.net> writes:

> > Several other people have tried to argue that a word list is not
> > necessarily unoriginal. There are thousands of words that have to be
> > judged either on the list or off the list; these thousands of
> > choices are fully as much an intellectual expressive choice as the
> > choice of which words to put in which order to form a novel.

> Alright, then consider this.  Since a word list in a dictionary has a
> questionable copyright, it must be removed from a dictionary.  Then,
> people notice some common words no longer exist in the dictionary, so
> they add them.  Eventually, every missing word will be added back to the
> dictionary, so that the end result is identical to the original.

The fallacy of this argument is the assumption that the result will be
identical to the original. It assumes that the decision whether or not
a word is "common" is objective and can be reproduced with accuracy
thousands of times. I hold that this assumption is obviously absurd,
at least as long as we're talking about a natural language.

In reality, long before (if ever) the last little arcane fringe word
from the original list was added to the free dictionary, people would
have started adding words that were *rejected* (or not thought of) for
the list that was removed in the first place. Therefore, the identity
you speak of is not going to arise in practice. The new list will be
the result of an independent *creative* choice of which words it should
contain. And that is precisely the common cause of its non-identity to
the original and my insistence that it can enjoy copyright protection.

> Have you ever heard of two original novels independently written that
> were identical to each other?  No, that's inconceivable.

Yes - just as inconceivable as two independently compiled word lists
being identical to each other.

Note that there is no protection on the mere idea of "the N most
common words in English", but that description leaves an awful lot
up to choice - each time it is implemented the result is going to be
different with a very high probability. When do you count something as
a "word"? What do you recognize as legitimate alternative spellings
and what will you reject as spelling errors? When is a word more
common than another?  When it occurs more frequently in some text
corpus? How do you select a corpus to begin with (*lots* of room for
choice here)? Do you recognize personal names, given that they would
often be spelling errors when they don't refer to the appropriate
person?  Etc. etc. etc. For many of these problems it is
impossible to set down fixed rules in advance - you'll have to make a
judgement call in *many* concrete circumstances. I hold that the
exercise of such judgement calls is a sufficiently *creative* activity
that its result ought to receive copyright protection.
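
As a rough illustration of how much hangs on such choices, here is a
toy Python sketch. Everything in it is invented for the example: the
one-line corpus, the function name top_words, and the two policies
being compared. It is not anyone's actual method of compiling a list.

```python
import re
from collections import Counter

# Hypothetical toy corpus; a real list would start from a *chosen* corpus.
CORPUS = "The cat's hat. The Cat sat; the cats sat on the mat. Don't sit, cat!"

def top_words(text, n, fold_case, keep_apostrophes):
    """Return the n most frequent 'words' under one set of choices."""
    if fold_case:
        text = text.lower()
    # What counts as a "word" is decided here, by the pattern we pick.
    pattern = r"[A-Za-z']+" if keep_apostrophes else r"[A-Za-z]+"
    counts = Counter(re.findall(pattern, text))
    # Ties are broken alphabetically, which is itself an arbitrary choice.
    ranked = sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))
    return [word for word, _ in ranked[:n]]

# Same corpus, same N, two different sets of judgement calls.
list_a = top_words(CORPUS, 3, fold_case=True, keep_apostrophes=True)
list_b = top_words(CORPUS, 3, fold_case=False, keep_apostrophes=False)

print(list_a)
print(list_b)
print("identical:", list_a == list_b)
```

Even with a single sentence and only two yes/no decisions the two
lists already disagree; a real list involves thousands of such
decisions, most of which cannot be reduced to a flag.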

To go with the novel analogy (I'd thought of formally instituting a Silly
Literature Analogies subthread, but you beat me to it), there can be no
copyright on the abstract plot scheme

  A loves J and vice versa, but both are emotionally incapable of
  showing it properly. A almost marries R before everything is
  finally resolved. Meanwhile, A domesticates an orphaned puppy.

However, the way that the author fleshes out this plot certainly *is*
a creative process that entails a copyright.

Your claim that "a list of common words in X-language" will always
come out the same seems not much more likely than a claim that the
above plot summary will always turn out to be _The Mammoth Hunters_
after passing through the hands of an author.

> But it's completely conceivable for two independently compiled
> dictionaries to be identical, or very nearly so.

No, I think that's completely inconceivable, given reasonable
assumptions about the age of the universe and the number of dictionaries
the average human produces.

In cases where it was true (say, "a list of words that appear in the
same spelling more than twice in the collected works of Shakespeare"),
I would be willing to consider the possibility that independently
compiled lists could be identical. And in *that* case there would be no
copyright in such a list.

Henning Makholm                               "Hi! I'm an Ellen Jamesian. Do
                                        you know what an Ellen Jamesian is?"

