
Re: free licensing of TEI Guidelines



Wow. Lots of issues raised. Impressive. Thanks.

op> The goal here is not to prevent modification of the guidelines,
op> nor to prevent the creation of non-TEI derivatives, but rather to
op> prevent confusion between the two.

MJR> Could you achieve this goal by endorsing official versions with
MJR> digital signatures, created with the GnuPG keys of the TEI-C
MJR> members?

Yes and no. Yes, I think OpenPGP signing of official versions could
probably achieve the goal quoted above. Sadly, I don't think I
expressed that goal well enough. My humble apologies. Here's another
crack at it.

[I do not (yet) know the Debian package system well enough to draw
analogies to it. Thus I will draw analogies to the fictitious Bednai
system instead.]

The TEI's Guidelines for Text Encoding instruct an encoder how to
construct a proper TEI text, just as the Bednai Guidelines for
Package Creation instruct a maintainer how to construct a proper
package. Many of the rules for proper text creation spelled out by
the TEI Guidelines for Text Encoding are enforced by the "software"
provided by the TEI for the purpose[1]; e.g. that the root element be
called <TEI.2> and that its first child be a <teiHeader>, and that a
<title> element is required. Similarly many of the rules for proper
package creation spelled out by the Bednai Guidelines for Package
Creation are enforced by the actual software Bednai provides its
maintainers for the purpose; e.g., if a maintainer fails to use the
--absolute-paths switch to the tar command when creating a package,
the verification script will flag an error, and (if the error is
ignored) the extraction script will fail to put the files in the
correct directories when operating on said package.

However, many of the rules for proper text creation spelled out by the
TEI Guidelines for Text Encoding are *not* enforced by provided
software, but rather are just rules people are supposed to follow;
e.g. that the hand= attribute of <add> is supposed to point (via the
ID/IDREF mechanism) to a <hand>, not to an <emph>. Similarly there are
rules for proper package creation spelled out by the Bednai Guidelines
for Package Creation that are not enforced by provided software, but
rather are just rules people are supposed to follow; e.g. that each
entry in the bednai/changelog file should list only changes that have
been made to the corresponding revision, not a "to do" list for the
next revision.
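
As a rough illustration of such a not-machine-enforced rule, here is a
minimal Python sketch (not anything the TEI provides) that checks
whether each hand= attribute on an <add> points at a <hand>. The sample
document is a toy fragment rather than a valid TEI text, and the
function name bad_hand_refs is made up for this example. A DTD
validator would confirm only that the IDREF matches some ID; which
element carries that ID is the part left to people.

```python
import xml.etree.ElementTree as ET


def bad_hand_refs(xml_text):
    """Return (idref, target_tag) pairs for <add hand=...> not pointing at a <hand>."""
    root = ET.fromstring(xml_text)
    # Map every id= value to the name of the element that carries it.
    by_id = {el.get("id"): el.tag for el in root.iter() if el.get("id") is not None}
    problems = []
    for add in root.iter("add"):
        ref = add.get("hand")
        if ref is None:
            continue
        target = by_id.get(ref)  # None when the reference points nowhere
        if target != "hand":
            problems.append((ref, target))
    return problems


# Toy fragment (assumption): structure simplified, not a valid TEI document.
SAMPLE = """<TEI.2>
  <teiHeader><hand id="h1"/></teiHeader>
  <text>
    <emph id="e1">very</emph>
    <add hand="h1">ok</add>
    <add hand="e1">wrong</add>
  </text>
</TEI.2>"""

print(bad_hand_refs(SAMPLE))  # [('e1', 'emph')]
```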

In the case of TEI, the schemas provided and the Guidelines themselves
are inextricably linked together. The schemas are extracted from the
XML source of the Guidelines by a script.

In the TEI world, the consequences for not following these not-
machine-enforced rules vary, but usually boil down to little more than
other folks pointing out that you haven't followed the rules properly.

So, the long-winded explication of how the system works is a setup for
me to express the goal quoted above more clearly. It's not that we
want to avoid confusion between the TEI Guidelines and modified
derivative versions -- of course we do want to avoid that confusion,
but that's pretty easy (I think -- tell me if I'm wrong). Many
copyleft licenses insist that modified derivatives have a different
title or program name or some such. What we would like is to be able
to ensure that the schemas derived from any modified Guidelines are
not confused with schemas derived from the original.[2] (Some would
say that what we actually want is even one step of indirection
further, that the document instances permitted by the schemas
extracted from the modified Guidelines not be confused with document
instances permitted by the schemas extracted from the original
Guidelines.[3])


op> a) a copyleft notice that requires those who modify the Guidelines
op>    to retain unmodified or delete in its entirety the section that
op>    defines TEI conformance, or

MJR> Is required deletion significantly better than required invariant
MJR> retention? I'm not sure.

Better for what? Protecting the copyright holders' concerns or meeting
a particular set of guidelines for freedom to modify & distribute?
Neither required deletion nor required invariant retention seems
definitionally problematic to me, although it is easy to come up with
particular cases that obviously fail freedom tests.

HM> If the deletion is required, it must be possible - then Debian
HM> could at least distribute copies with the section removed. (And
HM> then ship a cleanroom rephrasing of the relevant information in
HM> README.Debian).

I'm sorry, I don't understand this at all. Why would Debian be
interested in distributing Guidelines with the conformance section
removed? 

MJR> I wonder how to handle the case where the author of a modified
MJR> version wishes to comment on that part, also including it.

If an author wishes to *comment on* the TEI Guidelines, he or she does
not need our permission to copy that portion he or she wishes to
discuss. That's fair use. (And, even if your local publisher would
normally be too afraid to publish such a commentary for fear of suit,
i.e. they're afraid to defend fair use, the TEI-C has explicitly
stated anyone can copy up to an entire chapter without even asking.)

Our discussion here is about people who are creating text encoding
guidelines of their own using the Guidelines as their starting point.

MJR> Having lumps of a work that cannot be reused doesn't feel
MJR> DFSG-free to me, but I might not be thinking straight just now.

This doesn't make much sense to me, but may represent my misunder-
standing of DFSG and why y'all don't like GFDL. Imagine that I write a
book about prehospital cardiac dysrhythmia interpretation using 3-lead
ECG monitors[4], to be published by the (fictitious) Paramedic Free
Press Association. Per their guidelines, the book starts with an
"About the Author" section and ends with an extensive colophon.

It makes sense that I, wanting both to improve prehospital care
worldwide and to gain as much fame, if not fortune, as I can for my
effort, would want to freely distribute my book. Furthermore, because
I am smart enough to realize that I don't have all the answers and
that the field of electrocardiography can change, it makes sense that
I would want to permit others to make changes to the content. It also
makes sense that I would *not* like someone else to be restricting
other people from using my book, even if he or she had modified it. So
overall, a copyleft license looks good.

However, there seem to me to be a few obvious caveats:
* I would not want to be blamed for bad advice someone else added to
  mine; thus I would want modifications to be indicated;
* I would not want someone else to change my biography provided in the
  "About the Author" section.
* The Paramedic Free Press Association would not want the colophon
  information that is no longer true to be retained in a modified
  version. (E.g., if part of the colophon says "This book was written
  in its entirety using Emacs/psgml on a PowerMac 7100 running Debian
  GNU/Linux 3.0 (Woody), and is valid against the DocBook XML 3.2
  DTD.", but the modifications were not written on that system nor is
  the result valid against DocBook 3.2.)

Sure, if someone changes my bio in a sufficiently nasty (and untrue)
way, I can sue for libel. But I'd really like to be able to stop 'em
before it gets to that point. Particularly when published in a book
likely to be read by my peers, and when the information could be
misunderstood as having been written by me.

These "lumps of a work" (the bio and colophon) have nothing to do with
3-lead ECG interpretation, and preventing their modification (or
insisting that they be dropped from modified versions of the book)
doesn't seem to impair your ability to use, reuse, modify, and pass on
the (maybe modified) information about 3-lead ECG interpretation.

Analogously, the conformance section of the TEI Guidelines is not
really about text encoding, but rather how you can tell whether or not
your text encoding can appropriately claim to have followed the rules.

(Note that if we were going to use the GFDL invariant approach,
instead of the "leave it alone or delete it" approach I asked about,
we would need to re-write the conformance section a bit to make this
distinction even more obvious, according to someone at FSF.)


op> b) a copyleft notice that requires that modified versions of the
op>    Guidelines describe documents with a different root element name
op>    (perhaps, similar to "<?xml", reserving any string that matches
op>    "^[Tt][Ee][Ii]") and to not use the TEI namespace.

MJR> <?xml ...?> is a processing instruction, not a root element.

Yes, I know. I only meant reserving strings in a manner similar to the
mechanism W3C uses for reserving any PI targets that match
"^[Xx][Mm][Ll]"; I did not mean to imply that a PI target and a GI
(element name) were in any way the same. In truth, though, the XML
specification reserves any names that match, not just PI targets:

   Names beginning with the string "xml", or any string which would
   match (('X'|'x') ('M'|'m') ('L'|'l')), are reserved for
   standardization in this or future versions of this specification.
		   -- XML v. 1.0 2000-10-06 section 2.3

MJR> I don't think that limiting the root element really achieves your
MJR> goal.

Why not? If you come across a document whose root element is
<Herberts-TEI>, you should probably not be too surprised if software
that claims to work on TEI documents (whose root element is <TEI.2>)
does not work on it.
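
The reservation floated in (b) amounts to a one-line pattern test. A
minimal Python sketch, assuming the pattern "^[Tt][Ee][Ii]" is applied
to the root element name only; the helper name is made up, and the
sample names other than TEI.2 and Herberts-TEI are illustrative:

```python
import re

# Pattern from proposal (b): any name that starts with "tei" in any letter case.
RESERVED = re.compile(r"^[Tt][Ee][Ii]")


def root_name_is_reserved(name):
    """True when a root element name falls inside the reserved set."""
    return RESERVED.match(name) is not None


for name in ["TEI.2", "teiCorpus.2", "Herberts-TEI", "CPA-doc"]:
    print(name, root_name_is_reserved(name))
```

Under this reading, <Herberts-TEI> would be an acceptable root for a
modified version, since the name does not begin with the reserved
string.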

Note that I am deliberately avoiding the issue that in the TEI scheme
any element can be renamed, the semantics then being stored in the
value of the TEIform= attribute. Any prose that deals with root
element renaming actually has to be worded to apply to this
attribute. But I think that difference is not relevant to our
discussion of whether it's free enough or not.


MJR> Preventing non-TEI forms using the TEI namespace seems fine to
MJR> me, but I could be wrong.

HM> I think you're wrong. One should be allowed to derive a document
HM> that described the official TEI elements as well as Microsoft's
HM> (hypothetical) namespace-[invading] extension. The license of the
HM> specification cannot stop Microsoft from implementing its own
HM> extensions; it would add insult to injury if the license helped
HM> Microsoft keep their extensions secret (by making it more
HM> difficult for white-hatted reverse engineers to reuse text from
HM> the original document when describing their findings).

I don't understand this scenario at all -- what is there to be reverse
engineered? XML is plain text. Microsoft could certainly make changes
and keep them secret or not. What we'd like to prevent is Microsoft's
claiming that documents that conform to their modified Guidelines
conform to the TEI Guidelines (if they don't). That's all. I am
wondering (aloud on this list)
a) whether writing into the copyleft notice that modified elements
   should not claim to be in the TEI namespace achieves this goal, and
b) whether such a restriction is DFSG-free.

Besides, I don't honestly believe any copyright, no matter how
applied, could possibly inhibit any reverse engineers from reuse of
text from the original document when describing findings (even if
copyright could indeed inhibit or prevent the reverse engineering in
the first place). That's fair use. (Yes, I realize someone may be able
to show me case law where 17 USC § 107 has been misinterpreted,
and I'm not really interested in starting a flame war on fair use
instead of getting my questions answered. The point is that quoting a
copyrighted document when describing the findings of a research
endeavor like reverse engineering is definitionally not infringement,
even if the reverse engineering itself required copyright infringement
or there are other reasons that such actions are illegal; therefore
such concerns should not enter into our deliberations, IMHO.)

MJR> ... preventing TEI-incompatible and TEI-unauthorised elements
MJR> from being in the TEI namespace seems fine to me. Putting our own
MJR> forms into TEI's namespace would be similar to claiming that they
MJR> said something they did not.

HM> No it isn't - not unless you *explicitly* claim that it was TEI
HM> who said it.

How much more explicit than pointing to our namespace can you get? :-)

HM> Perhaps, but it is not DFSG-free.

It sort of defeats the purpose of having a standard which says "you
can use the TEI <cb> element to assert that a column break occurred in
the source" if modified versions of the standard say "you can use the
TEI <cb> element to describe the cost-basis of a stock". All we'd like
is for the modified version to say "you can use the CPA <cb> element
to describe the cost-basis of a stock" instead. Are you suggesting
that no standards should be DFSG-free?

Here's a thought experiment. Since the Debian Policy Manual
is GPLed, I could make a modified version of said manual with a few
choice changes to the DFSG, and then create a Debian system that
adhered to *my* Debian Policy Manual that was just like Woody (and
would still be called Debian and Woody) but contained my (fictitious)
XMLspec package, too, even though the XML Specification is not free in
any Debian way. Indeed my version *of the Debian Policy Manual* would
need to say prominently that I had modified it, but not my version *of
the system*.


Well, there is a lot more for me to read, ponder, and possibly respond
to, but if I keep this up I won't get any work done today at all. Let
me thank you all again for taking the time to discuss this, and leave
you with one clarification, one explanation, one question, and one
hope:

Clarification: when discussing namespaces I never meant to imply
(although I think I did) that TEI would want to stop anyone from using
the TEI namespace for an element that had the same semantics in a
modified version as in the original Guidelines. To the contrary, the
point of such a standard is that when software comes across
<tei:title> it knows this is the title of a book, record, movie, or
other creative work, and not a person's societal title (which would be
<tei:roleName>) or the title to a deed of land (for which there is no
specific TEI element).
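
For software, "knowing" that an element is TEI's comes down to
comparing namespace-qualified names. A minimal Python sketch, under
loud assumptions: the TEI namespace URI below is assumed for
illustration, the cpa namespace and the <doc> wrapper are hypothetical,
and the helper names are made up.

```python
import xml.etree.ElementTree as ET

TEI_NS = "http://www.tei-c.org/ns/1.0"  # assumption: URI used for the TEI namespace


def split_qname(tag):
    """Split ElementTree's '{uri}local' tag form into (uri, local)."""
    if tag.startswith("{"):
        uri, local = tag[1:].split("}", 1)
        return uri, local
    return None, tag


def tei_elements(xml_text, tei_ns=TEI_NS):
    """List local names of the elements that claim to be in the TEI namespace."""
    root = ET.fromstring(xml_text)
    pairs = (split_qname(el.tag) for el in root.iter())
    return [local for uri, local in pairs if uri == tei_ns]


# Hypothetical document mixing a TEI <cb> (column break) with a
# non-TEI <cb> (cost basis) from a made-up "cpa" namespace.
SAMPLE = """<doc xmlns:tei="http://www.tei-c.org/ns/1.0"
     xmlns:cpa="http://example.org/ns/cpa">
  <tei:title>Moby-Dick</tei:title>
  <tei:cb/>
  <cpa:cb>100.00</cpa:cb>
</doc>"""

print(tei_elements(SAMPLE))  # ['title', 'cb']
```

The two <cb> elements share a local name, but only one of them claims
TEI semantics, and that is all the software can go on.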

Explanation: Please realize, by the way, that we are only concerned
here with modified versions of the Guidelines and the document
instances they permit; we are explicitly not discussing extensions to
the Guidelines, for which there is a well documented system. So if
what you wanted was an element to indicate a title to a deed of land,
you can invent your own <realEstateTitle> element and use it perfectly
well within the TEI scheme without ever even thinking about modifying
the Guidelines themselves.

Question: what is "DSL", and is there general agreement that this is a
good way to go?

Hope: I'm still hopeful there's a method to copyleft the Guidelines in
such a way as to both address TEI-C's concerns and make it free enough
to be a Debian package (that's not in non-free :-).


Notes
-----
[1] In scare quotes because what the TEI actually provides is a schema
    (DTD, RelaxNG, W3C Schema, whatever), not truly software. But you
    get the idea.

[2] Yes, we're fully aware that no copyright notice will actually
    prevent people from copying in unapproved ways, that all it does
    is give us some leverage in getting them to stop.

[3] I believe that an epistemologist might prefer to say that a schema
    "licenses" document instances, but using that term might prove
    quite confusing in the current context, so I used "permits". One
    could easily say that, just as a regular expression defines a set
    of "sentences" in the language, a schema defines a set of
    instances. So what we're really talking about is confusion between
    the members of the sets of document instances that are valid
    against the set of schemas derived from the original vs. the
    members of the sets of document instances that are valid against
    the set of schemas derived from the modified version.

[4] Something I actually might have wanted to do not so long ago. But
    nowadays prehospital 12-lead ECG machines are becoming so common,
    it doesn't seem to be worth the effort.


