
Re: UTF-8 in copyright files?

On Tue, Dec 16, 2003 at 12:58:44AM +0100, Jeroen van Wolffelaar wrote:
> (I'm not a Debian developer)
> On Mon, Dec 15, 2003 at 07:29:59PM +0000, Scott James Remnant wrote:
> > It's probably a good additional point that most of the standard
> > character sets have no © symbol and that there is no legal basis for
> > "(C)" being a valid representation of it.
> IANAL (but recently successfully passed a computer science "Law and
> Computer Science" course), but 'legal basis'? I doubt that is needed
> for such a thing as the copyright sign, since under most countries'
> (including EU and US) copyright law, a work you authored is
> automatically 'copyright you'.  Whether you simply write: "I wrote this",
> put the real or the `fake' (c) symbol in the file with your name, or
> even don't mention at all that the work is copyrighted by you (but for
> practical reasons, mentioning your name is useful).

[ snip perfectly good Google reference ]

As best anyone on debian-legal has said in various debates over it, the
following forms of copyright notification are acceptable:

Copyright <year> <author>
<*true* copyright symbol> <year> <author>

There might be some argument that an HTML document could be flagged as
such by the &copy; construct, since that is supposed to represent a true
copyright symbol, but the (C) and (c) constructs are frequently argued
to provide no extra benefit (though it's usually granted that they don't
appear to hurt, either).

The UTF-8 byte sequence for the copyright symbol (0xC2 0xA9), however, is as
much a proper copyright symbol as the byte 0x41 is the letter 'A' in ASCII,
ISO-8859-1, and UTF-8. To wit, it is a standard byte sequence for
representing a printable character, and all arguments I've seen so far
assert that using it is an entirely correct way to represent a true
copyright symbol in a data file.
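
For illustration only, here is a minimal Python sketch of that comparison. It
encodes 'A' and the copyright sign (U+00A9) in each of the three encodings
named above. The helper name encoded_hex is made up for this example and is
not part of any Debian tool.

```python
# Compare how 'A' and the copyright sign are stored in the three
# encodings discussed above. encoded_hex is a hypothetical helper.
COPYRIGHT = "\u00a9"  # U+00A9 COPYRIGHT SIGN


def encoded_hex(char: str, codec: str) -> str:
    """Return the bytes of `char` in `codec` as hex, or 'unrepresentable'."""
    try:
        return char.encode(codec).hex(" ")
    except UnicodeEncodeError:
        # The codec has no code point for this character (e.g. ASCII).
        return "unrepresentable"


for codec in ("ascii", "iso-8859-1", "utf-8"):
    print(f"{codec:<10}  A: {encoded_hex('A', codec):<4}"
          f"  copyright sign: {encoded_hex(COPYRIGHT, codec)}")
```

'A' comes out as the single byte 41 in all three, while the copyright sign is
a9 in ISO-8859-1, c2 a9 in UTF-8, and has no representation at all in ASCII.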

You are correct in that the US, and all of the EU that I know of, grant
implicit copyright; the question, however, was about explicit copyright
(among many other things that UTF-8 might be nice for, such as properly
representing the copyright holder's name).

> > On the other hand, policy states UTF-8 for Changelog which breaks katie
> > if your name contains accented characters because you're still forbidden
> > by policy to use UTF-8 in debian/control.
> Since UTF-8 isn't yet by default supported on all systems (at least not
> on my sarge system), I would rather choose for going on the safe side,
> using only 7-bit latin1.  Because of [1], you should write 'Copyright'
> in full rather than (C) to be on the safe side in a legal sense.

You're already out of luck, since I regularly use UTF-8 in the
debian/changelog file, and most of my packages that store text data in a
potentially internationalized format do it with UTF-8, particularly since
there are people I work with whose proper copyright notices cannot even
remotely be expressed in ISO-8859-1. If I'm going to go past US/ASCII, I'm
going all the way to UTF-8; the question was whether the copyright file is
allowed to have characters other than US/ASCII.
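
To make the "past US/ASCII" point concrete, here is a rough Python sketch
that reports the narrowest of the three encodings able to hold a given string.
The function name narrowest_encoding and the sample names are assumptions
made up for this example, and it assumes well-formed text.

```python
# Report the narrowest of ASCII, ISO-8859-1, and UTF-8 that can hold
# a string. narrowest_encoding is a hypothetical helper name.
def narrowest_encoding(text: str) -> str:
    for codec in ("ascii", "iso-8859-1"):
        try:
            text.encode(codec)
        except UnicodeEncodeError:
            continue  # not representable here; try the next wider codec
        return codec
    # Anything that fits neither is assumed to need UTF-8.
    return "utf-8"


# Sample strings are illustrative, not taken from any real copyright file.
for sample in ("Copyright 2003 Joel Baker", "\u00a9 2003 Jos\u00e9", "\u0141ukasz"):
    print(f"{narrowest_encoding(sample):<10}  {sample}")
```

A name containing a character such as U+0141 falls outside ISO-8859-1, which
is the case where nothing short of UTF-8 will do.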

Joel Baker <fenton@debian.org>                                        ,''`.
Debian GNU/NetBSD(i386) porter                                       : :' :
                                                                     `. `'
