
Bug#373218: qa.debian.org: Patch



Raphael Hertzog <hertzog@debian.org> writes:
> On Thu, 28 Jun 2007, Russ Allbery wrote:

>> Well, he extracts everything between <>, but I believe we still lose
>> if, for instance, there's a # in the e-mail address (which is an
>> entirely valid RFC 2822 character).  I'm a little worried about +,
>> which is a very common character and sometimes has special
>> interpretations in URLs.

> So fix the # case (I can do a fixed list of character translation).
> Email addresses can contain almost anything, but in practice they don't
> contain much fancy stuff compared to real names.  The "+" has a special
> meaning only in CGI (GET) parameters AFAIK.

According to RFC 2396, the characters that are reserved, banned, or
disrecommended in URIs are:

    ; / ? : @ & = + $ , < > # % " { } | \ ^ [ ] `

and space.  The safest thing to do would be to map all of those characters
to _.  (Some of them we could get away with not mapping, but I prefer to
appeal to a clear authority for things like this rather than generating a
custom list.)
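
As a rough sketch of that mapping (not the actual patch; Python is used
only for illustration, and the names URI_UNSAFE and sanitize_address are
made up here), something like this would do it:

```python
# Characters RFC 2396 lists as reserved, delimiters, or "unwise", plus space.
URI_UNSAFE = ';/?:@&=+$,<>#%"{}|\\^[]` '

# Translation table sending each of those characters to an underscore.
_TABLE = str.maketrans(URI_UNSAFE, "_" * len(URI_UNSAFE))


def sanitize_address(address: str) -> str:
    """Return address with every character in URI_UNSAFE replaced by '_'."""
    return address.translate(_TABLE)
```

Note that @ is on the list, so under this approach it gets mapped to _
along with everything else; letters, digits, ".", and "-" pass through
unchanged.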

We still lose if someone has a non-ASCII or control character in their
e-mail address, but that's probably not a likely problem given that RFC
2822 doesn't permit that either.
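
If we wanted to detect that case rather than ignore it, a check along
these lines might be enough (again only a sketch; has_unmappable_chars
is a hypothetical name, not something in the existing code):

```python
def has_unmappable_chars(address: str) -> bool:
    """Return True if address contains a control or non-ASCII character."""
    # Control characters are below 0x20 or equal to 0x7F (DEL);
    # anything above 0x7F is non-ASCII.
    return any(ord(ch) < 0x20 or ord(ch) >= 0x7F for ch in address)
```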

-- 
Russ Allbery (rra@debian.org)               <http://www.eyrie.org/~eagle/>


