Bug#373218: qa.debian.org: Patch
Raphael Hertzog <hertzog@debian.org> writes:
> On Thu, 28 Jun 2007, Russ Allbery wrote:
>> Well, he extracts everything between <>, but I believe we still lose
>> if, for instance, there's a # in the e-mail address (which is an
>> entirely valid RFC 2822 character). I'm a little worried about +,
>> which is a very common character and sometimes has special
>> interpretations in URLs.
> So fix the # case (I can do a fixed list of character translations).
> Email addresses can contain almost anything, but in practice they don't
> contain much fancy stuff compared to real names. The "+" has a special
> meaning only in CGI (GET) parameters AFAIK.
According to RFC 2396, the list of characters reserved, banned, or
disrecommended for URIs is:
; / ? : @ & = + $ , < > # % " { } | \ ^ [ ] `
and space. The safest thing to do would be to map all of those characters
to _. (Some of them we could get away with not mapping, but I prefer to
appeal to a clear authority for things like this rather than generating a
custom list.)
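As a rough sketch of that mapping (in Python, which is only an assumption
here, as is the helper name sanitize_address; this is not the patch from
the bug), something like the following would replace every character on
the RFC 2396 list, plus space, with _:

```python
# Characters RFC 2396 lists as reserved, delimiters, or unwise, plus space.
URI_UNSAFE = ';/?:@&=+$,<>#%"{}|\\^[]` '

# Translation table mapping each of those characters to an underscore.
_TABLE = str.maketrans({ch: "_" for ch in URI_UNSAFE})


def sanitize_address(address):
    """Hypothetical helper: make an e-mail address safe to embed in a URL."""
    return address.translate(_TABLE)
```

Note that this maps the @ as well, since it is on the list, so
foo+bar@example.org would come out as foo_bar_example.org.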
We still lose if someone has a non-ASCII or control character in their
e-mail address, but that's probably not a likely problem given that RFC
2822 doesn't permit that either.
--
Russ Allbery (rra@debian.org) <http://www.eyrie.org/~eagle/>