
Bug#421820: ap_escape_uri() doesn't escape &-sign

Hi Thibaut,

On Wednesday, 9 May 2007, Thibaut VARENE wrote:
> On 5/9/07, Stefan Fritsch <sf@sfritsch.de> wrote:
> > Apache behaves correctly (in principle). From RFC 2396 section
> > 3.3:

> > This means '&' is a reserved character only in the query part
> > after the '?', but not before the '?' in the path part of the
> > URL.

> I'm looking at RFC1738, which is referred by RFC1808, the latter
> one being quoted in apache source.

I agree that Apache does not follow RFC 1738, but RFC 2396 "revises and
replaces the generic definitions in RFC 1738 and RFC 1808". And I
just noticed that RFC 3986 obsoletes RFC 2396 and RFC 1808, and updates
RFC 1738.

> Specifically, it says in section 2.2 "Reserved":
>    Many URL schemes reserve certain characters for a special
> meaning: their appearance in the scheme-specific part of the URL
> has a designated semantics. If the character corresponding to an
> octet is reserved in a scheme, the octet must be encoded.  The
> characters ";", "/", "?", ":", "@", "=" and "&" are the characters
> which may be reserved for special meaning within a scheme. No other
> characters may be reserved within a scheme.

RFC 2396's 2.2 is rather similar but talks about URI components, not 
whole URLs:

   Many URI include components consisting of or delimited by, certain
   special characters.  These characters are called "reserved", since
   their usage within the URI component is limited to their reserved


RFC 3986 allows a path segment to contain !$&'()*+,;=:@ unescaped. The
query part may additionally contain ?/ unescaped.
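
To make the character classes concrete, here is a rough sketch in Python that builds the sets from the RFC 3986 grammar (unreserved, sub-delims, pchar, query) and percent-encodes everything outside them. The names PCHAR, QUERY_CHARS and escape_component() are made up for illustration; this is not how Apache implements ap_escape_uri().

```python
import string

# Character classes as defined by the RFC 3986 ABNF.
UNRESERVED = set(string.ascii_letters + string.digits + "-._~")
SUB_DELIMS = set("!$&'()*+,;=")
PCHAR = UNRESERVED | SUB_DELIMS | set(":@")   # allowed in a path segment
QUERY_CHARS = PCHAR | set("/?")               # allowed in the query part


def escape_component(text, allowed=PCHAR):
    """Percent-encode every UTF-8 byte of text that is not in allowed.

    Hypothetical helper for illustration only.
    """
    out = []
    for byte in text.encode("utf-8"):
        ch = chr(byte)
        if byte < 128 and ch in allowed:
            out.append(ch)
        else:
            out.append("%{:02X}".format(byte))
    return "".join(out)


print(escape_component("a&b"))                 # a&b
print(escape_component("a?b"))                 # a%3Fb
print(escape_component("a?b/c", QUERY_CHARS))  # a?b/c
```

With these sets, '&' passes through a path segment untouched, while '?' is escaped there because it would otherwise start the query.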

> This is exactly why I think I have some ground when I say it's an
> apache bug 8-)

I think Apache follows the latest RFC and behaves correctly. Maybe the
comments in the source should be updated ;-)

> Either ap_escape_uri() (which is again ap_os_escape_path()) can be
> used on something else than just path, and then it shouldn't even
> encode eg '?' (which it currently does), or it's well meant to only
> be used on path

I think the latter is true (but I haven't actually looked into the 
source or API documentation),

> and then it ought to encode '&' as well in order to 
> be RFC compliant.

but this is not: under RFC 3986 an unescaped '&' in the path is
compliant. Also, RFC 1738 talks about UR*L*s, the later RFCs talk
about UR*I*s, and the function is called ap_escape_ur*i*().
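
The practical consequence is that path and query need different escaping rules, so a caller should escape them separately and then join them. A small sketch with Python's standard urllib.parse, where PATH_SAFE and build_url() are hypothetical names and the safe set is my reading of what RFC 3986 permits in a path:

```python
from urllib.parse import quote, urlencode, urlsplit

# Characters left unescaped in the path: "/" plus sub-delims, ":" and "@".
# (Assumption: this mirrors the RFC 3986 path rules, not Apache's table.)
PATH_SAFE = "/!$&'()*+,;=:@"


def build_url(path, params):
    """Escape the path with path rules and the query with query rules."""
    return quote(path, safe=PATH_SAFE) + "?" + urlencode(params)


url = build_url("/files/R&D report?.txt", {"q": "a&b"})
print(url)  # /files/R&D%20report%3F.txt?q=a%26b

parts = urlsplit(url)
print(parts.path)   # /files/R&D%20report%3F.txt
print(parts.query)  # q=a%26b
```

The '&' in the file name survives in the path, the literal '?' becomes %3F so it cannot be mistaken for the query delimiter, and the '&' inside the query value becomes %26 because there it would otherwise separate parameters.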

