Bug#421820: ap_escape_uri() doesn't escape &-sign
On 5/9/07, Stefan Fritsch <firstname.lastname@example.org> wrote:
> Apache behaves correctly (in principle). From RFC 2396 section 3.3:
> 'The path may consist of a sequence of path segments separated by a
> single slash "/" character. Within a path segment, the characters
> "/", ";", "=", and "?" are reserved.'
> This means '&' is a reserved character only in the query part after
> the '?', but not before the '?' in the path part of the URL.
> I am not sure how this helps you, though ;-). But I guess if you take
> something from the path part and put it into the query part, you have
> to escape everything that is reserved in the query part but not in
> the path part (i.e. ":", "@", "&", "+", ",", "=", and "$").
I'm looking at RFC 1738, which is referenced by RFC 1808, the latter
being the one cited in the Apache source.
Specifically, section 2.2, "Reserved", says:
Many URL schemes reserve certain characters for a special meaning:
their appearance in the scheme-specific part of the URL has a
designated semantics. If the character corresponding to an octet is
reserved in a scheme, the octet must be encoded. The characters ";",
"/", "?", ":", "@", "=" and "&" are the characters which may be
reserved for special meaning within a scheme. No other characters may
be reserved within a scheme.
Note the phrase "may be reserved for /special meaning/ within a scheme".
Usually a URL has the same interpretation when an octet is
represented by a character and when it encoded. However, this is not
true for reserved characters: encoding a character reserved for a
particular scheme may change the semantics of a URL.
Again: "encoding a character reserved for a particular scheme may
change the semantics of a URL". That's exactly the point. Unencoded,
'&' is an argument separator; encoded, it's just another ordinary
character in the URL.
Thus, only alphanumerics, the special characters "$-_.+!*'(),", and
reserved characters used for their reserved purposes may be used
unencoded within a URL.
This final sentence is very important: "reserved characters used for
their /reserved/ purposes may be used unencoded within a URL". Hence
'/' is fine unencoded, since it is used for its /reserved/ purpose: a
path delimiter. By contrast, '&' within the path is not used for its
/reserved/ purpose.
Put another way, the first paragraph clearly defines '&' as a
reserved character. Used within a /path/ (as is the case with
ap_escape_uri()), it is a reserved character used /outside/ of its
reserved purpose, and should thus be encoded.
I believe this RFC is extremely clear and consistent, and it is
referenced by RFC 1808, which the Apache source seems to follow.
This is exactly why I think I have some grounds for saying it's an Apache bug 8-)
Either ap_escape_uri() (which, again, is ap_os_escape_path()) can be
used on something other than just a path, in which case it shouldn't
even encode e.g. '?' (which it currently does), or it is indeed meant
to be used only on a path, in which case it ought to encode '&' as well in order to be