
Bug#931827: lighttpd: server returned 400, if %C0 is included in the URL



On Fri, Jul 12, 2019 at 02:13:00PM +0200, Helmut Grohne wrote:
> Hi,
> 
> On Thu, Jul 11, 2019 at 02:38:19AM +0200, OHNO Tetsuji wrote:
> > lighttpd server returns "400 Bad Request" if %C0 (or any other
> > char.) is included in the URL.
> > 
> > for example,
> > http://localhost/index.lighttpd.html : return OK (display index page)
> > http://localhost/index.lighttpd.html?%C0 : 400 Bad Request
> > http://localhost/index.lighttpd.html?%C1 : 400 Bad Request
> > http://localhost/index.lighttpd.html?%C2 : OK
> > 
> > I can't understand this behavior.
> 
> Thank you for the detailed report. I don't fully understand this either
> and am thus Ccing Glenn Strauss (upstream).

https://en.wikipedia.org/wiki/UTF-8#Overlong_encodings

"
   The standard specifies that the correct encoding of a code point use
   only the minimum number of bytes required to hold the significant bits
   of the code point. Longer encodings are called overlong and are not
   valid UTF-8 representations of the code point. This rule maintains a
   one-to-one correspondence between code points and their valid encodings,
   so that there is a unique valid encoding for each code point. This
   ensures that string comparisons and searches are well-defined.
"

https://tools.ietf.org/html/rfc3986#section-2.5

"
   When a new URI scheme defines a component that represents textual
   data consisting of characters from the Universal Character Set [UCS],
   the data should first be encoded as octets according to the UTF-8
   character encoding [STD63]; then only those octets that do not
   correspond to characters in the unreserved set should be percent-
   encoded.
"
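
As a rough illustration of the order described in that section (encode
to UTF-8 first, then percent-encode the octets outside the unreserved
set), here is a small Python sketch. It uses urllib.parse.quote from the
Python standard library; the sample string is only an assumption chosen
for illustration.

```python
from urllib.parse import quote

# Hypothetical sample text containing one non-ASCII character (U+00EF).
text = "na\u00efve"

# Step 1: encode the characters as UTF-8 octets.
octets = text.encode("utf-8")

# Step 2: percent-encode every octet that is not an unreserved character.
# safe="" means no reserved character is left unescaped either.
encoded = quote(octets, safe="")

print(encoded)
```

The unreserved ASCII letters pass through unchanged, and the two UTF-8
octets of U+00EF (C3 AF) each become a percent-encoded triplet.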

tl;dr: URIs must contain valid UTF-8, including percent-encoded bytes
       of UTF-8 chars, as required above.

C0 might be part of the byte sequence C0 80, which is an overlong
UTF-8 encoding of the NUL character.  In the wrong contexts, this
might be abused in a truncation attack if C0 80 in the middle of a
string were interpreted as '\0'.
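
To make the C0 80 case concrete, here is a minimal Python sketch. The
helper naive_decode_2byte is hypothetical and deliberately omits the
overlong check, to show what a careless decoder would compute; it is
not taken from lighttpd or any other real decoder. Python's built-in
strict UTF-8 decoder is used for comparison.

```python
def naive_decode_2byte(b0: int, b1: int) -> int:
    """Decode a 2-byte UTF-8 sequence WITHOUT rejecting overlong forms."""
    # 110xxxxx 10yyyyyy -> code point xxxxxyyyyyy
    return ((b0 & 0x1F) << 6) | (b1 & 0x3F)


def strict_decode_ok(data: bytes) -> bool:
    """Return True if Python's strict UTF-8 decoder accepts the bytes."""
    try:
        data.decode("utf-8")
        return True
    except UnicodeDecodeError:
        return False


# The careless decoder maps C0 80 to code point 0, i.e. NUL.
print(naive_decode_2byte(0xC0, 0x80))

# A strict decoder refuses the overlong form but accepts the 1-byte NUL.
print(strict_decode_ok(b"\xc0\x80"), strict_decode_ok(b"\x00"))
```

A component that filters on the raw bytes would not see a 00 byte in
C0 80, while a later component using the careless decoder would see
NUL; that mismatch is what makes the truncation possible.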

Both C0 and C1 bytes are part of overlong UTF-8 encodings, and are
not part of any UTF-8 encodings using the minimum number of bytes,
as required by the standard.  Therefore, lighttpd rejects those
percent-encoded bytes when looking for potentially malicious bytes
in URLs.
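
The reason C0 and C1 can only start overlong sequences: as lead bytes of
a 2-byte sequence they can reach at most code point 7F, which already
fits in a single byte. The following Python sketch flags percent-decoded
bytes that can never occur in well-formed UTF-8 (C0, C1, and F5 through
FF, per RFC 3629). It is only an illustration under that assumption and
is not lighttpd's actual check; the function name is hypothetical.

```python
from urllib.parse import unquote_to_bytes

# Byte values that never appear anywhere in well-formed UTF-8 (RFC 3629).
NEVER_VALID = frozenset([0xC0, 0xC1]) | frozenset(range(0xF5, 0x100))


def has_never_valid_byte(encoded: str) -> bool:
    """Percent-decode the string and look for bytes UTF-8 never uses."""
    raw = unquote_to_bytes(encoded)
    return any(b in NEVER_VALID for b in raw)


# The three query strings from the report.
for query in ("%C0", "%C1", "%C2"):
    verdict = "rejected" if has_never_valid_byte(query) else "accepted"
    print(query, verdict)
```

With this simplified check, %C0 and %C1 are flagged and %C2 is not,
because C2 is a legitimate lead byte for code points 80 through BF.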

If you are storing binary data in a URL and naively percent-encode
the bytes, doing so is not guaranteed to produce valid UTF-8.
Please consider a different encoding for your binary data, such as
base64 modified to use URL-safe chars.
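
One possible way to do that, sketched in Python with the standard
library's base64 module. The helper names encode_for_url and
decode_from_url are hypothetical, and stripping the "=" padding is a
design choice of this sketch rather than a requirement.

```python
import base64


def encode_for_url(data: bytes) -> str:
    """Encode arbitrary bytes using the URL-safe base64 alphabet."""
    # urlsafe_b64encode uses '-' and '_' instead of '+' and '/'.
    token = base64.urlsafe_b64encode(data)
    # Drop '=' padding so the token contains only unreserved characters.
    return token.rstrip(b"=").decode("ascii")


def decode_from_url(token: str) -> bytes:
    """Reverse encode_for_url, restoring the stripped padding first."""
    padding = "=" * (-len(token) % 4)
    return base64.urlsafe_b64decode(token + padding)


# Binary data that would not be valid UTF-8 if naively percent-encoded.
payload = bytes([0xC0, 0xC1, 0xFF, 0x00])
token = encode_for_url(payload)
print(token)
print(decode_from_url(token) == payload)
```

The resulting token consists only of ASCII letters, digits, '-' and
'_', so it needs no percent-encoding and is trivially valid UTF-8.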

Cheers, Glenn

