Bug#931827: lighttpd: server returnd 400, if %C0 is included in the URL

To: Helmut Grohne <helmut.grohne@intenta.de>, 931827@bugs.debian.org
Subject: Bug#931827: lighttpd: server returnd 400, if %C0 is included in the URL
From: Glenn Strauss <gstrauss@gluelogic.com>
Date: Sat, 13 Jul 2019 06:58:16 +0000
Message-id: <20190713065816.GA6310@shell.atof.net>
Reply-to: Glenn Strauss <gstrauss@gluelogic.com>, 931827@bugs.debian.org
In-reply-to: <20190712121258.knpx4qtwnyposyas@laureti-dev>
References: <156280549917.7047.13161906180160942612.reportbug@iris.a07.aist.go.jp> <20190712121258.knpx4qtwnyposyas@laureti-dev>

On Fri, Jul 12, 2019 at 02:13:00PM +0200, Helmut Grohne wrote:
> Hi,
> 
> On Thu, Jul 11, 2019 at 02:38:19AM +0200, OHNO Tetsuji wrote:
> > lighttpd server returns "400 Bad Request" if %C0 (or any other
> > char.) is included in the URL.
> > 
> > for example,
> > http://localhost/index.lighttpd.html : return OK (display index page)
> > http://localhost/index.lighttpd.html?%C0 : 400 Bad Request
> > http://localhost/index.lighttpd.html?%C1 : 400 Bad Request
> > http://localhost/index.lighttpd.html?%C2 : OK
> > 
> > I can't understand this behavior.
> 
> Thank you for the detailed report. I don't fully understand this either
> and am thus Ccing Glenn Strauss (upstream).

https://en.wikipedia.org/wiki/UTF-8#Overlong_encodings

"
   The standard specifies that the correct encoding of a code point use
   only the minimum number of bytes required to hold the significant bits
   of the code point. Longer encodings are called overlong and are not
   valid UTF-8 representations of the code point. This rule maintains a
   one-to-one correspondence between code points and their valid encodings,
   so that there is a unique valid encoding for each code point. This
   ensures that string comparisons and searches are well-defined.
"

https://tools.ietf.org/html/rfc3986#section-2.5

"
   When a new URI scheme defines a component that represents textual
   data consisting of characters from the Universal Character Set [UCS],
   the data should first be encoded as octets according to the UTF-8
   character encoding [STD63]; then only those octets that do not
   correspond to characters in the unreserved set should be percent-
   encoded.
"

tl;dr: URIs must contain valid UTF-8, including percent-encoded bytes
       of UTF-8 chars, as required above.

C0 might be part of the byte sequence C0 80, which is an overlong
UTF-8 encoding of the NUL character.  In the wrong contexts, this
might be abused in a truncation attack if C0 80 in the middle of a
string were interpreted as '\0'.
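
To make the C0 80 case concrete, here is a small Python sketch (an
illustration only, not lighttpd code). The helper names
naive_decode_2byte and strict_decodes are made up for this example. It
shows that applying the 2-byte UTF-8 bit layout without the
minimum-length check turns C0 80 into code point 0, while a strict
decoder refuses the sequence.

```python
def naive_decode_2byte(b0: int, b1: int) -> int:
    """Apply the 2-byte UTF-8 layout (110xxxxx 10xxxxxx) with no
    minimum-length check. Hypothetical helper, for illustration only."""
    return ((b0 & 0x1F) << 6) | (b1 & 0x3F)


def strict_decodes(data: bytes) -> bool:
    """Return True if Python's strict UTF-8 decoder accepts the bytes."""
    try:
        data.decode("utf-8")
        return True
    except UnicodeDecodeError:
        return False


overlong_nul = bytes([0xC0, 0x80])

# A decoder that skips the overlong check sees code point 0 (NUL).
naive_code_point = naive_decode_2byte(overlong_nul[0], overlong_nul[1])

# A strict decoder rejects the overlong form but accepts the 1-byte NUL.
overlong_accepted = strict_decodes(overlong_nul)
plain_nul_accepted = strict_decodes(b"\x00")
```

A consumer that decoded leniently and then handed the result to code
treating '\0' as a string terminator is the kind of context where the
truncation could occur.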

Both C0 and C1 bytes are part of overlong UTF-8 encodings, and are
not part of any UTF-8 encodings using the minimum number of bytes,
as required by the standard.  Therefore, lighttpd rejects those
percent-encoded bytes when looking for potentially malicious bytes
in URLs.
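
The following Python sketch illustrates that claim and the behavior
from the report. It is an assumption-laden illustration, not lighttpd's
actual implementation; the names is_valid_utf8, OVERLONG_ONLY_LEADS and
contains_overlong_lead are invented here. It first checks exhaustively
that no 2-byte sequence starting with C0 or C1 is valid UTF-8 (while C2
does start valid sequences), then flags percent-decoded query strings
containing either byte.

```python
from urllib.parse import unquote_to_bytes


def is_valid_utf8(data: bytes) -> bool:
    """Return True if the bytes decode under Python's strict UTF-8 codec."""
    try:
        data.decode("utf-8")
        return True
    except UnicodeDecodeError:
        return False


# For each lead byte, try every possible second byte and record whether
# any combination forms a valid 2-byte UTF-8 sequence.
lead_has_valid_pair = {
    lead: any(is_valid_utf8(bytes([lead, second])) for second in range(256))
    for lead in (0xC0, 0xC1, 0xC2)
}

# Bytes that can only ever begin an overlong 2-byte encoding.
OVERLONG_ONLY_LEADS = frozenset({0xC0, 0xC1})


def contains_overlong_lead(query: str) -> bool:
    """Hypothetical check: percent-decode the query string and report
    whether it contains a C0 or C1 byte."""
    raw = unquote_to_bytes(query)
    return any(byte in OVERLONG_ONLY_LEADS for byte in raw)


# The three query strings from the original report.
report_results = {q: contains_overlong_lead(q) for q in ("%C0", "%C1", "%C2")}
```

With this check, "%C0" and "%C1" are flagged and "%C2" is not, which
matches the 400/400/OK pattern in the report. Note that a lone C2 is
still an incomplete sequence; this sketch only looks for bytes that can
never occur in valid UTF-8.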

If you are storing binary data in a URL and naively percent-encode
the bytes, doing so is not guaranteed to produce valid UTF-8.
Please consider a different encoding for your binary data, such as
base64 modified to use URL-safe chars.
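
As one possible way to do that, here is a minimal Python sketch using
the standard library's URL-safe base64 alphabet ('-' and '_' in place
of '+' and '/'). The function names encode_for_url and decode_from_url
are invented for this example, and stripping the '=' padding is a
design choice of the sketch, not a requirement.

```python
import base64


def encode_for_url(data: bytes) -> str:
    """Encode arbitrary bytes as URL-safe base64 text without '=' padding."""
    return base64.urlsafe_b64encode(data).rstrip(b"=").decode("ascii")


def decode_from_url(token: str) -> bytes:
    """Restore the padding that encode_for_url stripped, then decode."""
    padding = "=" * (-len(token) % 4)
    return base64.urlsafe_b64decode(token + padding)


# Binary payload containing bytes that are never valid in UTF-8.
payload = bytes([0xC0, 0xC1, 0xFF, 0x00])
token = encode_for_url(payload)
round_trip = decode_from_url(token)
```

The resulting token contains only ASCII letters, digits, '-' and '_',
so it can be placed in a query string without percent-encoding.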

Cheers, Glenn
