
Bug#233831: www.debian.org: 404 error for Accept-Language: *



Package: www.debian.org
Version: 20040220
Severity: minor

www.debian.org appears to have a problem when an asterisk is present
in the Accept-Language header of an incoming HTTP request.

When I do a HEAD /support without an Accept-Language header all is well:

 $ sed -e 's/^  //' -e 's/$/\r/' <<==HERE |
 >   HEAD /support HTTP/1.1
 >   Host: www.debian.org
 >   Connection: close
 > 
 > ==HERE
 > nc www.debian.org 80
 HTTP/1.1 200 OK
 Date: Fri, 20 Feb 2004 04:47:06 GMT
 Server: Apache/1.3.26 (Unix) Debian GNU/Linux PHP/4.1.2 DAV/1.0.3
 Content-Location: support.en.html
 Vary: negotiate,accept-language
 TCN: choice
 Cache-Control: max-age=86400
 Expires: Sat, 21 Feb 2004 04:47:06 GMT
 Last-Modified: Tue, 17 Feb 2004 02:33:35 GMT
 ETag: "11301e9-3b67-40317d7f;403575e1"
 Accept-Ranges: bytes
 Content-Length: 15207
 Connection: close
 Content-Type: text/html
 Content-Language: en
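
The request above can also be built without embedding a carriage return
in the sed script. The following is only a sketch, assuming a POSIX
shell with printf; the function name head_request is hypothetical and
not part of the original report.

```shell
# Hypothetical helper (not from the original report): print a HEAD request
# with CRLF line endings on stdout.
#   $1 = request path
#   $2 = optional Accept-Language value (header is omitted when empty)
head_request() {
    printf 'HEAD %s HTTP/1.1\r\n' "$1"
    printf 'Host: www.debian.org\r\n'
    if [ -n "${2-}" ]; then
        printf 'Accept-Language: %s\r\n' "$2"
    fi
    # The empty line after the last header terminates the request.
    printf 'Connection: close\r\n\r\n'
}

# Usage (needs network access and nc):
#   head_request /support '*' | nc www.debian.org 80
```

Passing the header value as an argument makes it easy to repeat the same
request with and without the asterisk.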

But if I add Accept-Language: * I get a 404:

 $ sed -e 's/^  //' -e 's/$/\r/' <<==HERE |
 >   HEAD /support HTTP/1.1
 >   Host: www.debian.org
 >   Accept-Language: *
 >   Connection: close
 > 
 > ==HERE
 > nc www.debian.org 80
 HTTP/1.1 404 Not Found
 Date: Fri, 20 Feb 2004 04:47:44 GMT
 Server: Apache/1.3.26 (Unix) Debian GNU/Linux PHP/4.1.2 DAV/1.0.3
 Content-Location: support.nb.html
 Vary: negotiate,accept-language
 TCN: choice
 Connection: close
 Content-Type: text/html; charset=iso-8859-1

Then again if I add actual languages to the Accept-Language header all
is well again:

 $ sed -e 's/^  //' -e 's/$/\r/' <<==HERE |
 >   HEAD /support HTTP/1.1
 >   Host: www.debian.org
 >   Accept-Language: sv-FI, i-navajo, en-US
 >   Connection: close
 > 
 > ==HERE
 > nc www.debian.org 80
 HTTP/1.1 200 OK
 Date: Fri, 20 Feb 2004 05:04:55 GMT
 Server: Apache/1.3.26 (Unix) Debian GNU/Linux PHP/4.1.2 DAV/1.0.3
 Content-Location: support.en-us.html
 Vary: negotiate,accept-language
 TCN: choice
 Cache-Control: max-age=86400
 Expires: Sat, 21 Feb 2004 05:04:55 GMT
 Last-Modified: Tue, 17 Feb 2004 02:33:35 GMT
 ETag: "11301e9-3b67-40317d7f;403575e1"
 Accept-Ranges: bytes
 Content-Length: 15207
 Connection: close
 Content-Type: text/html
 Content-Language: en-us

If at this point I add ", *" to the Accept-Language header, it will
not break things.

(However, when I mistakenly used sv_FI and en_US as language tags,
adding the asterisk produced a 404, whereas the same request without
the asterisk returned a 200 with what is apparently the Mother of
All Variants of this page, presumably English:

 $ sed -e 's/^  //' -e 's/$/\r/' <<==HERE |
 >   HEAD /support HTTP/1.1
 >   Host: www.debian.org
 >   Accept-Language: sv_FI, i_navajo, en_US
 >   Connection: close
 > 
 > ==HERE
 > nc www.debian.org 80  |
 > egrep '^(HTTP|Content-Location)'
 HTTP/1.1 200 OK
 Content-Location: support.html

If I put back in the * as a catch-all, there's the 404 again:

 $ sed -e 's/^  //' -e 's/$/\r/' <<==HERE |
 >   HEAD /support HTTP/1.1
 >   Host: www.debian.org
 >   Accept-Language: sv_FI, i_navajo, en_US, *
 >   Connection: close
 > 
 > ==HERE
 > nc www.debian.org 80  |
 > egrep '^(HTTP|Content-Location)'
 HTTP/1.1 404 Not Found
 Content-Location: support.nb.html

You'll notice that I'm taking the liberty of grepping just the
interesting parts of the response here to keep this parenthesis shorter.)
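
The filtering step used in the last two transcripts can be wrapped in a
small function that also strips the trailing carriage returns from the
response headers. This is a sketch only; the name summarize_response is
hypothetical.

```shell
# Hypothetical filter (not from the original report): read an HTTP response
# on stdin and keep only the status line and the Content-Location header,
# with carriage returns removed.
summarize_response() {
    tr -d '\r' | grep -E '^(HTTP/|Content-Location:)'
}

# Usage (needs network access and nc):
#   ... | nc www.debian.org 80 | summarize_response
```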

The language range "*" is explicitly allowed in RFC 2616 section 14.4,
where it matches every language not matched by any other range in the
header. Granted, sending it on its own is perhaps dubious.
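
For reference, the matching rule in RFC 2616 section 14.4 can be
sketched as below. This is an illustration of the rule as written in
the RFC, not of what Apache actually does; the function name
range_matches is hypothetical. A range matches a tag if the two are
equal, or if the range is a prefix of the tag followed by "-";
comparison is case-insensitive. The precedence of "*" (it only applies
to tags not matched by another range) is left to the caller.

```shell
# Hypothetical sketch of RFC 2616 section 14.4 language-range matching.
#   range_matches RANGE TAG
# Returns 0 if RANGE matches TAG, 1 otherwise.
range_matches() {
    # Language tags are case-insensitive, so compare in lower case.
    range=$(printf '%s' "$1" | tr 'A-Z' 'a-z')
    tag=$(printf '%s' "$2" | tr 'A-Z' 'a-z')
    # The wildcard matches any tag.
    case "$range" in
        '*') return 0 ;;
    esac
    # Exact match, or prefix match ending at a "-" boundary.
    case "$tag" in
        "$range" | "$range"-*) return 0 ;;
    esac
    return 1
}

# Usage:
#   range_matches en en-US && echo match
#   range_matches en_US en-us || echo "no match"
```

Under this rule a range written with an underscore, such as en_US,
matches neither en nor en-us, so a header consisting only of such
ranges matches no variant at all.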

Real-world case: The W3C link validator reports links to many of the
important pages on www.debian.org as broken (404s). Here is a discussion:
<http://thread.gmane.org/gmane.org.w3c.validator/3520>

/* era */

-- System Information
Debian Release: 3.0
Kernel Version: Linux there.afraid.org 2.2.20 #1 SMP Thu Nov 7 16:15:53 EET 2002 i586 unknown


