
Bug#471930: closed by Eckhart Wörner <ewoerner@kde.org> ()



Hi Eckhart,

Thank you for getting on the case. Apologies in advance for the long read;
I want to make sure I'm not missing anything in my thought process and
especially in my understanding of RFC3986.

After carefully reading RFC3986, my conclusion, too, is that '+' and 
'%2B' should be treated as equivalent in path components.

Section 2.2 defines '+' as being part of the sub-delims subset of reserved
characters, and goes on to state:

    If a reserved character is found in a URI component and
    no delimiting role is known for that character, then it must be
    interpreted as representing the data octet corresponding to that
    character's encoding in US-ASCII.

Section 3.3 allows sub-delims (and thus '+') as a valid part of a
path component. So, following the above quote, a '+' in a path component
that has no delimiting role stands for the same data octet as '%2B'. In
other words, Konqueror's replacing of '%2B' by '+' in path components
should not change the meaning of the URI.

On the other hand, I am less sure whether replacing
'%2B' by '+' in path components is actually _correct_. Citing RFC3986 again,
on the topic of when to decode percent-encoded characters, it says:

Section 2.4:

    Once produced, a URI is always in its percent-encoded form.

    When a URI is dereferenced, the components and subcomponents
    significant to the scheme-specific dereferencing process (if any)
    must be parsed and separated before the percent-encoded octets within
    those components can be safely decoded, as otherwise the data may be
    mistaken for component delimiters.  The only exception is for
    percent-encoded octets corresponding to characters in the unreserved
    set, which can be decoded at any time.

This suggests to me that _only_ characters in the unreserved set may be
decoded "at any time", and that there are restrictions on decoding
percent-encoded characters not in that set. The unreserved set is defined
in section 2.3 as:

    Characters that are allowed in a URI but do not have a reserved
    purpose are called unreserved.  These include uppercase and lowercase
    letters, decimal digits, hyphen, period, underscore, and tilde.

      unreserved  = ALPHA / DIGIT / "-" / "." / "_" / "~"

In particular, the unreserved set does not include '+'.

Section 6.2.2.2 specifically references section 2.3 when it specifies
percent-encoding normalization:

    These URIs should be normalized by decoding any percent-encoded 
    octet that corresponds to an unreserved character, as described in 
    Section 2.3.

I take that to mean that an implementation should (but is not required to)
normalize a percent-encoded character to its decoded form, but only if
it is part of the unreserved set, which does not include '+'.
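
To illustrate my reading of section 6.2.2.2, here is a minimal sketch in
Python of a normalizer that decodes only percent-encoded octets that
correspond to unreserved characters. The function name normalize_unreserved
is my own invention for this example; this is not how Konqueror or any
particular library implements it.

```python
import re
import string

# Unreserved set as defined in RFC 3986, section 2.3.
UNRESERVED = set(string.ascii_letters + string.digits + "-._~")

# Matches one percent-encoded octet, e.g. "%2B" or "%7e".
_PCT_OCTET = re.compile(r"%([0-9A-Fa-f]{2})")


def normalize_unreserved(uri: str) -> str:
    """Decode only the percent-encoded octets that map to unreserved characters.

    Octets that map to reserved characters (such as '+') are left encoded.
    """
    def decode_if_unreserved(match: re.Match) -> str:
        char = chr(int(match.group(1), 16))
        return char if char in UNRESERVED else match.group(0)

    return _PCT_OCTET.sub(decode_if_unreserved, uri)


# '%7E' is '~' (unreserved) and is decoded; '%2B' is '+' (reserved) and is kept.
print(normalize_unreserved("http://slashdot.org/%7ERAMMS%2BEIN/"))
```

With this kind of normalization, the URI from my report would keep its
'%2B' and so would still be sent to the server exactly as I typed it.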

Furthermore, section 2.2 states, about the set of reserved characters
(which includes '+'):

    These characters are called "reserved" because they may (or may not) 
    be defined as delimiters by the generic syntax, by each 
    scheme-specific syntax, or by the implementation-specific syntax of 
    a URI's dereferencing algorithm.

and, later on:

    Percent-encoding a reserved character, or decoding a percent-encoded
    octet that corresponds to a reserved character, will change how the URI 
    is interpreted by most applications.  Thus, characters in the reserved
    set are protected from normalization

It seems to me that the RFC specifically limits normalization to characters
of the unreserved set, excluding characters from the reserved set because
decoding a percent-encoded reserved character may lead to interoperability
problems of exactly the kind I have observed.

Summing up, my reading of RFC3986 is that:

 1. In path components, '%2B' must be treated as equivalent to '+'.
    Hence, Slashdot is in error, because it treats them differently.

 2. However, the RFC also disallows translating '%2B' to '+' in URIs,
    specifically to avoid interoperability problems with applications
    that treat them differently. Hence, if Konqueror performs such
    translation, it is also in error.

Leaving the RFC aside, I object to Konqueror's behavior because I find it
inconsistent. In the absence of actions taken by the server that would
obviously influence the outcomes (such as redirects), I would expect the
following to hold:

 1. The request will be made using the same URI that I typed in the address
    bar or an equivalent.

 2. The URI shown in the address bar will be the same as the one that was
    used to make the request.

 3. Re-using the URI shown in the address bar will send the same URI
    as is shown in the address bar.

 4. Re-using the URI shown in the address bar will send the same URI
    as the original request.

 5. The URI that will be shown in the history will be the same that was used
    to perform the request.

 6. The URI that will be used when re-performing a request from the history
    will be the same as the URI that was displayed for the history entry.

What I observed is that:

 1. The request will be made using the same URI that I typed in the address
    bar. (ok)

 2. The URI shown in the address bar is _different_ from the one that was
    used to make the request. (wrong)

 3. Re-using the URI shown in the address bar sends the same URI
    as is shown in the address bar. (ok)

 4. Re-using the URI shown in the address bar sends a _different_ URI
    from the original request. (wrong)

 5. The URI that will be shown in the history is _different_ from the one
    that was used to perform the request. (wrong)

 6. The URI that will be used when re-performing a request from the history
    is the same as the URI that was displayed for the history entry. (ok)

In particular, the fact that, after performing the original request,
the address bar shows a URI that is _not_ the URI that was used to make the
request strikes me as wrong: it seems to be saying
"here is the page with URI X" while it is really the page with URI Y.
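
The mismatch can be modelled in a few lines of Python. This is only a
sketch of the behavior I observed: I am assuming that the address bar shows
the fully percent-decoded form, which I model here with urllib.parse.unquote.
I have not looked at Konqueror's code, so the real mechanism may differ.

```python
from urllib.parse import unquote

# The URI as typed, and as sent in the original request.
typed = "http://slashdot.org/~RAMMS%2BEIN/"

# Assumption: the address bar and the history store the decoded form.
shown = unquote(typed)

# Pressing return in the address bar sends whatever is shown there.
resent = shown

print(shown)            # what the address bar displays
print(shown == typed)   # expectations 2 and 5: shown URI equals requested URI
print(resent == typed)  # expectation 4: re-used URI equals original request
```

Both comparisons come out false for this URI, which matches items 2, 4
and 5 in the list of observations.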

I am looking forward to your thoughts and comments on this. Thanks for your
efforts so far, and thanks also to everyone who is looking into this, and,
of course, the Konqueror team in general. Keep up the good work!

Kind regards,

Bob

On Sat, Apr 17, 2010 at 12:15:10PM +0000, Debian Bug Tracking System wrote:
> This is an automatic notification regarding your Bug report
> which was filed against the konqueror package:
> 
> #471930: konqueror: Konqueror urldecodes URLs and remembers the decoded URL
> 
> It has been closed by Eckhart Wörner <ewoerner@kde.org>.
> 
> Their explanation is attached below along with your original report.
> If this explanation is unsatisfactory and you have not received a
> better one in a separate message then please contact Eckhart Wörner <ewoerner@kde.org> by
> replying to this email.
> 
> 
> -- 
> 471930: http://bugs.debian.org/cgi-bin/bugreport.cgi?bug=471930
> Debian Bug Tracking System
> Contact owner@bugs.debian.org with problems

Received: (at 471930-done) by bugs.debian.org; 17 Apr 2010 12:14:12 +0000
Return-path: <ewoerner@kde.org>
Received: from moutng.kundenserver.de ([212.227.17.10])
	by busoni.debian.org with esmtp (Exim 4.69)
	(envelope-from <ewoerner@kde.org>)
	id 1O36ua-00007j-CO
	for 471930-done@bugs.debian.org; Sat, 17 Apr 2010 12:14:12 +0000
Received: from yoda.localnet (agsb-5d853657.pool.mediaWays.net [93.133.54.87])
	by mrelayeu.kundenserver.de (node=mrbap1) with ESMTP (Nemesis)
	id 0MbJR6-1NmfzF28LU-00JEm6; Sat, 17 Apr 2010 14:14:05 +0200
From: Eckhart =?utf-8?q?W=C3=B6rner?= <ewoerner@kde.org>
To: 471930-done@bugs.debian.org
Date: Sat, 17 Apr 2010 14:14:04 +0200
User-Agent: KMail/1.13.2 (Linux/2.6.32-3-686; KDE/4.4.2; i686; ; )
MIME-Version: 1.0
Content-Type: Text/Plain;
  charset="utf-8"
Content-Transfer-Encoding: 7bit
Message-Id: <201004171414.04979.ewoerner@kde.org>
> 
> As already pointed out, this is a problem with Slashdot.
> 

Received: (at submit) by bugs.debian.org; 21 Mar 2008 06:59:18 +0000
Return-path: <debian-bugs@inglorion.net>
Received: from smtp1.bbeyond.nl ([82.204.126.15])
	by rietz.debian.org with esmtp (Exim 4.63)
	(envelope-from <debian-bugs@inglorion.net>)
	id 1JcbDh-0005IM-Tc
	for submit@bugs.debian.org; Fri, 21 Mar 2008 06:59:18 +0000
Received: from amavisd-new (smtp1.bbeyond.nl [82.204.126.15])
	by smtp1.bbeyond.nl (Postfix) with ESMTP id D9B39495FF;
	Fri, 21 Mar 2008 07:59:15 +0100 (CET)
X-Virus-Scanned: amavisd-new at bbeyond.nl
Received: from morgenes.inglorion.net (78-27-26-125.dsl.alice.nl [78.27.26.125])
	by smtp1.bbeyond.nl (Postfix) with ESMTP id 7B94E48891;
	Fri, 21 Mar 2008 07:59:15 +0100 (CET)
Received: from inglorion by morgenes.inglorion.net with local (Exim 4.63)
	(envelope-from <debian-bugs@inglorion.net>)
	id 1JcbDe-0000d0-TY; Fri, 21 Mar 2008 07:59:14 +0100
Content-Type: text/plain; charset="us-ascii"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
From: Robbert Haarman <debian-bugs@inglorion.net>
To: Debian Bug Tracking System <submit@bugs.debian.org>
Subject: konqueror: Konqueror urldecodes URLs and remembers the decoded URL
Message-ID: <20080321065914.2112.4936.reportbug@78-27-26-125.dsl.alice.nl>
X-Mailer: reportbug 3.31
Date: Fri, 21 Mar 2008 07:59:14 +0100
Delivered-To: submit@bugs.debian.org
> 
> Package: konqueror
> Version: 4:3.5.5a.dfsg.1-6etch2
> Severity: normal
> 
> When you open a page whose URL contains characters that must be 
> urlencoded, Konqueror will let you enter the URL properly encoded (with 
> % escapes, etc.) and visit the page correctly.
> 
> However, it will decode the URL and display the decoded URL in the 
> address bar (e.g. + will be changed to space, %2B will be changed to +, 
> etc.) This often causes the "URL" in the address bar to not actually be 
> a valid URL. For example, if you select the address bar and press 
> return, you will receive an error, or at the very least not go to the 
> same page whose URL you originally entered.
> 
> The decoded URL is also saved in the history, so that you can, for 
> example, use the up and down arrow keys in the address bar to select a 
> previously visited page, and not go there, because the URL that has been 
> saved with it is not the right URL, but the urldecoded version of it.
> 
> This behavior annoys me. I can see the point of wanting to display the 
> address in decoded form for some users in some situations (e.g. when 
> using Konqueror as a file manager - which I never do, by the way). 
> However, I would, at the very least, want to be able to turn off this 
> functionality, so that the URLs I enter will not be mangled.
> 
> Steps to reproduce:
>  - Visit any website with an URL that contains characters that need 
>    escaping. For example:
>    http://slashdot.org/~RAMMS%2BEIN/
> 
>  - Konqueror will correctly open the page, but mangle the URL. E.g.
>    http://slashdot.org/~RAMMS+EIN/
> 
>  - If you try to open the same page again, e.g. by selecting the
>    address bar and pressing return, or by selecting the address bar,
>    you will not go to the same page you originally visited.
> 
>  - If you visit another page, then select the address bar and use the
>    up arrow to navigate back to the original page, then press return
>    to select it, you will get the mangled URL and you will not visit
>    the page whose URL you originally entered.
> 
> -- System Information:
> Debian Release: 4.0
>   APT prefers stable
>   APT policy: (500, 'stable')
> Architecture: i386 (i686)
> Shell:  /bin/sh linked to /bin/bash
> Kernel: Linux 2.6.18-4-686
> Locale: LANG=en_US.UTF-8, LC_CTYPE=en_US.UTF-8 (charmap=UTF-8)
> 
> Versions of packages konqueror depends on:
> ii  kcontrol          4:3.5.5a.dfsg.1-6etch2 control center for KDE
> ii  kdebase-kio-plugi 4:3.5.5a.dfsg.1-6etch2 core I/O slaves for KDE
> ii  kdelibs4c2a       4:3.5.5a.dfsg.1-8etch1 core libraries and binaries for al
> ii  kdesktop          4:3.5.5a.dfsg.1-6etch2 miscellaneous binaries and files f
> ii  kfind             4:3.5.5a.dfsg.1-6etch2 file-find utility for KDE
> ii  libacl1           2.2.41-1               Access control list shared library
> ii  libart-2.0-2      2.3.17-1               Library of functions for 2D graphi
> ii  libattr1          2.4.32-1               Extended attribute shared library
> ii  libaudio2         1.8-4                  The Network Audio System (NAS). (s
> ii  libc6             2.3.6.ds1-13etch5      GNU C Library: Shared libraries
> ii  libfam0           2.7.0-12               Client library to control the FAM 
> ii  libfontconfig1    2.4.2-1.2              generic font configuration library
> ii  libfreetype6      2.2.1-5+etch2          FreeType 2 font engine, shared lib
> ii  libgcc1           1:4.1.1-21             GCC support library
> ii  libice6           1:1.0.1-2              X11 Inter-Client Exchange library
> ii  libidn11          0.6.5-1                GNU libidn library, implementation
> ii  libjpeg62         6b-13                  The Independent JPEG Group's JPEG 
> ii  libkonq4          4:3.5.5a.dfsg.1-6etch2 core libraries for Konqueror
> ii  libpng12-0        1.2.15~beta5-1         PNG library - runtime
> ii  libqt3-mt         3:3.3.7-4etch1         Qt GUI Library (Threaded runtime v
> ii  libsm6            1:1.0.1-3              X11 Session Management library
> ii  libstdc++6        4.1.1-21               The GNU Standard C++ Library v3
> ii  libx11-6          2:1.0.3-7              X11 client-side library
> ii  libxcursor1       1.1.7-4                X cursor management library
> ii  libxext6          1:1.0.1-2              X11 miscellaneous extension librar
> ii  libxft2           2.1.8.2-8              FreeType-based font drawing librar
> ii  libxi6            1:1.0.1-4              X11 Input extension library
> ii  libxinerama1      1:1.0.1-4.1            X11 Xinerama extension library
> ii  libxrandr2        2:1.1.0.2-5            X11 RandR extension library
> ii  libxrender1       1:0.9.1-3              X Rendering Extension client libra
> ii  libxt6            1:1.0.2-2              X11 toolkit intrinsics library
> ii  zlib1g            1:1.2.3-13             compression library - runtime
> 
> konqueror recommends no packages.
> 
> -- no debconf information
> 
> 




