[Date Prev][Date Next] [Thread Prev][Thread Next] [Date Index] [Thread Index]

Bug#703481: libapt-pkg4.12: URI parsing not to RFC



Package: libapt-pkg4.12
Version: 0.9.7.7
Severity: minor

Dear deity

While investigating the URI class in apt-pkg I have picked up on some
issues where URI parsing is not to RFC 3986.

<http://tools.ietf.org/html/rfc3986>

It seems this is mainly related to manipulation of the defined syntax to
support considerations for the cdrom scheme/protocol.  In this case, a
relative path component of "CDROM_NAME/PATH/TO/DIST" is interpretted as
Host "CDROM_HOST" and Path "/PATH/TO/DIST".  This contravenes the RFC
and is not necessary since the host in those cases actually is localhost
(or other) and may be omitted, with the path component redefined to be
"CDROM_NAME/PATH/TO/DIST".

RFC 3986 permits and encourages such protocol-specific handling of
path components.

>From uri_test.cc, with notation added to indicate components as per
RFC 3986:

       scheme        path
         _|_   _______|_________
        /   \ /                 \
 URI U("cdrom:Moo Cow Rom:/debian");
 …
 equals("Moo Cow Rom", U.Host);
 equals("/debian", U.Path);

Host is part of authority, which requires the "//" prefix [1].  Here the
first segment of the path is incorrectly parsed as the host.  Except for
the superfluous ":", there should be no issues in correctly parsing "Moo
Cow Rom" as the first segment of path and adapting apt-cdrom to
understand this.

      scheme    path
         |_   ___|___
        /  \ /       \
 URI U("gzip:./bar/cow");
 …
 equals(".", U.Host);
 equals("/bar/cow", U.Path);

Same issue.  This is again a relative path (‘path-rootless’ in the RFC)
beginning from ".".  Unlike in the previous example, the use case for
unusual semantics here needs to be investigated before making any
changes.

 URI U("ftp:ftp.fr.debian.org/debian/pool/main/x/xtel/xtel_3.2.1-15_i386.deb");
 …
 equals("ftp.fr.debian.org", U.Host);
 equals("/debian/pool/main/x/xtel/xtel_3.2.1-15_i386.deb", U.Path);

An obvious typo in the original string, should not be parsed this way.

 URI U("cdrom:[The Debian 1.2 disk, 1/2 R1:6]/debian/");
 …
 equals("The Debian 1.2 disk, 1/2 R1:6", U.Host);
 equals("/debian/", U.Path);

 URI U("cdrom:Foo Bar Cow/debian/");
 …
 equals("Foo Bar Cow", U.Host);
 equals("/debian/", U.Path);

Same issue as the first.  RFC 3986 permits protocol-specific path
handling, I believe there should be few problems to:

- define cdrom URIs to be "cdrom:CDROM_NAME/PATH/TO/DIST" where the Path
  is "CDROM_NAME/PATH/TO/DIST" and CDROM_NAME the first segment, which
  can optionally use the "[]" quoting and contain a trailing ":" that
  will be ignored (c.f. first example above);

- this means that "[]" quoting in the Host will no longer be misused to
  escape certain CDROM_NAMES — more RFC compliance and avoid potential
  issues with future official URI and IP standards;

- update apt-cdrom and others as appropriate to expect CDROM_NAME in
  Path rather than Host while still supporting the current behaviour of
  CDROM_NAME in Host (for cases where the correct URI syntax has been
  used);

- fix URI parsing to be RFC compliant;

- at some point in the future, deprecate the legacy CDROM_NAME in Host
  support.

Pending however the outcome of the gzip usage (second example) above.  I
also believe that a thorough examination of the parse algorithm will
expose other issues.

Note also that the URI string operator does not produce results that
match the input string in uri_test.cc for all of the cases I mentioned
above.  Attached patch demonstrates this and clears up the obvious typo
in the ftp case.

Even though this is a utility class with limited use (at the moment),
correct URI handling is important so that the class does not mislead
about its standards conformance, perhaps introducing subtle errors with
future uses.  I am prepared to work carefully on this RFC conformance
and resolve any bugs that arise from such work.

Regards
=== modified file 'test/libapt/uri_test.cc'
--- test/libapt/uri_test.cc	2011-08-15 16:22:44 +0000
+++ test/libapt/uri_test.cc	2013-03-20 07:42:35 +0000
@@ -2,111 +2,107 @@
 
 #include "assert.h"
 
+#define URI_EQUALS(str, access, user, password, port, host, path) do { \
+	URI U(str); \
+	equals(access, U.Access); \
+	equals(user, U.User); \
+	equals(password, U.Password); \
+	equals(port, U.Port); \
+	equals(host, U.Host); \
+	equals(path, U.Path); \
+	equals(str, (std::string) U); \
+	} while (0)
+
 int main() {
 	// Basic stuff
-	{
-	URI U("http://www.debian.org:90/temp/test";);
-	equals("http", U.Access);
-	equals("", U.User);
-	equals("", U.Password);
-	equals(90, U.Port);
-	equals("www.debian.org", U.Host);
-	equals("/temp/test", U.Path);
-	} {
-	URI U("http://jgg:foo@ualberta.ca/blah";);
-	equals("http", U.Access);
-	equals("jgg", U.User);
-	equals("foo", U.Password);
-	equals(0, U.Port);
-	equals("ualberta.ca", U.Host);
-	equals("/blah", U.Path);
-	} {
-	URI U("file:/usr/bin/foo");
-	equals("file", U.Access);
-	equals("", U.User);
-	equals("", U.Password);
-	equals(0, U.Port);
-	equals("", U.Host);
-	equals("/usr/bin/foo", U.Path);
-	} {
-	URI U("cdrom:Moo Cow Rom:/debian");
-	equals("cdrom", U.Access);
-	equals("", U.User);
-	equals("", U.Password);
-	equals(0, U.Port);
-	equals("Moo Cow Rom", U.Host);
-	equals("/debian", U.Path);
-	} {
-	URI U("gzip:./bar/cow");
-	equals("gzip", U.Access);
-	equals("", U.User);
-	equals("", U.Password);
-	equals(0, U.Port);
-	equals(".", U.Host);
-	equals("/bar/cow", U.Path);
-	} {
-	URI U("ftp:ftp.fr.debian.org/debian/pool/main/x/xtel/xtel_3.2.1-15_i386.deb");
-	equals("ftp", U.Access);
-	equals("", U.User);
-	equals("", U.Password);
-	equals(0, U.Port);
-	equals("ftp.fr.debian.org", U.Host);
-	equals("/debian/pool/main/x/xtel/xtel_3.2.1-15_i386.deb", U.Path);
-	}
+	URI_EQUALS("http://www.debian.org:90/temp/test";,
+		   "http",
+		   "",
+		   "",
+		   90,
+		   "www.debian.org",
+		   "/temp/test");
+	URI_EQUALS("http://jgg:foo@ualberta.ca/blah";,
+		   "http",
+		   "jgg",
+		   "foo",
+		   0,
+		   "ualberta.ca",
+		   "/blah");
+	URI_EQUALS("file:/usr/bin/foo",
+		   "file",
+		   "",
+		   "",
+		   0,
+		   "",
+		   "/usr/bin/foo");
+	URI_EQUALS("cdrom:Moo Cow Rom:/debian",
+		   "cdrom",
+		   "",
+		   "",
+		   0,
+		   "Moo Cow Rom",
+		   "/debian");
+	URI_EQUALS("gzip:./bar/cow",
+		   "gzip",
+		   "",
+		   "",
+		   0,
+		   ".",
+		   "/bar/cow");
+	URI_EQUALS("ftp://ftp.fr.debian.org/debian/pool/main/x/xtel/xtel_3.2.1-15_i386.deb";,
+		   "ftp",
+		   "",
+		   "",
+		   0,
+		   "ftp.fr.debian.org",
+		   "/debian/pool/main/x/xtel/xtel_3.2.1-15_i386.deb");
 
 	// RFC 2732 stuff
-	{
-	URI U("http://[1080::8:800:200C:417A]/foo";);
-	equals("http", U.Access);
-	equals("", U.User);
-	equals("", U.Password);
-	equals(0, U.Port);
-	equals("1080::8:800:200C:417A", U.Host);
-	equals("/foo", U.Path);
-	} {
-	URI U("http://[::FFFF:129.144.52.38]:80/index.html";);
-	equals("http", U.Access);
-	equals("", U.User);
-	equals("", U.Password);
-	equals(80, U.Port);
-	equals("::FFFF:129.144.52.38", U.Host);
-	equals("/index.html", U.Path);
-	} {
-	URI U("http://[::FFFF:129.144.52.38:]:80/index.html";);
-	equals("http", U.Access);
-	equals("", U.User);
-	equals("", U.Password);
-	equals(80, U.Port);
-	equals("::FFFF:129.144.52.38:", U.Host);
-	equals("/index.html", U.Path);
-	} {
-	URI U("http://[::FFFF:129.144.52.38:]/index.html";);
-	equals("http", U.Access);
-	equals("", U.User);
-	equals("", U.Password);
-	equals(0, U.Port);
-	equals("::FFFF:129.144.52.38:", U.Host);
-	equals("/index.html", U.Path);
-	}
+	URI_EQUALS("http://[1080::8:800:200C:417A]/foo";,
+		   "http",
+		   "",
+		   "",
+		   0,
+		   "1080::8:800:200C:417A",
+		   "/foo");
+	URI_EQUALS("http://[::FFFF:129.144.52.38]:80/index.html";,
+		   "http",
+		   "",
+		   "",
+		   80,
+		   "::FFFF:129.144.52.38",
+		   "/index.html");
+	URI_EQUALS("http://[::FFFF:129.144.52.38:]:80/index.html";,
+		   "http",
+		   "",
+		   "",
+		   80,
+		   "::FFFF:129.144.52.38:",
+		   "/index.html");
+	URI_EQUALS("http://[::FFFF:129.144.52.38:]/index.html";,
+		   "http",
+		   "",
+		   "",
+		   0,
+		   "::FFFF:129.144.52.38:",
+		   "/index.html");
 	/* My Evil Corruption of RFC 2732 to handle CDROM names! Fun for
 	   the whole family! */
-	{
-	URI U("cdrom:[The Debian 1.2 disk, 1/2 R1:6]/debian/");
-	equals("cdrom", U.Access);
-	equals("", U.User);
-	equals("", U.Password);
-	equals(0, U.Port);
-	equals("The Debian 1.2 disk, 1/2 R1:6", U.Host);
-	equals("/debian/", U.Path);
-	} {
-	URI U("cdrom:Foo Bar Cow/debian/");
-	equals("cdrom", U.Access);
-	equals("", U.User);
-	equals("", U.Password);
-	equals(0, U.Port);
-	equals("Foo Bar Cow", U.Host);
-	equals("/debian/", U.Path);
-	}
+	URI_EQUALS("cdrom:[The Debian 1.2 disk, 1/2 R1:6]/debian/",
+		   "cdrom",
+		   "",
+		   "",
+		   0,
+		   "The Debian 1.2 disk, 1/2 R1:6",
+		   "/debian/");
+	URI_EQUALS("cdrom:Foo Bar Cow/debian/",
+		   "cdrom",
+		   "",
+		   "",
+		   0,
+		   "Foo Bar Cow",
+		   "/debian/");
 
 	return 0;
 }


Reply to: