
Proposal: convert all udebs to xz compression



Hi there,

Last week I played around with xz compression and evaluated how d-i
could benefit from it.  It turns out that the total udeb size (on amd64)
could drop from 50 MB to 40 MB just by applying xz -0e compression
instead of the default gzip.

This needs UNXZ support in busybox, which costs two more pages of code
(8k), plus 700 bytes more in udpkg (both measured on amd64).  That
overhead is small enough to be negligible.

I went on and instrumented the unpack calls within d-i by calling GNU
time(1).  With gunzip (I made it call the applet instead of relying on
seamless tar support, which isn't yet implemented for xz in busybox) I
get a maximum RSS of 2384 kbytes, assuming that time measures that
correctly.  With xz -0 compression I get a maximum RSS of 2608 kbytes.

With the default xz compression level of -6 I get a maximum RSS of 11040
kbytes on the largest udebs, because the dictionary is larger.  But that
saves less than a megabyte over all udebs, comparing -6e (39484 kbytes)
with -0e (40439 kbytes).  So there's no real benefit in enabling a
higher compression level.  For those wondering about the -e bit: it
makes packing udebs more costly, but the result is smaller (43697 kbytes
for -0 vs. 40439 kbytes for -0e) without imposing any more load on the
unpacker.  And we generally don't care about udebs taking longer to
compress, given how tiny they are and that the load is spread over many
package builds anyway.
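
To put the figures quoted above side by side, here is a small sketch in
C that works out the relative differences.  The function names are made
up for this illustration; only the kbyte numbers come from the
measurements in this mail.

```c
#include <stdio.h>

/* Percentage by which after_kb is smaller than before_kb. */
double percent_saved(long before_kb, long after_kb)
{
	return 100.0 * (double)(before_kb - after_kb) / (double)before_kb;
}

/* Print the comparisons using the totals and RSS values quoted above. */
void print_udeb_savings(void)
{
	const long xz0 = 43697;		/* all udebs, xz -0  */
	const long xz0e = 40439;	/* all udebs, xz -0e */
	const long xz6e = 39484;	/* all udebs, xz -6e */
	const long rss0 = 2608;		/* max RSS, unxz of -0 data */
	const long rss6 = 11040;	/* max RSS, unxz of -6 data */

	printf("-0  -> -0e: %.1f%% smaller\n", percent_saved(xz0, xz0e));
	printf("-0e -> -6e: %.1f%% smaller\n", percent_saved(xz0e, xz6e));
	printf("max RSS -6 vs -0: %.1fx\n", (double)rss6 / (double)rss0);
}
```

Calling print_udeb_savings() shows that -e buys roughly 7.5% at level 0,
while going from -0e to -6e buys only about 2.4% more at around four
times the peak memory.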

On my laptop the increased CPU load of unxz was barely noticeable.  With
unxz a few more calls showed a CPU time of > 0.00, but it doesn't really
matter.  See [1], [2] and [3] for syslogs with instrumentation.
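
For anyone who wants to repeat this kind of measurement without GNU
time(1), the following is a minimal sketch of reading a child's peak
RSS via wait4(2).  It is not the instrumentation used for the syslogs
above, and child_max_rss() is a hypothetical helper name.  It assumes
Linux, where ru_maxrss is reported in kilobytes.

```c
#define _DEFAULT_SOURCE
#include <sys/types.h>
#include <sys/time.h>
#include <sys/resource.h>
#include <sys/wait.h>
#include <unistd.h>

/*
 * Run argv[0] with the given arguments and return the child's peak
 * resident set size as reported in ru_maxrss (kilobytes on Linux).
 * Returns -1 if the child could not be started or did not exit
 * normally.  Exit status 127 is used to signal a failed exec, so a
 * command that itself exits with 127 is also reported as a failure.
 */
long child_max_rss(char *const argv[])
{
	struct rusage ru;
	int status;
	pid_t pid = fork();

	if (pid < 0)
		return -1;
	if (pid == 0) {
		execvp(argv[0], argv);
		_exit(127);	/* only reached if exec failed */
	}
	if (wait4(pid, &status, 0, &ru) < 0)
		return -1;
	if (!WIFEXITED(status) || WEXITSTATUS(status) == 127)
		return -1;
	return ru.ru_maxrss;
}
```

Called with an argv such as { "unxz", "-c", "data.tar.xz", NULL } this
gives one number per unpack step; the file name here is only an example.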

My patch to udpkg is attached.  It's not the best C code, but udpkg is
full of tiny static buffers.  I'd appreciate a review.
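
As an aid for review, the member detection the patch performs on the
output of "ar -t" can be sketched as a standalone function.  This is an
illustration only, not the code in the patch: the struct and function
names are invented here, and only the gz/gunzip and xz/unxz pairs are
taken from the patch itself.

```c
#include <stddef.h>
#include <string.h>

/* Pairing of a data.tar suffix with the applet that unpacks it. */
struct decompressor {
	const char *suffix;
	const char *tool;
};

/*
 * Given one line of "ar -t" output (with or without the trailing
 * newline), return the matching decompressor if the line names a
 * supported data member, or NULL otherwise.
 */
const struct decompressor *decompressor_for_member(const char *line)
{
	static const struct decompressor known[] = {
		{ "gz", "gunzip" },
		{ "xz", "unxz" },
	};
	static const char prefix[] = "data.tar.";
	const size_t plen = sizeof(prefix) - 1;
	size_t len, i;

	if (strncmp(line, prefix, plen) != 0)
		return NULL;
	line += plen;
	len = strcspn(line, "\n");	/* suffix length without newline */

	for (i = 0; i < sizeof(known) / sizeof(known[0]); i++) {
		if (strlen(known[i].suffix) == len &&
		    strncmp(line, known[i].suffix, len) == 0)
			return &known[i];
	}
	return NULL;
}
```

Because the result points at static strings rather than into the line
buffer, the caller can reuse that buffer for the next command.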

So my proposal is to switch the udeb compression default in dpkg to xz
for wheezy, once the busybox and udpkg changes have landed.  Most udebs
will then get a translation upload anyway; those that don't can be
binNMUed to pick up the new compression.

There doesn't seem to be a downside at first glance, but if somebody
sees one: please speak up.  At least the concern that xz would have a
measurable impact does not hold when it is used with -0e, even on
embedded systems.  If you're doing a network install it might even be
faster, because there is less to fetch.  net-retriever might even need
less RAM.

Kind regards,
Philipp Kern

[1] http://people.debian.org/~pkern/syslog-gunzip-time.gz
[2] http://people.debian.org/~pkern/syslog-unxz-time.gz (xz -6)
[3] http://people.debian.org/~pkern/syslog-unxz-0-time.gz
-- 
 .''`.  Philipp Kern                        Debian Developer
: :' :  http://philkern.de                         Stable Release Manager
`. `'   xmpp:phil@0x539.de                         Wanna-Build Admin
  `-    finger pkern/key@db.debian.org
From 8c2870debf4c5bc76ff78b7e76a39da345ea5f7a Mon Sep 17 00:00:00 2001
From: Philipp Kern <pkern@debian.org>
Date: Mon, 17 Oct 2011 17:33:17 +0200
Subject: [PATCH] Implement xz support.

---
 debian/changelog |    6 +++++
 udpkg.c          |   57 ++++++++++++++++++++++++++++++++++++++++++++++++++---
 2 files changed, 59 insertions(+), 4 deletions(-)

diff --git a/debian/changelog b/debian/changelog
index d68a836..a21ca32 100644
--- a/debian/changelog
+++ b/debian/changelog
@@ -1,3 +1,9 @@
+udpkg (1.13) UNRELEASED; urgency=low
+
+  * Implement support for udebs compressed with xz.
+
+ -- Philipp Kern <pkern@debian.org>  Mon, 17 Oct 2011 17:01:07 +0200
+
 udpkg (1.12) unstable; urgency=low
 
   * Redesign read_block interface, fixing crashes caused by memory leak fix
diff --git a/udpkg.c b/udpkg.c
index 4b35bc8..f684eb7 100644
--- a/udpkg.c
+++ b/udpkg.c
@@ -132,16 +132,63 @@ static int dpkg_dounpack(struct package_t *pkg)
 		"templates", "menutest", "isinstallable",
 		"config"
 	};
+	FILE *infp = NULL;
+	const char *compression_type = NULL;
+	const char *decompression_tool;
 #ifdef DOREMOVE
 	char *p;
-	FILE *infp = NULL, *outfp = NULL;
+	FILE *outfp = NULL;
 #endif
 
 	DPRINTF("Unpacking %s\n", pkg->package);
 
 	cwd = getcwd(0, 0);
 	chdir("/");
-	snprintf(buf, sizeof(buf), "ar -p %s data.tar.gz|tar -xzf -", pkg->file);
+
+	snprintf(buf, sizeof(buf), "ar -t %s", pkg->file);
+	if ((infp = popen(buf, "r")) == NULL)
+	{
+		FPRINTF(stderr, "Cannot retrieve archive members of %s: %s\n",
+			pkg->file, strerror(errno));
+		r = 1;
+		goto reset_cwd;
+	}
+
+	while (fgets(buf, sizeof(buf), infp)) {
+		if (strncmp(buf, "data.tar.", 9) == 0) {
+			compression_type = buf + 9;
+			break;
+		}
+	}
+	pclose(infp);
+
+	if (compression_type == NULL) {
+		FPRINTF(stderr, "No data member found in %s\n", pkg->file);
+		r = 1;
+		goto reset_cwd;
+	}
+
+	if (strcmp(compression_type, "gz\n") == 0)
+	{
+		compression_type = "gz";
+		decompression_tool = "gunzip";
+	}
+	else if (strcmp(compression_type, "xz\n") == 0)
+	{
+		compression_type = "xz";
+		decompression_tool = "unxz";
+	}
+	else
+	{
+		FPRINTF(stderr, "Invalid compression type for data member of %s\n",
+			pkg->file);
+		r = 1;
+		goto reset_cwd;
+	}
+
+	snprintf(buf, sizeof(buf), "ar -p %s data.tar.%s|%s -c|tar -x",
+		pkg->file, compression_type, decompression_tool);
+	puts(buf);
 	if ((r = di_exec_shell_log(buf)) == 0)
 	{
 		/* Installs the package scripts into the info directory */
@@ -204,8 +251,8 @@ static int dpkg_dounpack(struct package_t *pkg)
 		 * so oddly...
 		 */
 		snprintf(buf, sizeof(buf),
-			"ar -p %s data.tar.gz|tar -tzf -",
-			pkg->file);
+			"ar -p %s data.tar.%s|%s -c|tar -t",
+			pkg->file, compression_type, decompression_tool);
 		snprintf(buf2, sizeof(buf2),
 			"%s%s.list", INFODIR, pkg->package);
 		if ((infp = popen(buf, "r")) == NULL ||
@@ -250,6 +297,8 @@ static int dpkg_dounpack(struct package_t *pkg)
 	}
 	else
 		FPRINTF(stderr, "%s exited with status %d\n", buf, r);
+
+reset_cwd:
 	chdir(cwd);
 	return r;
 }
-- 
1.7.6.3
