[Date Prev][Date Next] [Thread Prev][Thread Next] [Date Index] [Thread Index]

Re: dpkg: The Installed-Size estimate can be wrong by a factor of 8 or a difference of 100MB



Hi!

On Thu, 2015-01-08 at 03:07:50 +0100, Guillem Jover wrote:
> Ok, I checked yum and rpm, and the disk space and inode usage delta
> is tracked in rpm itself. It just adds a "problem" to the transaction
> in case either of the above are depleted, which are reported on
> install or erasures. It does the tracking per file as it has that
> access itself, and compares against the different traversed mount
> points, so it's more precise than Installed-Size (which is something
> I also had in mind possibly doing in dpkg once it tracks the files
> metadata, to report it back to frontends).

I see this might seem to contradict what I said previously, that this
might not be worth it at all. My point is that *iff* we wanted more
detailed per-file vs mount-point information, then that should be done
in dpkg itself as it's the one with such access. But that information
will keep being just an approximation, maybe a slightly better one, but
still an approximation.


In any case, I've implemented now the change I mentioned in a previous
mail, and here's the patch I'm planning on pushing to dpkg. Which I
consider "solves" both reported problems: reproducibility and documenting
the current limitations of the field.

(The directories could probably get a better size accounting by
accumulating the shipped filename sizes within, but I don't think
that's worth the trouble either, and it makes the algorithm more
complicated for third-parties to implement.)

Thanks,
Guillem
From 775fd2a0876e0c778d78451d2ef9268247e61d24 Mon Sep 17 00:00:00 2001
From: Guillem Jover <guillem@debian.org>
Date: Tue, 20 Jan 2015 02:37:20 +0100
Subject: [PATCH] dpkg-gencontrol: Rework Installed-Size field default value
 computation
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

Switch from «du» to File::Find, and accumulate size usage per filesystem
object, on 1 KiB units. Use the actual size only for regular files and
symlinks, and just 1 KiB for any other filesystem object type.

This guarantees a constant and reproducible size regardless of the
build system filesystem being used.

Closes: #650077
---
 man/deb-substvars.5        | 19 ++++++++++++-------
 scripts/dpkg-gencontrol.pl | 43 ++++++++++++++++++++++---------------------
 2 files changed, 34 insertions(+), 28 deletions(-)

diff --git a/man/deb-substvars.5 b/man/deb-substvars.5
index f507042..376929f 100644
--- a/man/deb-substvars.5
+++ b/man/deb-substvars.5
@@ -18,7 +18,7 @@
 .\" You should have received a copy of the GNU General Public License
 .\" along with this program.  If not, see <https://www.gnu.org/licenses/>.
 .
-.TH deb\-substvars 5 "2009-07-15" "Debian Project" "dpkg utilities"
+.TH deb\-substvars 5 "2015-01-20" "Debian Project" "dpkg utilities"
 .SH NAME
 deb\-substvars \- Debian source substitution variables
 .
@@ -111,13 +111,18 @@ The source package version (from the changelog file). This variable is now
 the \fBsource:Version\fP or \fBbinary:Version\fP as appropriate.
 .TP
 .B Installed\-Size
-The total size of the package's installed files. This value is copied
-into the corresponding control file field; setting it will modify the
-value of that field. If this variable isn't set
+The approximate total size of the package's installed files. This value is
+copied into the corresponding control file field; setting it will modify
+the value of that field. If this variable is not set
 .B dpkg\-gencontrol
-will use
-.B du \-k debian/tmp
-to find the default value.
+will compute the default value by accumulating the size of each regular
+file and symlink rounded to 1 KiB used units, and a baseline of 1 KiB for
+any other filesystem object type.
+
+\fBNote:\fP Take into account that this can only ever be an approximation,
+as the actual size used on the installed system will depend greatly on the
+filesystem used and its parameters, which might end up using either more
+or less space than the specified in this field.
 .TP
 .B Extra\-Size
 Additional disk space used when the package is installed. If this
diff --git a/scripts/dpkg-gencontrol.pl b/scripts/dpkg-gencontrol.pl
index 09a6956..6b076f4 100755
--- a/scripts/dpkg-gencontrol.pl
+++ b/scripts/dpkg-gencontrol.pl
@@ -4,7 +4,7 @@
 #
 # Copyright © 1996 Ian Jackson
 # Copyright © 2000,2002 Wichert Akkerman
-# Copyright © 2006-2014 Guillem Jover <guillem@debian.org>
+# Copyright © 2006-2015 Guillem Jover <guillem@debian.org>
 #
 # This program is free software; you can redistribute it and/or modify
 # it under the terms of the GNU General Public License as published by
@@ -23,6 +23,8 @@ use strict;
 use warnings;
 
 use POSIX qw(:errno_h :fcntl_h);
+use File::Find;
+
 use Dpkg ();
 use Dpkg::Gettext;
 use Dpkg::ErrorHandling;
@@ -334,26 +336,25 @@ if ($oppackage ne $sourcepackage || $verdiff) {
 }
 
 if (!defined($substvars->get('Installed-Size'))) {
-    my $c = open(my $du_fh, '-|');
-    if (not defined $c) {
-        syserr(g_('cannot fork for %s'), 'du');
-    }
-    if (!$c) {
-        chdir("$packagebuilddir")
-            or syserr(g_("chdir for du to \`%s'"), $packagebuilddir);
-        exec('du', '-k', '-s', '--apparent-size', '.')
-            or syserr(g_('unable to execute %s'), 'du');
-    }
-    my $duo = '';
-    while (<$du_fh>) {
-	$duo .= $_;
-    }
-    close($du_fh);
-    subprocerr(g_("du in \`%s'"), $packagebuilddir) if $?;
-    if ($duo !~ m/^(\d+)\s+\.$/) {
-        error(g_("du gave unexpected output \`%s'"), $duo);
-    }
-    $substvars->set_as_auto('Installed-Size', $1);
+    my $installed_size = 0;
+    my $scan_installed_size = sub {
+        lstat or syserr(g_('cannot stat %s'), $File::Find::name);
+
+        if (-f _ or -l _) {
+            # For filesystem objects with actual content accumulate the size
+            # in 1 KiB units.
+            $installed_size += POSIX::ceil((-s _) / 1024);
+        } else {
+            # For other filesystem objects assume a minimum 1 KiB baseline,
+            # as directories are shared resources between packages, and other
+            # object types are mainly metadata-only, supposedly consuming
+            # at most an inode.
+            $installed_size += 1;
+        }
+    };
+    find($scan_installed_size, $packagebuilddir);
+
+    $substvars->set_as_auto('Installed-Size', $installed_size);
 }
 if (defined($substvars->get('Extra-Size'))) {
     my $size = $substvars->get('Extra-Size') + $substvars->get('Installed-Size');
-- 
2.2.1.209.g41e5f3a


Reply to: