Re: Cross-directory hard links in Debian packages
On Fri, Nov 15, 2013 at 01:50:05PM +0000, Jonathan Dowland wrote:
> I'm not sure that making a general rule based on an edge-case is a
> good idea. Publican is not very popular at all, it's quite likely
> that none of the 70 or so people who have installed it have done
> anything unusual with mounts around /usr.
publican is just an example. You can find more packages employing the
same technique at
http://lintian.debian.org/tags/package-contains-hardlink.html.
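If you want to check a single package locally, here is a minimal sketch that lists cross-directory hard links in a package's data tarball. It is not lintian's actual implementation. It assumes the tarball was first extracted from the .deb (a .deb is an ar archive, so `ar x foo.deb` yields data.tar.*), and the helper name is made up for illustration.

```python
import posixpath
import tarfile


def cross_directory_hardlinks(data_tar):
    """Return (member, target) pairs for hard links whose target lives in
    a different directory. `data_tar` is the path to an extracted
    data.tar(.gz/.xz); this is a hypothetical helper, not lintian code."""
    found = []
    # "r:*" lets tarfile detect the compression (none, gzip, xz, ...).
    with tarfile.open(data_tar, "r:*") as tar:
        for member in tar.getmembers():
            if not member.islnk():
                continue  # only hard-link entries are of interest
            # Tar member names always use "/" separators, hence posixpath.
            if posixpath.dirname(member.name) != posixpath.dirname(member.linkname):
                found.append((member.name, member.linkname))
    return found
```

Hard links within one directory are skipped on purpose, since only the cross-directory case is under discussion here.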
But we should look not only at packages doing this, but also at packages
that waste precious mirror and disk space by shipping identical files
more than once[1]:
binary package                #files    #bytes
wims-extra-all                 11057  44415092
mixxx-data                      7302   8055125
widelands-data                  6692  12953306
code-aster-test                 3225  59938595
sofia-sip-doc                   3146   6848743
mailman                         1745   2007439
texlive-lang-cjk                1619   4986872
spikeproxy                      1602   5934959
acl2-doc                        1598   7209512
freefoam-dev-doc                1495   3145120
wims                            1458   2125970
triplea                         1340   8641063
libqt4-dev                      1337   5003042
libboost1.54-doc                1240   4131392
libgrib-api-1.10.4              1210   1678922
lazarus-doc-1.0.10              1174  10734571
python-matplotlib-doc           1172  24691971
fonts-mathjax-extras            1136    141683
libboost1.53-doc                1097   3717938
dotlrn                          1096   5046637
libboost1.49-doc                1091   3578000
gnat-4.4                        1083  10643007
openclipart2-libreoffice        1046   2142208
sql-ledger                      1041   9248930
esys-particle                   1025   8243181
typo3-src-4.5                   1019   1528729
texlive-fonts-extra              998   4687576
moodle                           959   6392249
openbox-themes                   926    200312
xfwm4-themes                     890    412192
grass-dev-doc                    832   1124116
phpbb3-l10n                      825    623634
fillets-ng-data                  818   2712929
tuxpaint-stamps-default          813   2824876
optgeo                           793   2681882
libbcel-java-doc                 760  17640174
publican                         750   5283082
msp430mcu                        737  14475576
freegish-data                    691   1252457
collabtive                       687   1419645
fp-docs-2.6.2                    683   2111629
libmapi-dev                      681     31188
libnb-platform13-java-doc        678   1349378
murrine-themes                   656    255650
ctpp2-doc                        642    699880
fvwm-crystal                     634    800295
pacemaker-dev                    628   1399352
libknopflerfish-osgi-java-doc    598   4134711
libreoffice-dmaths               588    905010
freefoam-user-doc                587    883850
The numbers above are the savings achievable by using links. A few of
those files cannot be hard linked, because the links would cross common
file system boundaries. Still, the projected savings are significant.
Clearly, a generic solution is desirable. If you are interested in
details on the savings of a particular package, visit
http://dedup.debian.net/compare/<package>/<package>. Roughly every 25th
file in the archive is duplicated within the same package. That amounts
to almost 1% of the uncompressed archive size.
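The counting behind such numbers can be approximated for a single unpacked package. What follows is a minimal sketch, assuming we simply group regular files by SHA-512 (the hash used in [1]) and treat n - 1 copies in each group of n identical files as redundant. It is not the code behind dedup.debian.net, and its counts may differ from the table above.

```python
import hashlib
import os
import stat
from collections import defaultdict


def sha512_of(path, chunk_size=1 << 16):
    """Return the hex SHA-512 digest of a file, read in chunks."""
    digest = hashlib.sha512()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def duplicate_savings(root):
    """Estimate (#files, #bytes) that hard links could save under `root`.

    Regular files are grouped by SHA-512; in a group of n identical files,
    n - 1 copies could become links to the remaining one.
    """
    sizes_by_hash = defaultdict(list)
    seen_inodes = set()
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            st = os.lstat(path)
            if not stat.S_ISREG(st.st_mode):
                continue  # ignore symlinks, devices, sockets, ...
            inode = (st.st_dev, st.st_ino)
            if inode in seen_inodes:
                continue  # an existing hard link costs no extra space
            seen_inodes.add(inode)
            sizes_by_hash[sha512_of(path)].append(st.st_size)
    files = sum(len(s) - 1 for s in sizes_by_hash.values())
    size = sum(s[0] * (len(s) - 1) for s in sizes_by_hash.values())
    return files, size
```

For example, `duplicate_savings("debian/publican")` run on a (hypothetical) build tree would return the pair of numbers corresponding to one row of the table.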
> Looking at publican a number of questions occur to me
>
> * why hardlink all of the contents of
> /usr/share/doc/publican/Users_Guide/desktop/$LOCALE/Common_Content
> together rather than symlink them to some common directory like
> /usr/share/publican/Common_Content? Is it because there might be
> additions or omissions across locales?
Because it is more work to do so. One of the big advantages of using
hard links is that you don't have to choose a "primary location". These
hard links are generated at package build time.
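To illustrate why no "primary location" is needed, here is a minimal sketch of such a build-time step. It assumes it runs over a package's staging directory (e.g. debian/<package>) before the .deb is assembled. It is not what publican's packaging actually uses, and the helper name is hypothetical. Whichever copy is visited first simply becomes the one the others link to.

```python
import hashlib
import os
import stat


def sha512_of(path, chunk_size=1 << 16):
    """Return the hex SHA-512 digest of a file, read in chunks."""
    digest = hashlib.sha512()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def hardlink_duplicates(root):
    """Replace identical regular files under `root` with hard links.

    Whichever copy is visited first is kept; the rest become links to it.
    Returns the number of files replaced (hypothetical build-time helper).
    """
    keep = {}  # (sha512, mode, uid, gid) -> path of the copy we keep
    replaced = 0
    for dirpath, dirnames, filenames in os.walk(root):
        dirnames.sort()  # visit directories in a stable order
        for name in sorted(filenames):
            path = os.path.join(dirpath, name)
            st = os.lstat(path)
            if not stat.S_ISREG(st.st_mode) or st.st_size == 0:
                continue  # skip symlinks, special files and empty files
            # Hard links share permissions and ownership, so only merge
            # files whose metadata already matches.
            key = (sha512_of(path), st.st_mode, st.st_uid, st.st_gid)
            target = keep.setdefault(key, path)
            if os.path.samefile(target, path):
                continue  # this is the kept copy, or it is already linked
            tmp = path + ".hardlink-tmp"
            os.link(target, tmp)   # create the link under a temporary name
            os.replace(tmp, path)  # then atomically swap it into place
            replaced += 1
    return replaced
```

Sorting the directory walk keeps the choice of the kept copy stable between builds.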
> * Can/should that not be handled within the tool itself (implement
> a multi-directory lookup process)
Again, this is more work. It might be possible in the case of publican,
but if you look at the list above, you'll quickly notice that this
approach doesn't scale.
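For comparison, the multi-directory lookup suggested above would look roughly like this inside each tool. This is a hypothetical sketch; the function name and any directory layout you feed it are made up and do not reflect publican's real structure. It is easy to write once, but every package in the list would need its own variant wired into its own code.

```python
import os


def find_resource(relpath, search_dirs):
    """Return the first existing file `relpath` within `search_dirs`.

    Directories are tried in order, e.g. a locale-specific directory
    before a shared fallback; returns None if no candidate exists.
    """
    for base in search_dirs:
        candidate = os.path.join(base, relpath)
        if os.path.isfile(candidate):
            return candidate
    return None
```

A caller would pass something like `[locale_dir, common_dir]`, so per-locale additions override the shared copy.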
Is there any technical reason for rejecting the usage of hard links in
binary packages besides common file system boundaries?
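The file system boundary problem shows up as a concrete error: link(2) fails with EXDEV when source and destination live on different file systems. A hypothetical helper that installs a file as a hard link where possible and falls back to a copy otherwise could look like this. It only illustrates the constraint and is not meant to describe what dpkg does.

```python
import errno
import os
import shutil


def link_or_copy(src, dst):
    """Hard link `src` to `dst`, copying instead across file systems.

    Returns "link" or "copy" so callers can see what happened.
    """
    try:
        os.link(src, dst)
        return "link"
    except OSError as exc:
        if exc.errno != errno.EXDEV:
            raise  # other failures (permissions, missing dirs) are real errors
        # src and dst are on different file systems (e.g. a separately
        # mounted directory below /usr): a hard link is impossible.
        shutil.copy2(src, dst)
        return "copy"
```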
In any case, clarifying and documenting whether cross-directory hard
links are an acceptable tool seems worthwhile to me:
* Either they are to be avoided at all costs, in which case we have a
handful of violations to fix,
* or they are a tool that can be used to significantly shrink mirror
and installation size with very little effort.
Helmut
[1] ssh delfin.debian.org sqlite3 /srv/dedup.debian.org/dedup.sqlite3
'"SELECT package.name, sharing.files, sharing.size FROM package JOIN
sharing JOIN function WHERE sharing.pid1 = package.id AND
sharing.pid2 = package.id AND sharing.fid1 = function.id AND
sharing.fid2 = function.id AND function.name = \"sha512\" ORDER BY
sharing.files DESC LIMIT 50;"'