
Re: dpkg and hardlinks



On Tue, Mar 24, 2009 at 03:11:17PM +0100, Jerome Warnier <jwarnier@beeznest.net> wrote:
> Mike Hommey wrote:
> > On Tue, Mar 24, 2009 at 02:34:09PM +0100, Jerome Warnier <jwarnier@beeznest.net> wrote:
> >   
> >> Giacomo A. Catenazzi wrote:
> >>     
> >>> Jerome Warnier wrote:
> >>>       
> >>>> Raphael Hertzog wrote:
> >>>>         
> >>>>> On Tue, 24 Mar 2009, Jerome Warnier wrote:
> >>>>>  
> >>>>>           
> >>>>>> For files from packages, though, deduplication might be a good idea,
> >>>>>> as dpkg is supposedly the only one to ever modify the files (under
> >>>>>> /usr for example).
> >>>>>> I don't know however how dpkg treats hardlinks. Does it "break" the
> >>>>>> hardlink before replacing a file or does it replace the file whatever
> >>>>>> its real nature is?
> >>>>>>     
> >>>>>>             
> >>>>> IIRC dpkg preserves hardlinks inside a binary package but I don't see
> >>>>> how it could do the same across multiple binary packages.
> >>>>>   
> >>>>>           
> >>>> Oh, I didn't expect it to. I just wanted to know its behaviour when it
> >>>> upgrades a package.
> >>>> Before the upgrade, the file is a hardlink (because I hardlinked it
> >>>> manually), then it tries to upgrade the file/hardlink. Does it "break"
> >>>> the hardlink* before upgrading the file or does it overwrite the
> >>>> file/hardlink and all of its "siblings"?
> >>>>         
> >>> Do you really care? (not theoretically, but in normal use).
> >>> I would expect that same content will be delivered:
> >>> - by "brother" packages (same source), thus usually updated
> >>>   at the same time.
> >>> - in documentation (so maybe not so important for your use).
> >>>
> >>> I think most of the problems are with files outside "dpkg" control,
> >>> i.e. /var and /etc.
> >>>
> >>> I'm just curious: do you have a list of "same" content files?
> >>> maybe I'm completely wrong.
> >>>       
> >> Here you are, for /usr on a typical Lenny AMD64 server (generated with
> >> "finddup -n" from package perforate):
> >> http://glouglou.beeznest.org/~jwarnier/usr-duplicates.list.gz
> >>     
> >
> > $ zcat usr-duplicates.list.gz | awk '{t+=$1*(NF-2)}END{print t}'
> > 33142129
> >
> > You would free 33MB. How big is your disk? Is it worth bothering?
> >   
> I'm not an awk god, but isn't that just the total size the files would
> take up once deduplicated?
> In this case, it is not the size I would reclaim, as there are sometimes
> up to 4 copies of the same content.

The "*(NF-2)" part takes care of those copies: it weights each line's size
by the number of extra copies listed on that line.
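
For anyone who does not read awk fluently, here is a rough Python sketch of
the same arithmetic. It assumes, as the one-liner does, that each line of the
list is a size followed by the names of the files sharing that content, with
no whitespace in the names. The function name and the sample paths are made
up for illustration.

```python
def reclaimable_bytes(lines):
    """Sum size * (redundant copies) over all duplicate groups.

    Assumed line format: "<size> <file1> <file2> ..." (no spaces in names).
    """
    total = 0
    for line in lines:
        fields = line.split()
        if len(fields) < 3:
            # Fewer than two file names: nothing to reclaim. (awk would
            # subtract the size for a line holding only a size; this sketch
            # simply skips such lines.)
            continue
        size = int(fields[0])
        copies = len(fields) - 1       # files sharing this content
        total += size * (copies - 1)   # same as awk's $1 * (NF - 2)
    return total


# Hypothetical sample: one pair of duplicates and one group of four copies.
sample = [
    "1000 /usr/a /usr/b",
    "500 /usr/c /usr/d /usr/e /usr/f",
]
print(reclaimable_bytes(sample))  # 1000*1 + 500*3 = 2500
```

To run it on the real list, the lines could be read with
gzip.open("usr-duplicates.list.gz", "rt") and passed to the function as is.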

Mike

