
Re: dpkg and hardlinks



Mike Hommey wrote:
> On Tue, Mar 24, 2009 at 02:34:09PM +0100, Jerome Warnier <jwarnier@beeznest.net> wrote:
>   
>> Giacomo A. Catenazzi wrote:
>>     
>>> Jerome Warnier wrote:
>>>       
>>>> Raphael Hertzog wrote:
>>>>         
>>>>> On Tue, 24 Mar 2009, Jerome Warnier wrote:
>>>>>
>>>>>> For files from packages, though, deduplication might be a good idea,
>>>>>> as dpkg is supposedly the only one to ever modify those files (under
>>>>>> /usr, for example).
>>>>>> However, I don't know how dpkg treats hardlinks. Does it "break" the
>>>>>> hardlink before replacing a file, or does it replace the file whatever
>>>>>> its real nature is?
>>>>>>
>>>>> IIRC dpkg preserves hardlinks inside a binary package, but I don't
>>>>> see how it could do the same across multiple binary packages.
>>>>>
>>>> Oh, I didn't expect it to. I just wanted to know its behaviour when it
>>>> upgrades a package.
>>>> Before the upgrade, the file is a hardlink (because I hardlinked it
>>>> manually); then dpkg tries to upgrade that file/hardlink. Does it
>>>> "break" the hardlink before upgrading the file, or does it overwrite
>>>> the file/hardlink and all of its "siblings"?
>>>>         
>>> Do you really care? (Not theoretically, but in normal use.)
>>> I would expect the same content to be delivered:
>>> - by "brother" packages (same source), thus usually updated
>>>   at the same time;
>>> - in documentation (so maybe not so important for your use).
>>>
>>> I think the biggest problems are with files outside dpkg's control,
>>> i.e. /var and /etc.
>>>
>>> I'm just curious: do you have a list of files with the "same" content?
>>> Maybe I'm completely wrong.
>>>       
>> Here you are, for /usr on a typical Lenny AMD64 server (generated with
>> "finddup -n" from package perforate):
>> http://glouglou.beeznest.org/~jwarnier/usr-duplicates.list.gz
>>     
>
> $ zcat usr-duplicates.list.gz | awk '{t+=$1*(NF-2)}END{print t}'
> 33142129
>
> You would free 33MB. How big is your disk? Is it worth bothering?
>   
I'm not an awk god, but isn't that just the total size of the files that
could be deduplicated? In that case, it is not the amount I would reclaim,
as there are sometimes up to 4 copies of the same content.
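
(For what it's worth, a rough reading of that one-liner, under the assumption
that "finddup -n" prints one line per duplicate set with the common file size
first and the paths after it: the (NF-2) factor counts only the redundant
copies, i.e. all paths minus the one copy you keep, so the sum should already
be an estimate of the reclaimable bytes rather than the combined size of every
copy. A sketch printing both figures for comparison:

  zcat usr-duplicates.list.gz | awk '
    { reclaim += $1 * (NF - 2)     # bytes freed if all but one copy are hardlinked
      total   += $1 * (NF - 1) }   # combined size of every copy in the set
    END { printf "reclaimable: %d bytes, duplicated total: %d bytes\n", reclaim, total }'

If some sets really hold up to 4 copies, "total" will come out noticeably
larger than "reclaimable".)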
> You can get much more free space than that by reducing the number of inodes
> supported by your filesystem:
> For instance, on my / filesystem, which contains /usr and is only 3GB:
> Inode count:              384000
> Free inodes:              314133
>
> I will obviously never use that many inodes... Now, consider that an inode
> is 128 bytes (or even 256 in some cases), and do some maths...
>
> Mike
>   
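
Doing the maths Mike points at (a sketch only: it assumes the figures above
come from "tune2fs -l" on an ext2/ext3 filesystem, and /dev/sda1 below is just
a placeholder for the real device):

  DEV=/dev/sda1
  tune2fs -l "$DEV" | awk -F: '
    /^Inode count/ { count = $2 }
    /^Free inodes/ { free  = $2 }
    /^Inode size/  { isize = $2 }
    END { if (isize == 0) isize = 128    # fall back to the 128-byte figure
          printf "inode table: %d bytes, of which %d bytes hold inodes that are free\n", count * isize, free * isize }'

With 384000 inodes of 128 bytes each the inode table occupies about 47MB, and
the 314133 free inodes account for roughly 38MB of it; a smaller inode count
chosen at mkfs time would have left that space available for data.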

