
Bug#761117: debsources: file-level deduplication



Package: qa.debian.org
Severity: wishlist

We already have all the file checksums in the database. Removing
file-level duplication from the file storage, using hard links, can be
done safely offline, i.e., as long as no debsources update is ongoing.
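
The offline pass described above could look roughly like the sketch
below. It is only an illustration, not debsources code: the function
name, the (sha256, path) row format and the ".dedup-tmp" suffix are all
assumptions, and it expects regular files with no update running.

```python
import os
from collections import defaultdict


def hardlink_duplicates(checksum_rows):
    """Replace files with identical checksums by hard links to one copy.

    checksum_rows: iterable of (sha256, path) pairs, e.g. as read from
    the checksums table (this row format is an assumption).
    Returns the number of files that were re-linked.
    """
    # Group all paths by checksum.
    by_sha = defaultdict(list)
    for sha256, path in checksum_rows:
        by_sha[sha256].append(path)

    relinked = 0
    for paths in by_sha.values():
        keeper = paths[0]  # the copy every duplicate will point to
        keeper_stat = os.lstat(keeper)
        for dup in paths[1:]:
            dup_stat = os.lstat(dup)
            # Hard links cannot cross file systems.
            if dup_stat.st_dev != keeper_stat.st_dev:
                continue
            # Skip paths that already share the keeper's inode.
            if dup_stat.st_ino == keeper_stat.st_ino:
                continue
            tmp = dup + ".dedup-tmp"  # hypothetical temporary name
            os.link(keeper, tmp)      # new name for the keeper's inode
            os.replace(tmp, dup)      # atomic rename over the duplicate
            relinked += 1
    return relinked
```

Linking to a temporary name and renaming it over the duplicate means the
path never disappears, even if the run is interrupted. Note that hard
links share one inode, so per-file mode and timestamps of the duplicates
are not preserved.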

Micro-benchmark (from my DebConf14 Debsources talk) of the expected disk
space saving:

select count(*) from checksums;                -> 35'370'653
select count(distinct sha256) from checksums;  -> 15'822'745
                                  --------------------------
                                  => deduplicated core: ~45%
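
For reference, the ratio above can be recomputed from the two counts.
The helper name below is made up for illustration. The figures are file
counts, not bytes, so the actual disk space saved depends on the sizes
of the duplicated files.

```python
def dedup_core_ratio(total_rows, distinct_checksums):
    """Fraction of stored files that remain after deduplication."""
    return distinct_checksums / total_rows


# Counts taken from the two queries above.
core = dedup_core_ratio(35_370_653, 15_822_745)
print(f"deduplicated core: {core:.1%}")      # 44.7%
print(f"redundant copies:  {1 - core:.1%}")  # 55.3%
```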

Cheers.
-- 
Stefano Zacchiroli  . . . . . . .  zack@upsilon.cc . . . . o . . . o . o
Maître de conférences . . . . . http://upsilon.cc/zack . . . o . . . o o
Former Debian Project Leader  . . @zack on identi.ca . . o o o . . . o .
« the first rule of tautology club is the first rule of tautology club »
