Delta debs repository layout v2
This is an advanced proposal for integrating deltas into the archive. It has
been designed after some feedback and some more research.
* We don't want to have Deltas indexes like Packages files
-> we are unlikely to need most of the deltas
-> about same size as Packages file (when both compressed)
* We don't want to try to fetch deltas for each upgrade. We'll only be generating
deltas for about 2/3rd of all upgrades (as far as current results tell us.
* We don't want to have untrusted deltas/files
* We don't want to generate/verify too many signatures
* upgrade: A tuple of two .deb; an old one, and a new one.
* delta: A deb containing only ddelta files or similar instead of full files
* bloom filter: A bitarray of $n$ bytes indexed by $k$ hash functions. Has
no false negatives, but might have false positives.
This approach tries to solve the problem by introducing per-source-package
Delta indices in the pool. This solves the problem of having to download meta
info about all deltas in the archive.
An inline signed index containing, for each delta for that source package,
the following information:
* Package - the name of the package
* Old-ID - the id of the package being upgraded from
* New-ID - the id of the version being upgraded to
* Size - the size of the delta
* SHA256 - the hash of the delta
* Algorithm - algorithm used in deltas, currently ddelta
Including the algorithm is essential, so we can switch algorithms at a later
point and have old dpkg versions download a delta they understand or no delta.
Deltas are stored with a fixed pathname to avoid duplicating information:
The concrete layout of the Deltas file is unspecified so far. A minimal solution is:
8a09ba1ca92fe5405de7bd90fb06a1ce6cc502917df88f2990d3701a6cc75fa1 590012 apt_1e9bfd18a899677e70fda42f5192fd169c4a778d7c7426016d1bf0a0d9b5dbc7_1955901c3d7dd72c451274fa72f42539e573a9e883c4e05ad283e2a30a62f13b_ddelta.deltadeb
this encodes all the information in a fairly minimal way (thus we don't need any compression),
and is compatible with the existing Release file format. Note that we will know the name(s) of the file's we want
to download beforehand, we just need to figure out if they exist and get signed hashes
and sizes for them.
The package name is in fact optional, but it might be worth including it to be able
to figure out which package the delta belongs to.
It might be worth including the distribution in the filename, and call it
e.g. Deltas.bookworm-security instead of Deltas. That said, there might not be
enough releases at a time for it to be worth it. NEEDSINVESTIGATION
However, if we only produce deltas for 2/3 of the upgrades, we still have 1/3 of
false positives for which we will needlessly download Deltas indices. Depending on
the connection's latency, this might be bad. To solve this, we can introduce a new
A bloom filter of size $n$ bits. For each delta from $old_id to
$new_id, construct a delta id $old_id | $new_id (the concatenation). Split
up the delta_id into chunks of $b$ bits. Use each $chunk \mod n$ as an index
into the bloom filter.
It seems reasonable to expect us to achieve a bloom filter with a false
positive rate of about 1%. Experimental results for xenial-updates/main/amd64 have
shown a 0.75% false positive rate for a bloom filter of $n=98317$ bits
and $b=32$ bit chunks for 9132 accepted out of 13500 total deltas (that's about
12 KB for a Packages.xz of 800 KB).
This does not include the delta algorithm, hence it's probably best not to have
deltas with different algorithms for the same upgrade.
The bloom filter is signed by the InRelease file.
Delta upgrade process
Acquire Deltas.bloom for each Packages file
upgrade/install - download
for each upgrade in changes:
if upgrade in Deltas.bloom and applicable(upgrade):
queue upgrade's Deltas index
queue full deb
for each deltas index:
if downloaded and delta in index:
for each delta:
if downloaded failed:
"""Verify that an upgrade is applicable.
For an upgrade to be applicable, the files on the file system must match the files
shipped in the package. A reasonable approximation is to ensure that the file has
not been modified later than the install time and that the size is the same."""
for each file in installed_version:
if file.mtime >= installed_version.install_time or file.mtime > file.ctime or file.size != expected_size:
* ddelta should grow a CRC for data blocks it copies/diffs so it does not silently
succeed given a broken "old" file
Gotta update the wiki page with this stuff.
debian developer - deb.li/jak | jak-linux.org - free software dev
ubuntu core developer i speak de, en