
Re: These new diffs are great, but...




"Bastian Venthur" <expires-2007@venthur.de> wrote in message news:e81grk$dbt$1@sea.gmane.org...
Robert Lemmen wrote:

standard method. you will however find out that the size of all diffs
together is already less than the size of the regular packages file.

Yeah, looking at the average filesize of a diff compared to a packages
file, I guess you'll need to wait like 100-200 days until the sum of the
diffs becomes larger than the package file itself. However, downloading
a 5 MB file takes a few seconds on my boxes while downloading the diffs
from 10-20 days can take a few minutes, which is not very attractive.

This is quite a dilemma since I understand that the bandwidth of
volunteering archive mirrors is not free.

Since the main problem seems to be that downloading many small files can
take much longer than downloading one big file, a compromise could be to
provide only one diff. The trick: generate x diffs for:

today-1day, today-2days .. today-x days

so you only have to download one file if your last update is less than x
days ago.

A good compromise for x could be 50 days or something. The diffs would
be reasonably small, fast to download and if your last update is more
than x days ago you could still download the package file directly.
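
As a rough illustration of the client side of this scheme, here is a minimal Python sketch. The function name and the file names ("Packages.diff-3d" and so on) are made up for the example and are not anything apt or the archive actually uses.

```python
def pick_download(days_since_update: int, x: int = 50) -> str:
    """Return the single file a client would fetch under the proposed scheme.

    The server is assumed to publish one cumulative diff per age:
    today-1day, today-2days, ..., today-x days.
    The file names below are hypothetical.
    """
    if days_since_update <= 0:
        # Already up to date, nothing to fetch.
        return "nothing"
    if days_since_update <= x:
        # Exactly one diff covers the whole gap.
        return f"Packages.diff-{days_since_update}d"
    # Too old for any published diff: fall back to the full file.
    return "Packages"
```

So a client that last updated 3 days ago would fetch one file, and one that last updated 51 days ago (with x = 50) would fetch the full Packages file.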

This solution should keep the bandwidth utilization on the servers small
(older diffs are less likely to be downloaded than the most recent ones)
while being faster than the current solution (and even faster than
downloading the whole Packages file).

Plus, you don't have to keep all the old diffs (only the last x ones) on
the servers.


Any ideas?


A very good idea. This is trading a slight increase in file space for bandwidth and speed. There is some additional server-side processing required, but diffing is relatively cheap.

If reversible diffs are used, then generating today's diffs requires only yesterday's Packages file, the most recent (x-1) diffs from yesterday, and today's Packages file. Scripting a program to update the diffs would not be terribly hard. Once the diffs are updated, everything from yesterday can be discarded.
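
To make the update step concrete, here is a small Python sketch under heavy simplifying assumptions: a Packages file is modelled as a dict of package name to version, and a reversible diff as a dict of name to (old, new) pairs. All names here are invented for the illustration; a real script would work on the actual file and diff formats.

```python
from typing import Dict, List, Optional, Tuple

Packages = Dict[str, str]  # package name -> version (toy model)
Diff = Dict[str, Tuple[Optional[str], Optional[str]]]  # name -> (old, new)


def make_diff(old: Packages, new: Packages) -> Diff:
    """Record (old, new) for every changed entry, which makes it reversible."""
    names = old.keys() | new.keys()
    return {n: (old.get(n), new.get(n)) for n in names if old.get(n) != new.get(n)}


def apply_reverse(new: Packages, diff: Diff) -> Packages:
    """Undo a diff: rebuild the older index from the newer one."""
    result = dict(new)
    for name, (old_version, _new_version) in diff.items():
        if old_version is None:
            result.pop(name, None)      # entry did not exist before
        else:
            result[name] = old_version  # restore the earlier version
    return result


def roll_forward(yesterday: Packages, today: Packages,
                 old_diffs: List[Diff], x: int) -> List[Diff]:
    """old_diffs[k-1] leads from (yesterday - k days) to yesterday.

    Returns the new list, where entry k-1 leads from (today - k days) to today.
    Only the first x-1 of yesterday's diffs are needed.
    """
    new_diffs = [make_diff(yesterday, today)]      # today-1day
    for diff in old_diffs[: x - 1]:
        older = apply_reverse(yesterday, diff)     # recover the older index
        new_diffs.append(make_diff(older, today))  # today-2days, ...
    return new_diffs
```

Note that no Packages file older than yesterday's is ever read from disk; the older states are recovered by reversing yesterday's diffs.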

Apt would always download the main package file if it was smaller than the appropriate diff. If it turns out that some of the diffs (the ones around today-x) are pretty large, they can be compressed like the main package file.
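
The size rule could be as simple as the following Python sketch. The function name and return values are hypothetical, and the sizes are assumed to be the on-the-wire (compressed) sizes in bytes as advertised by the server.

```python
from typing import Optional


def choose_index_file(full_size: int, diff_size: Optional[int]) -> str:
    """Pick the cheaper download.

    diff_size is None when no diff covers the client's last update.
    """
    if diff_size is None or full_size < diff_size:
        return "Packages"
    return "diff"
```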


Regardless, diffs should obviously not be used for file:// sources or CD-ROM sources unless the user explicitly says otherwise. This is because it is normally faster to fetch the main file when using those sources.
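
A sketch of that check in Python, keyed on the URI scheme of the source line. The function and the user_forces_diffs parameter are hypothetical and do not correspond to an existing apt option.

```python
LOCAL_SCHEMES = {"file", "cdrom"}  # sources where the full file is cheap to read


def diffs_allowed(source_uri: str, user_forces_diffs: bool = False) -> bool:
    """Use diffs for remote sources; for local ones only if the user insists."""
    scheme = source_uri.split(":", 1)[0].lower()
    if scheme in LOCAL_SCHEMES:
        return user_forces_diffs
    return True
```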




