Re: an idea for next generation APT archive caching
>>>>> "martin" == martin f krafft <madduck@debian.org> writes:
martin> 1. apt-proxy: While I love the concept of apt-proxy, it
martin> works very unreliably. Frequently, the proxy fails to
martin> download the package or imposes very long delays (#272217,
martin> and others).
apt-proxy is generally good. I use it all the time here, and my
current ISP uses it too (which means I can download Debian packages
without it counting towards my usage quota!).
It has some limitations though (sorry, I haven't read the bug reports;
I am not sure if these are reported). I don't consider these show
stoppers, but they can be irritating at times.
* Sometimes if all servers return file not found (e.g. because the
package is obsolete and has been deleted), the file not found is not
passed on to the client. Instead the client times out. Sometimes. When
I test it out in order to write bug reports, it works fine. Arrghh!
* Can't cope very well if a server fails: it will repeatedly try to
contact the dead server for every request, meaning the client is likely
to time out even if the 2nd server listed is OK.
* If the above point wasn't bad enough by itself, the apt-proxy binary has
hard coded:
WGET_CMD="$WGET --timestamping --no-host-directories --tries=5 --no-directories -P $DL_DESTDIR"
So if the client doesn't time out after the first try, it will time out
somewhere in wget's 5 tries. This is insane! I hacked my binary to remove
this parameter. Also the 30 second timeout seems long, but at least this
is configurable (and may actually be good if the upstream server is
running apt-proxy, in case the upstream server encounters delays).
* No thought put into the file deletion algorithm. IMHO, deleting
files based on age is wrong (consider how long stable files
last). Deleting files based on number of different copies is also
wrong (consider if you have some systems set up with stable and another
with unstable). IMHO, the only correct way is to scan the most recently
downloaded Packages and Sources index files and delete files that
aren't mentioned anymore. This could be made more aggressive though if
disk space is low.
* Can't cope with out of disk space errors; it will keep trying to
download regardless, giving the client time out errors.
* Previously, due to a bug in squid, if you told apt-proxy to go via
squid, it would never get updated, as squid always returned the cached
HTTP headers which said the file hadn't changed. This in turn
indicated to wget not to download the new file. I filed a bug report
on this, and it was closed, so I assume the issue has been fixed (not
tested myself).
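
Going back to the file deletion point above, here is a rough sketch of
what index-based cleanup could look like. This is not how apt-proxy
works today; the function names are made up, and it assumes the cache
directory mirrors the archive layout and that the Packages files have
already been uncompressed. It only looks at the "Filename:" field of
Packages files, so Sources indexes (which use a different layout) are
not handled.

```shell
#!/bin/sh
# Sketch only: list cached .deb files that no Packages index mentions.
# Assumptions (hypothetical, not taken from apt-proxy): the cache directory
# mirrors the archive layout, and the index files given are uncompressed.

# Print every path named in a "Filename:" field of the given index files.
referenced_files() {
    awk '/^Filename: / { print $2 }' "$@" | LC_ALL=C sort -u
}

# stale_files CACHE_DIR PACKAGES_FILE...
# Print cached .deb paths (relative to CACHE_DIR) absent from all indexes.
stale_files() {
    cache_dir=$1
    shift
    referenced=$(mktemp)
    cached=$(mktemp)
    referenced_files "$@" > "$referenced"
    (cd "$cache_dir" && find . -type f -name '*.deb' \
        | sed 's|^\./||' | LC_ALL=C sort) > "$cached"
    # comm -13 keeps only the lines unique to the second file (the cache).
    LC_ALL=C comm -13 "$referenced" "$cached"
    rm -f "$referenced" "$cached"
}
```

Running something like "stale_files /path/to/cache /path/to/Packages"
only prints candidates. Deleting them is left as a separate step so the
list can be checked first, and so that a cache serving both stable and
unstable can pass the indexes of both.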
I have looked at (and flamed) apt-proxy in particular, but I suspect
at least some of the issues here might also be relevant to other
caching packages.
If you want a reliable caching service, I think some thought needs to
be put into the issues above. Some might be easy to fix, others might
be harder (e.g. keeping latency low enough that the client doesn't time
out, while still picking the best server to minimize download time).
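
As an illustration of the dead server problem described earlier, one
simple policy is to remember when a backend last failed and skip it for
a while, so only the first request pays for the failure. The sketch
below is an assumption about how that could be done, not apt-proxy
code: STATE_DIR, COOLDOWN and all function names are made up, and it
relies on "date +%s", which is widely available but not strictly POSIX.
It also treats every failure alike; a real proxy would want to tell
"file not found" apart from "server unreachable".

```shell
#!/bin/sh
# Sketch only: try backends in order, skipping any that failed recently.
# STATE_DIR, COOLDOWN and the function names are hypothetical.

STATE_DIR=${STATE_DIR:-/tmp/backend-state}
COOLDOWN=${COOLDOWN:-300}   # seconds to avoid a backend after a failure

# Map a backend URL to a marker file name that is safe to create.
marker_for() {
    printf '%s/%s' "$STATE_DIR" \
        "$(printf '%s' "$1" | tr -c 'A-Za-z0-9._-' '_')"
}

# Succeed if the backend failed less than COOLDOWN seconds ago.
recently_failed() {
    marker=$(marker_for "$1")
    [ -f "$marker" ] || return 1
    failed_at=$(cat "$marker")
    now=$(date +%s)
    [ $((now - failed_at)) -lt "$COOLDOWN" ]
}

# fetch_first FETCH_CMD PATH BACKEND...
# FETCH_CMD is called with one argument, the full URL, and must return
# non-zero on failure.
fetch_first() {
    fetch_cmd=$1
    path=$2
    shift 2
    mkdir -p "$STATE_DIR"
    for backend in "$@"; do
        if recently_failed "$backend"; then
            continue
        fi
        if "$fetch_cmd" "$backend/$path"; then
            rm -f "$(marker_for "$backend")"
            return 0
        fi
        # Record the failure time so later requests skip this backend.
        date +%s > "$(marker_for "$backend")"
    done
    return 1
}
```

The fetch command is passed in so the policy can be tried without a
network. In practice it could be a small function wrapping wget with
--tries=1 and a short --timeout, so a dead backend costs one timeout per
cooldown period rather than several per request.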
martin> If it does work, it's a performance hog. On my Opteron
martin> 3600+, my mouse starts to go jaggy when more than one
martin> machine accesses the cache at the same time.
Strange. I guess I have only used one machine at a time. I haven't
heard anyone complain of similar problems before though.
martin> I have always missed the ability to surf the local
martin> repository.
You mean via HTTP? This would be possible to add, I think. I guess it
hasn't been considered a priority.
--
Brian May <bam@debian.org>