
Faster importing of packages (ddtp.debian.org)


I'm slowly making progress on ddtp.debian.org. After being told that
the old import process would not be acceptable on the new setup
(because of the error logs it produces), I've been testing a new script
I wrote. It is faster, though it still takes about 5.5 hours to do
what the old script did in 10, and it imports 15 architectures instead
of just 11. But it also does less, especially with respect to

That is still a long time, so I have some ideas for improving it:

1. Not every architecture every day. How often does a package exist on
only one architecture? Does it matter if some architectures are only
imported once a week?

The Packages files are all different sizes, so there must be a
difference, but is it significant from the point of view of
2. Only import changes. Each day, Packages.diff files are produced
containing just the changes from the previous day. In theory, you
could use these to import only the packages that have changed. The
problem is that I can't find much information about how this actually
works. It looks like an ed-style diff, but I'm not sure.

A side effect would be that the "description was in distribution X"
timestamps would no longer be entirely accurate. This would need to be
handled some other way.

What I'm really looking for is a script that takes the diff files
between yesterday's and today's Packages files and lists just the
packages that have changed. Does such a script exist?
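Assuming the Packages.diff patches really are ed scripts of the kind
`diff --ed` produces (commands such as `5a`, `3,7c` and `12d`, inserted
text ended by a line containing only `.`, and commands listed from the
bottom of the file upward so earlier line numbers stay valid), a rough
sketch of such a script could apply one patch to yesterday's Packages
file and then compare stanzas by package name. All of this is an
assumption: the function names are made up, only the a/c/d commands are
handled, and each file is assumed to hold one stanza per package name.

```python
import re
import sys
from typing import Dict, List, Set, Tuple

# Assumed command syntax, as emitted by `diff --ed`: "5a", "3,7c", "12d".
ED_COMMAND = re.compile(r"^(\d+)(?:,(\d+))?([acd])$")


def apply_ed_diff(old: List[str], script: List[str]) -> List[str]:
    """Apply a `diff --ed` style script (a/c/d commands only) to old lines."""
    lines = list(old)
    i = 0
    while i < len(script):
        m = ED_COMMAND.match(script[i])
        if not m:
            raise ValueError(f"unsupported ed command: {script[i]!r}")
        start = int(m.group(1))
        end = int(m.group(2) or start)
        cmd = m.group(3)
        i += 1
        new_text: List[str] = []
        if cmd in ("a", "c"):
            # Inserted text runs until a line containing a single ".".
            while script[i] != ".":
                new_text.append(script[i])
                i += 1
            i += 1  # skip the terminating "."
        if cmd == "a":
            lines[start:start] = new_text    # append after line `start`
        else:
            lines[start - 1:end] = new_text  # "c" replaces, "d" deletes
    return lines


def stanzas_by_name(lines: List[str]) -> Dict[str, str]:
    """Split Packages lines into stanzas keyed by their Package: field."""
    result: Dict[str, str] = {}
    current: List[str] = []
    for line in lines + [""]:  # sentinel blank line flushes the last stanza
        if line.strip():
            current.append(line)
            continue
        if current:
            name = next((l.split(":", 1)[1].strip() for l in current
                         if l.startswith("Package:")), None)
            if name is not None:
                result[name] = "\n".join(current)
            current = []
    return result


def changed_packages(old: List[str],
                     new: List[str]) -> Tuple[Set[str], Set[str], Set[str]]:
    """Return (added, removed, modified) package names between two files."""
    before, after = stanzas_by_name(old), stanzas_by_name(new)
    added = set(after.keys() - before.keys())
    removed = set(before.keys() - after.keys())
    modified = {n for n in before.keys() & after.keys() if before[n] != after[n]}
    return added, removed, modified


if __name__ == "__main__" and len(sys.argv) == 3:
    # Hypothetical usage: python pkgdiff.py Packages.yesterday patch.ed
    with open(sys.argv[1], encoding="utf-8") as fh:
        old = fh.read().splitlines()
    with open(sys.argv[2], encoding="utf-8") as fh:
        script = fh.read().splitlines()
    added, removed, modified = changed_packages(old, apply_ed_diff(old, script))
    for label, names in (("added", added), ("removed", removed),
                         ("modified", modified)):
        for name in sorted(names):
            print(label, name)
```

Both files would need to be decompressed first. Comparing whole stanzas
rather than mapping the diff's line ranges back to packages keeps the
logic simple, at the cost of parsing both files in full; a patch that
touches the `.` line edge case `diff --ed` handles specially would need
extra support.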

3. Non-free is not imported. I seem to remember there being a reason
for this, but it isn't clearly documented anywhere. Does anyone
remember why? I imagine it is because people don't want to translate
non-free software in general, but that raises the question about
Any ideas?
Martijn van Oosterhout <kleptog@gmail.com> http://svana.org/kleptog/
