
Re: Bootstrapping: list of 81 self-cycles in Debian Sid

On 13142 March 1977, Johannes Schauer wrote:

>> That obviously depends a bit on what is actually needed to run (and then on
>> talking to DSA, but they don't bite so much :) ).
> If it is found useful, then I have to figure out who to contact about
> this.

For it to live on debian.org separately, that's DSA,
debian-admin@lists.debian.org, until it goes under the umbrella of an
existing team; then that team will do that for you.

>> I guess you need a mirror (or at least packages/sources files) as input,
>> though you might want to check if you can use an existing database within
>> Debian to just use the already existing data.
> Yes, the input to the code is just a pair of Packages.bz2 and Sources.bz2
> files.

And as they are generated completely out of our archive's postgres
database, that one could be used too; probably not hard to change. I
wonder if one could "offload" a bit of the work to SQL too, to help.
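
For reference, the Packages/Sources input is just a sequence of
blank-line-separated "Field: value" stanzas, so reading it is cheap
whichever source ends up being used. A minimal sketch, not the code
discussed in this thread: parse_stanzas and read_index are hypothetical
names, and the path passed to read_index is assumed to be a local
Packages.bz2 or Sources.bz2.

```python
import bz2


def parse_stanzas(text):
    """Yield one dict per blank-line-separated control stanza."""
    stanza, key = {}, None
    for line in text.splitlines():
        if not line.strip():
            # A blank line ends the current stanza.
            if stanza:
                yield stanza
            stanza, key = {}, None
        elif line[0] in " \t" and key is not None:
            # Continuation line: append to the previous field.
            stanza[key] += "\n" + line.strip()
        else:
            key, _, value = line.partition(":")
            stanza[key] = value.strip()
    if stanza:
        yield stanza


def read_index(path):
    """Read a bz2-compressed index file (hypothetical helper)."""
    with bz2.open(path, "rt", encoding="utf-8") as fh:
        return list(parse_stanzas(fh.read()))


SAMPLE = (
    "Package: foo\n"
    "Version: 1.0\n"
    "Depends: libc6 (>= 2.13), bar\n"
    "\n"
    "Package: bar\n"
    "Version: 2.1\n"
)
packages = list(parse_stanzas(SAMPLE))
```

The same dicts could presumably be filled from database rows instead,
which is why switching the input should not be hard.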

> For the output you see in the link above, they total to a size of 500MB. If
> they can be retrieved directly from somewhere on the same machine, then it would
> naturally save lots of space.

500MB isn't really much space. And as they are mostly for the
Packages/Sources, it's much less for the output you generate...  That is,
ideally this generates just "index" files, which are then consumed by
something like the PTS.

> RAM:
> The highest amount I observed was 260 MB of used resident memory. This value is
> that high because the build dependency graph and the strong dependency graph of
> the whole distribution has to be kept in memory at the same time.

Not much.

> CPU:
> The whole script producing the output above took 7 hours to run on a 2.5GHz
> Core i5 for all suites and all architectures (38 combinations). This is because
> generating strong dependencies for all packages in the archive takes
> 8-10 minutes with current archive sizes.  I don't think this value can
> be lowered considerably. On the other hand, the cron job doesn't have to be run
> every day but maybe once a week or once a month?

That's the only interesting part, but if one does it in the background, once
a week, properly in parallel, it shouldn't be too bad.
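
As a rough check: 38 combinations at 8-10 minutes each is about five to
six and a half hours when run one after another, which accounts for most
of the 7 hours. If the combinations are independent (an assumption),
they can be spread over several workers. A minimal sketch: run_all and
the worker callable are hypothetical, the worker stands in for whatever
launches the real analysis for one suite/architecture pair, and threads
only help here if that worker spends its time in an external process.

```python
from concurrent.futures import ThreadPoolExecutor
from itertools import product


def run_all(suites, architectures, worker, max_workers=4):
    """Run worker(suite, arch) for every combination, a few at a time."""
    combos = list(product(suites, architectures))
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # pool.map keeps results in the same order as combos.
        results = pool.map(lambda combo: worker(*combo), combos)
        return dict(zip(combos, results))


# Dummy worker for illustration; a real one would start the analysis.
demo = run_all(
    ["sid", "wheezy"],
    ["amd64", "armel"],
    lambda suite, arch: f"{suite}/{arch}",
    max_workers=2,
)
```

With the 260 MB resident size quoted above, a handful of workers should
fit in memory on most machines.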

Also, would "incremental" runs work? Say, the database tells you which
packages changed recently due to uploads. Only recheck the parts
affected by it.
Yes, requires state storage.
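
One way to picture "the parts affected": take the changed packages and
follow the dependency edges backwards. A minimal sketch under stated
assumptions: affected_packages is a hypothetical name, depends is a
plain mapping from package to its direct dependencies, and whether the
reverse closure is really enough for the strong dependency computation
is not something this sketch settles.

```python
from collections import defaultdict, deque


def affected_packages(depends, changed):
    """Return changed packages plus everything that transitively depends on them."""
    # Invert the graph: dependency -> packages that depend on it.
    reverse = defaultdict(set)
    for pkg, deps in depends.items():
        for dep in deps:
            reverse[dep].add(pkg)

    seen = set(changed)
    queue = deque(changed)
    while queue:
        current = queue.popleft()
        for dependant in reverse[current]:
            if dependant not in seen:
                seen.add(dependant)
                queue.append(dependant)
    return seen


depends = {
    "a": {"b"},
    "b": {"c"},
    "c": set(),
    "d": {"e"},
    "e": set(),
}
recheck = affected_packages(depends, {"c"})
```

The stored state would then be the previous results for everything
outside that set.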

And it sounds like something that could be done using the archive's tools
/ integrated into them.
If you are interested in integrating it there properly, we are in
#debian-ftp on irc.debian.org and also at debian-dak@lists.debian.org

bye, Joerg
Dad, you've done a lot of great things, but you're a very old man, and
old people are useless.
