
FTPTeam: Programming fun task available



Heyho,

as we have many tasks on our plate but not nearly enough time or people
to do them all, let me try something and ask you, the project members,
to help out. No, I don't want money, though I wouldn't say no to it, of
course. Better yet, I have work to give away... And this task has the
nice property of not requiring any special privileges to be carried out,
so whoever out there wants to help, no matter if DD/DM/interested user,
as long as you have time and know Python well, speak up. :)


One of the tasks I outlined in my meeting minutes after the ftp-master
meeting at [1] is number 13, "dinstall replacement". Even though it is
at place 13, it is actually important to get done soon, and I know I
myself won't get to it this month, so here goes. Anyone out there with
enough time and Python knowledge willing to help? Read on...

[1] http://lists.debian.org/debian-project/2010/09/msg00139.html

(I'm not entirely set on the exact way this works; I am describing my
thoughts on it. I'm happy to hear constructive ways of making it better.)

(Background: dinstall is basically a set of jobs run in a defined order,
sometimes in parallel. What the jobs do is unimportant for the case
here (do everything necessary to update ftp.debian.org and all its
mirrors with the new uploads), but having them done in the right order
at the right time etc. is pretty important.)


What we want is something that can do just about everything and also
figure it out on its own. :)

I imagine a Python script that in itself is small, with just the basic
logic to pull it all together. It should read in a set of values from
our database, like archive name, various directory settings, the basic
information it needs.
Additionally there is a directory with "code dumps", basically a set of
Python files, each having a defined structure. The script would read
in all of them and figure out what they do / when they expect to run /
what they provide. That is, there would be scripts with the following
attributes (only some are shown; we currently have around 60 different
functions called in a run):

(this table should look ok in a monospace font. At least it does here :) )
+--------+-----------------+--------------+--------+---------+
|script  |provides         |depends       |priority|archive  |
+--------+-----------------+--------------+--------+---------+
|override|overrides        |              |10      |         |
+--------+-----------------+--------------+--------+---------+
|filelist|filelist         |              |11      |         |
+--------+-----------------+--------------+--------+---------+
|packages|packagesfiles    |overrides,    |10      |         |
|        |                 |filelist      |        |         |
+--------+-----------------+--------------+--------+---------+
|pdiff   |pdiff            |packages      |15      |ftpmaster|
+--------+-----------------+--------------+--------+---------+
|mirror  |mirror           |pdiff|packages|20      |         |
+--------+-----------------+--------------+--------+---------+
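
To give a rough idea of the structure I have in mind (all the names here
are made up for illustration, nothing about them is decided), one such
"code dump" could be a plain Python module like:

  # pdiff.py -- one "code dump" living in the tasks directory
  PROVIDES = "pdiff"         # what this task makes available to others
  DEPENDS = ["packages"]     # has to be provided before this may run
  PRIORITY = 15              # tie-breaker between tasks runnable at once
  ARCHIVES = ["ftpmaster"]   # empty/missing means: run on every archive

  def run(config):
      """Do the actual work; config carries archive name, directories, ..."""

and the main script reading them in could be as simple as:

  import importlib.util
  import pathlib

  def load_tasks(directory):
      """Load every *.py file from the tasks directory as a module."""
      tasks = []
      for path in sorted(pathlib.Path(directory).glob("*.py")):
          spec = importlib.util.spec_from_file_location(path.stem, path)
          module = importlib.util.module_from_spec(spec)
          spec.loader.exec_module(module)
          tasks.append(module)
      return tasks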


The new script would figure out that it has to run overrides first, then
filelist followed by packages. Then, if the archive is named ftpmaster,
it would run pdiff followed by mirror. (In this definition an empty
archive entry means all archives; anything set is a list of archives to
run on.) And so if the archive is not ftpmaster (say backports or
security) it would skip pdiff and go to mirror directly.
All scripts need to be run unless they are not relevant for the current
archive.

Priorities can be used to select which task to run first when executing
them in parallel and no dependency imposes any order. Same priority
-> random, or alphabetic, or whatever order of execution.

Tasks that do not depend on each other should run in parallel, up to a
configurable limit of processes. There should be a way to have "sync
points" in this process, i.e. at such a point all tasks defined prior to
the "sync point", however many run in parallel, need to be finished
before it goes on to the next waiting task. (Yeah, much like an init
system.)

An easy first step can be a tool that:
 1. reads in the scripts,
 2. computes the optimal scheduling,
 3. outputs a list of processing steps, each step containing a list of
 tasks that can be run in parallel (a rough sketch follows below).
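
A rough sketch of what I mean with steps 2 and 3, using the made-up
attribute names from above (I read the pdiff|packages entry in the table
simply as "both of them" here; together with "skipped counts as
satisfied" that comes out the same for non-ftpmaster archives):

  def schedule(tasks, archive):
      """Group tasks into processing steps: everything inside one step may
      run in parallel, and each step is a sync point for the next one."""
      relevant = [t for t in tasks
                  if not getattr(t, "ARCHIVES", None)
                  or archive in t.ARCHIVES]
      # whatever is skipped for this archive counts as satisfied, so e.g.
      # mirror still runs on an archive where pdiff is skipped
      done = {t.PROVIDES for t in tasks if t not in relevant}
      steps = []
      remaining = list(relevant)
      while remaining:
          runnable = [t for t in remaining
                      if all(d in done for d in getattr(t, "DEPENDS", []))]
          if not runnable:
              raise RuntimeError("circular or unsatisfiable dependencies")
          # lower priority value goes first, name as a stable tie-breaker
          runnable.sort(
              key=lambda t: (getattr(t, "PRIORITY", 99), t.__name__))
          steps.append(runnable)
          done.update(t.PROVIDES for t in runnable)
          remaining = [t for t in remaining if t not in runnable]
      return steps

With the table above and the archive named ftpmaster that would give
[overrides, filelist], then [packages], [pdiff], [mirror]; for backports
or security the pdiff step simply disappears.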

Of course the system needs to keep the existing features of dinstall,
that is, a state machine that keeps track of the advancement of the
process. We need that, because we need to be able to break at any point
and cleanly restart there. (Think of a sudden reboot).
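
For the state keeping, something really simple would already do, say a
small file (or a database table, whatever fits) that records which tasks
of the current run are finished, roughly:

  import json
  import os

  STATEFILE = "dinstall.state"   # made-up name, could also be a DB table

  def load_done():
      """Return the set of tasks already finished in an interrupted run."""
      if os.path.exists(STATEFILE):
          with open(STATEFILE) as f:
              return set(json.load(f))
      return set()

  def mark_done(name, done):
      """Record one more finished task, so a restart can skip it."""
      done.add(name)
      with open(STATEFILE, "w") as f:
          json.dump(sorted(done), f)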

A second step can be the extension of the script to take this task list
and run the tasks, keeping track of progress and handling restarts in
case the process was interrupted.
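
Putting the pieces together, that runner part could look something like
this (threads are only used here to keep the sketch short; forked
processes with a configurable limit are what we actually want, and
done/mark_done are the state keeping bits sketched above):

  from concurrent.futures import ThreadPoolExecutor

  def run_all(steps, config, done, mark_done, max_workers=4):
      """Run every step in order; tasks inside a step run in parallel and
      tasks already recorded as done (interrupted run) are skipped."""
      with ThreadPoolExecutor(max_workers=max_workers) as pool:
          for step in steps:
              todo = [t for t in step if t.PROVIDES not in done]
              futures = {pool.submit(t.run, config): t for t in todo}
              for future, task in futures.items():
                  future.result()          # re-raises if the task failed
                  mark_done(task.PROVIDES, done)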

If it can then also store the result of step 1, so it can be reused in
later runs provided the input values didn't change, that sounds perfect.
(Yes, the scheduler CAN be pretty costly, that's fine.)
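
Storing the result could be as cheap as keying the computed step list on
a hash of the task scripts and the config, roughly (again just a sketch;
tasks_by_name would be a dict from module name to loaded module):

  import hashlib
  import json

  def plan_key(task_files, config):
      """A key that changes whenever task scripts or the config change."""
      h = hashlib.sha256()
      for path in sorted(task_files):
          with open(path, "rb") as f:
              h.update(f.read())
      h.update(json.dumps(config, sort_keys=True).encode())
      return h.hexdigest()

  def save_plan(key, steps, cachefile="dinstall.plan"):
      """Store the step list as plain task names, keyed on the hash."""
      names = [[t.__name__ for t in step] for step in steps]
      with open(cachefile, "w") as f:
          json.dump({"key": key, "steps": names}, f)

  def load_plan(key, tasks_by_name, cachefile="dinstall.plan"):
      """Reuse a stored plan if the inputs did not change, else None."""
      try:
          with open(cachefile) as f:
              data = json.load(f)
      except (OSError, ValueError):
          return None
      if data["key"] != key:
          return None
      return [[tasks_by_name[name] for name in step]
              for step in data["steps"]]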


It's not all too hard to do, but it needs time. Time is scarce, so is
there anyone out there who has enough time to spare for this task? :)
(You don't actually need to write all the modules, the code for them
 exists already; the main script is what counts.)

We do have a test system, though that one is limited to DD/DM
access. Setting up your own dak instance would be possible, but dak is
very ungrateful if it doesn't know you, so I think we will find another
way there. The majority of this luckily doesn't need one (which is why I
ask the whole world for help). :)  Quite a bit of the software is not
THAT ftp-master specific, so a lot of its development can be supported
by a test suite that runs anywhere.

If you are up to it, I am reachable on irc.debian.org as usual (try
#debian-dak), or you can show up at debian-dak@lists.debian.org; both
are fine.

-- 
bye, Joerg
Lisa, you’re a Buddhist, so you believe in reincarnation. Eventually,
Snowball will be reborn as a higher life form… like a snowman.
