
Re: DDPO's future thoughts [Re: Developer.php performance (+patch)]



On Fri, Feb 20, 2004 at 11:02:55AM +0100, Igor Genibel wrote:
> * Jeroen van Wolffelaar <jeroen@wolffelaar.nl> [2004-02-18 19:30:05 +0100]:
> 
> [I was very busy the last weeks so, I apologize about my silence]

n/p.
 
> Hi Jeroen,
> 
> thanks a lot for this contribution. I cannot entirely integrate it in
> developer.php because php4-cgi is installed neither on klecker.d.o nor
> on master.d.o.
> So the convert-to-db probably needs to be rewritten in another language.

Or alternatively, one could request php4-cgi to be installed on
klecker... shouldn't be too much of a deal. It's a bit hackish, but
one could even have that code in developer.php itself and execute it
only when the .db is outdated. That means the first request after an
update takes 1 sec longer than usual, which is not really a big deal
(performance is always at least as good as it is currently :-), since
currently bugs.txt is read anyway).
Heck, one could even have a special mode in developer.php, called via
wget from the bugs update.
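
To illustrate the "rebuild only when the .db is outdated" idea, here is
a minimal sketch. It uses Python with sqlite3 purely as a stand-in for
whatever .db format is available (the real code would be php inside
developer.php, or a perl script). The "package, then the rest of the
line" layout assumed for bugs.txt and all function names are made up
for the example.

```python
import os
import sqlite3


def is_outdated(source_path, db_path):
    """Return True if the db is missing or older than the text source."""
    if not os.path.exists(db_path):
        return True
    return os.path.getmtime(db_path) < os.path.getmtime(source_path)


def rebuild_db(source_path, db_path):
    """Parse the text file once and store it as a keyed table."""
    tmp_path = db_path + ".tmp"
    if os.path.exists(tmp_path):
        os.remove(tmp_path)
    conn = sqlite3.connect(tmp_path)
    conn.execute("CREATE TABLE bugs (package TEXT PRIMARY KEY, summary TEXT)")
    with open(source_path, encoding="utf-8") as src:
        for line in src:
            # ASSUMPTION: first word is the package, the rest is its bug summary.
            package, _, summary = line.strip().partition(" ")
            if package:
                conn.execute(
                    "INSERT OR REPLACE INTO bugs VALUES (?, ?)",
                    (package, summary),
                )
    conn.commit()
    conn.close()
    # Swap the finished file into place so readers never see a half-built db.
    os.replace(tmp_path, db_path)


def lookup(source_path, db_path, package):
    """Return the summary for one package, rebuilding the db first if stale."""
    if is_outdated(source_path, db_path):
        rebuild_db(source_path, db_path)
    conn = sqlite3.connect(db_path)
    row = conn.execute(
        "SELECT summary FROM bugs WHERE package = ?", (package,)
    ).fetchone()
    conn.close()
    return row[0] if row else None
```

Only the first request after an update pays for the parse; every later
request is a single keyed lookup.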

Otoh, it's very easy code, so it _also_ can be rewritten indeed (perl
seems to be the best choice imho).
 
> Moreover, developer.php (and all its backend) really needs to be
> rewritten because it is really ugly and its performance is really poor.
> That's why I started some weeks ago on a complete rewrite that
> provides static html files (based on xml transformation) in order to
> increase the performance.

I'm from the "don't touch what's not broken" school. I do agree the code
is ugly, but performance can easily be improved by having a sane
data structure, i.e. something other than the textfile-parsing gibberish
there is now.

With only the bugs.txt -> .db improvement, developer.php is already very
usable, performance wise.

The ddpo.py code I didn't dare to touch, as I'm no python hacker. I
personally believe I'm better at designing code (higher level),
setting directions etc. than at writing it (though I usually write it
myself too, but not always).

In any case, I think statically generating .html pages is not the way to
go. With dynamically, but efficiently, generated pages, one is very
flexible and can have any selection of packages without real performance
impact (a bit of html-generating php code with some db lookups is quite
fast), and without the need for a lengthy 'generate all pages one
_might_ be requesting' process, which is already done too often imho,
while it isn't needed.

In particular, updates can all be done independently, with separate .db
files as the source of information. An added bonus is that you get page
design and data retrieval separated for free, so data can be reused
anywhere by anyone. I'm especially thinking about making the PTS info
and the developer.php info cooperate in data retrieval, rather than
both doing it their own way.

Of course, the current 'extract' needs to be better designed. Since I
now have a copy, I could write a proposal for a better data structure.

> I think it's time for this piece of code to be available on alioth
> because I want more people to work on it (redesign, recode, ...)
> 
> So Jeroen, I would be glad to see you, if you are interested, involved
> in the project, and you will see that all the ddpo code is entirely
> available in the qa cvs tree.

Cool :), ok.
 
> For the moment, I will continue to maintain it as it is in this cvs tree
> and try to improve its performance, ... and start the project on alioth
> in order to provide a much better tool than it is now.

Wouldn't it be better to work in qa's cvs tree? Reuse the ddpo dirs etc.,
ditch ddpo.py when it's unneeded, and simply rewrite parts of
developer.php whenever there is a better interface for retrieving data?

I don't think the current design of a 'process to generate data', with a
dynamically generated page on top of that, is broken; it's only the
implementation that can use improvement. It's always better to redo only
the things that are broken. Starting from scratch would be a waste of
time imho, since it currently does work (though not very fast, and
especially not easily extensible).

My proposal (think first (1&2), act later (3&4)):
1) document what info is going into extract, and where that's all coming
from
2) think of a good way of storing that data, which can be retrieved
efficiently
3) split extract in multiple scripts (I prefer no python personally)
that retrieve those data, and put them in an efficient form
4) modify developer.php to use the better data accessing, and change the
logic a bit, so that other selection criteria for packages can be
implemented.
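
As a rough illustration of where steps 3 and 4 could end up: one small
keyed store per data source, each updated independently, and a page
layer that selects packages by an arbitrary criterion. This is only a
sketch under assumptions. Python and sqlite3 are stand-ins for the
actual language and .db format, and the store names ("maintainer",
"bugs") and helper functions are hypothetical.

```python
import sqlite3


def open_store(path=":memory:"):
    """Open (or create) one per-source store keyed by package name."""
    conn = sqlite3.connect(path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS kv (package TEXT PRIMARY KEY, value TEXT)"
    )
    return conn


def put(store, package, value):
    """Called by the per-source update script; touches only its own store."""
    store.execute("INSERT OR REPLACE INTO kv VALUES (?, ?)", (package, value))
    store.commit()


def get(store, package):
    """Keyed lookup used by the page layer; None if the source has no entry."""
    row = store.execute(
        "SELECT value FROM kv WHERE package = ?", (package,)
    ).fetchone()
    return row[0] if row else None


def select_packages(stores, criterion):
    """Return sorted package names whose combined record satisfies criterion."""
    names = set()
    for store in stores.values():
        names.update(r[0] for r in store.execute("SELECT package FROM kv"))
    result = []
    for name in sorted(names):
        # One dict per package: {source name: value or None}.
        record = {source: get(store, name) for source, store in stores.items()}
        if criterion(record):
            result.append(name)
    return result
```

For example, `select_packages(stores, lambda r: r["maintainer"] ==
"alice@example.org")` lists one maintainer's packages, and any other
selection is just a different criterion over the same stores.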

IMHO, this can be done without much work, as it should be.

--Jeroen

-- 
Jeroen van Wolffelaar
Jeroen@wolffelaar.nl (also for Jabber & MSN; ICQ: 33944357)
http://Jeroen.A-Eskwadraat.nl
