Re: Guidance required for GSoC project PTS rewrite in Django

On Wed, Apr 24, 2013 at 10:28:07PM +0530, Pankaj Kumar Sharma wrote:
> In the present system the content is loaded explicitly via cron.  The
> confusion that bounds me is that what should be methodology that we should
> use in the upcoming Django project ? Should the data be loaded at the time
> when some one asks for that or it should be present in the databases ?

The short answer is: we'll have to experiment with that :)

Some of the information that the PTS exposes (e.g. that related to the
status of the archive) is "fairly static", meaning that it changes at
most 4 times a day. Other information is "very dynamic" (e.g. bug
information) and ideally should really be live, as it could be really
confusing for a user to see that, say, a package has 1 RC bug, click on
the bugs link, and discover that that's not true. It's not true
*anymore*, but the random user would have no way of understanding that
and would think it's a bug. This kind of inconsistency has been an
endless source of (bogus) bug reports throughout the life of the PTS.

A separate question is how to make all this efficient, in terms of
caching. Obviously, the current solution with static HTML pages is very
fast and is also easy to mirror in case of need.  A purely dynamic
solution would be at the opposite end of the spectrum in terms of
performance. We will probably need to stay somewhere in the middle, and
benchmark the scalability of the new solution (as mentioned in the
project description).

Ideally, we should cache heavily, either by using Django caching, or by
producing actual HTML pages via Django templates (as mentioned by Paul
in this thread). On top of that, we would add aggressive cache
invalidation mechanisms for live information, like bugs.  Alternatively,
we might want to cache only the information that is seldom updated and
be entirely dynamic on the live information.
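
To make the "cache heavily, invalidate on live changes" idea a bit more
concrete, here is a minimal sketch in plain Python. Everything in it is
hypothetical (the TTLCache class, its method names, the injectable
clock); in an actual Django deployment the same role would presumably be
played by Django's cache framework rather than a hand-rolled class.

```python
import time


class TTLCache:
    """Tiny in-memory cache with per-entry expiry and explicit invalidation.

    Hypothetical illustration only, not part of the PTS or of Django.
    """

    def __init__(self, clock=time.monotonic):
        self._clock = clock      # injectable so that tests can control time
        self._entries = {}       # key -> (expiry_timestamp, value)

    def get_or_compute(self, key, compute, ttl):
        """Return the cached value, recomputing it if missing or expired."""
        now = self._clock()
        entry = self._entries.get(key)
        if entry is not None and entry[0] > now:
            return entry[1]
        value = compute()
        self._entries[key] = (now + ttl, value)
        return value

    def invalidate(self, key):
        """Drop an entry right away, e.g. when a bug changes state."""
        self._entries.pop(key, None)
```

The idea would be to give "fairly static" data (archive status) a TTL of
a few hours, give live data (bugs) a very short TTL, and additionally
call invalidate() whenever we learn that the underlying data changed.
The actual TTL values are exactly the kind of thing we would need to
find out by benchmarking.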

Regarding where the data come from, my dream would be to develop a
Python abstraction layer over all the data that the PTS uses, and then
have various implementations ("backends") of it. One can for instance
access UDD directly, another can access a local cache updated by cron
(as in the current PTS deployment), another be entirely live, and yet
another use mixed solutions. That would allow us to experiment more
easily with the different solutions.
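
As a rough sketch of what such an abstraction layer could look like
(all class and method names below are made up for illustration, and the
two data items are just examples; nothing here is an existing PTS or UDD
API):

```python
from abc import ABC, abstractmethod


class PackageDataSource(ABC):
    """Hypothetical interface over the data the PTS needs."""

    @abstractmethod
    def archive_status(self, package):
        """Return the archive status of a source package."""

    @abstractmethod
    def rc_bug_count(self, package):
        """Return the number of release-critical bugs of a package."""


class SnapshotBackend(PackageDataSource):
    """Reads a local snapshot, e.g. one refreshed periodically by cron."""

    def __init__(self, snapshot):
        self._snapshot = snapshot    # dict: package -> dict of fields

    def archive_status(self, package):
        return self._snapshot[package]["archive_status"]

    def rc_bug_count(self, package):
        return self._snapshot[package]["rc_bugs"]


class LiveBackend(PackageDataSource):
    """Asks the authoritative source on every call."""

    def __init__(self, fetch):
        # fetch(package) -> dict; a placeholder for a real query
        self._fetch = fetch

    def archive_status(self, package):
        return self._fetch(package)["archive_status"]

    def rc_bug_count(self, package):
        return self._fetch(package)["rc_bugs"]


class MixedBackend(PackageDataSource):
    """Static data from one backend, live data from another."""

    def __init__(self, static_source, live_source):
        self._static = static_source
        self._live = live_source

    def archive_status(self, package):
        return self._static.archive_status(package)

    def rc_bug_count(self, package):
        return self._live.rc_bug_count(package)
```

The views would only ever talk to a PackageDataSource, so swapping one
backend for another (or benchmarking them against each other) would not
require touching the rest of the code.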

Hope this explains that we don't yet have written-in-stone answers to
your question, and that finding the right trade-offs, via experiments,
will be part of the actual project.

Stefano Zacchiroli  . . . . . . .  zack@upsilon.cc . . . . o . . . o . o
Maître de conférences . . . . . http://upsilon.cc/zack . . . o . . . o o
Former Debian Project Leader  . . @zack on identi.ca . . o o o . . . o .
« the first rule of tautology club is the first rule of tautology club »
