Re: Tasks pages (close to) fixed; Bibref does not seem to be updated automatically



On Sun, Feb 19, 2012 at 03:13:19PM +0100, Andreas Tille wrote:
> 
> I somehow assumed that if I edit a debian/upstream file and commit it to
> our Vcs, then after some delay (say 1 day) this change would be reflected
> in Umegaya, and (in the worst case one day later) the UDD bibref gatherer
> would fetch the changed status.

Hi Andreas,

let's imagine that every source package in Debian has a debian/upstream file.
Refreshing the information daily would then mean more than 18,000 requests to
Alioth every day.

On my computer, it takes 2-3 seconds to query the Subversion repository on
Alioth, and 1-2 seconds for Git.

$ time svn cat svn://svn.debian.org/svn/debian-med/trunk/packages/primer3/trunk/debian/upstream > /dev/null 

real	0m2.727s
user	0m0.068s
sys	0m0.020s

$ time curl -L -s $(umegaya-guess-url git://git.debian.org/git/debian-med/emboss.git | cut -f2) > /dev/null 

real	0m1.218s
user	0m0.012s
sys	0m0.012s

At that pace, it would take hours to check every package daily, and I worry
about the load on Alioth.
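
As a back-of-the-envelope figure, assuming the ~2 second Subversion timing
above and purely sequential requests:

$ echo "$((18000 * 2)) seconds, about $((18000 * 2 / 3600)) hours"
36000 seconds, about 10 hours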

This is why I designed a push model.  After updating debian/upstream for the
package 'foo', visit http://upstream-metadata.debian.net/foo/YAML-URL, and
Umegaya will refresh its information.  (This will work after I transfer the
service to debian-med.debian.net; I really hope to do it this evening).
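
In other words, a single command is enough to trigger the refresh (here for
the hypothetical package 'foo'):

$ curl -s http://upstream-metadata.debian.net/foo/YAML-URL > /dev/null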

Nevertheless, as long as only Debian Med is using Umegaya, we can force a
daily refresh of the information.  A better way would be to have Subversion
and Git commit hooks that do the job.  I will work on this after the transfer
of upstream-metadata.d.n.
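
For Git, the hook could look roughly like the sketch below.  It is only an
illustration: deriving the package name from the repository directory and
ignoring corner cases like newly pushed branches are simplifying assumptions.

#!/bin/sh
# Sketch of a post-receive hook: ping Umegaya when debian/upstream changed.
package=$(basename "$PWD" .git)  # assumes the repository is named after the package
while read oldrev newrev refname; do
    if git diff --name-only "$oldrev" "$newrev" | grep -qx 'debian/upstream'; then
        curl -s "http://upstream-metadata.debian.net/$package/YAML-URL" > /dev/null
    fi
done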


> So how exactly will a package be registered in the Umegaya database?

Currently one needs to log in on upstream-metadata.d.n and run umegaya-adm
--register.  Alternatively, a cron job can use a script similar to the one you
posted, monitor new additions, and run umegaya-adm --register.  Later, I would
like to make this possible over the network; that is what I meant by "HTTP
interface"; I should have written "URL API".  I want the CGI script to be able
to receive new URLs to track.  To prevent script kiddies from tricking the
system into making us upload illegal stuff into the UDD, it would for instance
decline to track any URL that is not in the "debian.org" domain.  Another
alternative is to let Umegaya search for unknown packages on svn.debian.org
and git.debian.org.
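
The whitelist check itself would be simple; something like this sketch, where
the variable names and error handling are illustrative, not actual Umegaya
code:

# Sketch: refuse any URL whose host is not under debian.org.
host=$(printf '%s\n' "$url" | sed -E 's|^[a-z+]*://([^/:]+).*|\1|')
case "$host" in
    debian.org|*.debian.org) ;;                 # accepted, register the URL
    *) echo "Refusing to track $url"; exit 1 ;;
esac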


> (BTW, I keep on cut-n-pasting even the short name - could we call the
> database the same as the file and name it upstream database? ;-))

Isn't "upstream database" too generic?  But within the scope of this
thread it is not a problem.


> I did not dive into PET, but as far as I know it is more what I consider
> an automatic update driven by the data inside the VCS, and I wonder
> whether we should not rather somehow tweak the debian/upstream files into
> the PET mechanism.  Did you consider this?

The PET could also be a good starting point for monitoring the VCS and pinging
Umegaya.


> When thinking twice about it:  What is the sense of having this Berkeley
> DB at all if we have UDD?  Why not import the content of the upstream
> files straight into UDD?  For me this somehow looks like a detour, but as
> I said I might be a bit narrow-minded about the usage on the tasks pages.

If I understand the UDD correctly, it is updated by reloading whole tables,
and Umegaya is the producer of one such table.  There could be other ways to
do it, but since I am aiming at a system that can cope with tens of thousands
of packages, I think that rules out alternatives such as checking out all the
Alioth repositories every day.
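
Just to illustrate the reload model (the dump command and the exact table
layout below are placeholders, not real Umegaya or UDD interfaces):

# Sketch of a whole-table reload: produce one flat dump, replace the UDD copy.
umegaya-dump --format=tsv > bibref.tsv          # hypothetical dump command
psql udd -c 'TRUNCATE bibref'                   # hypothetical table name
psql udd -c "\copy bibref FROM 'bibref.tsv'"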


I am sorry that I kept http://upstream-metadata.debian.net in a miserable
state this year.  I have done a lot of groundwork this week-end, and the
transfer to debian-med.debian.net, hopefully today, will be a fresh restart.

Cheers,

-- 
Charles

