
Re: Tasks pages (close to) fixed; Bibref does not seem to be updated automatically



On Sun, Feb 19, 2012 at 06:52:16PM +0900, Charles Plessy wrote:
> 
> The Umegaya gatherer is not actively monitoring our repositories.  I do not
> know how much it would load the Alioth machines.  The way Umegaya works is that
> when it is queried, it tries to refresh its information if it is older than an
> arbitrary age (currently 60 s).

I'm afraid I do not fully understand this.  I somehow assumed that if
I edit a debian/upstream file and commit it to our Vcs, then after some
delay (say one day) this change would be reflected in Umegaya, and (in
the worst case one day later) the UDD bibref gatherer would fetch the
changed status.  So, assuming I added some bibliographic information
on Friday, it would be reflected on our tasks pages by Monday.  Is this
something I can expect with the current setup, or is there some manual
intervention needed?

Regarding the workload needed on Alioth to check all upstream files
for changes: it is probably pretty low.  I tried:

$ time ssh wagner ./my-test > upstream_sha1sum
real    0m13.043s
user    0m0.008s
sys     0m0.008s

while my-test looks like this:

#!/bin/sh
mkdir -p test
cd test

# fetch SVN and check for upstream files
svn co svn://svn.debian.org/debian-med/trunk/packages >/dev/null

# check out those Git repositories that contain upstream files
# next lines stolen from Charles (reformatted for readability)
GIT_REPO_WITH_UPSTREAM=$(
    for repo in /git/debian-med/*.git ; do
        ( cd "$repo" && git ls-tree master debian/ 2>/dev/null \
            | grep -q 'debian/upstream$' && echo "$repo" )
    done | sed -e 's|/git/debian-med/||' -e 's|\.git$||'
)
for repo in $GIT_REPO_WITH_UPSTREAM ; do
    git clone git://git.debian.org/git/debian-med/${repo}.git 2>/dev/null > /dev/null
done

# Print sha1sum, size and path of all upstream files
for upstream in $(find . -type f -name upstream | grep -v tags) ; do
    echo "$(sha1sum "$upstream" | cut -d' ' -f1) $(stat -c %s "$upstream") $upstream"
done


In other words: it seems pretty cheap to get the sha1sums of all
debian/upstream files owned by the Debian Med team, and thus to get a
signal about which data needs to be updated, provided you keep a record
of those sha1sums somewhere.  I admit this simple check currently only
works for the Debian Med Vcs, but for an estimation of the effort
needed on Alioth this should be sufficient.
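
The detection itself could then be as simple as diffing two checksum
snapshots between runs.  A minimal sketch (the snapshot file names are
my own invention, not existing tooling):

```shell
#!/bin/sh
# Compare the current sha1sums of all upstream files against the
# snapshot kept from the previous run; print new or changed files.
OLD=upstream_sha1sum
NEW=upstream_sha1sum.new

find . -type f -name upstream | grep -v tags | xargs sha1sum | sort > "$NEW"

if [ -f "$OLD" ] ; then
    # lines present only in the new snapshot = new or changed files
    comm -13 "$OLD" "$NEW" | awk '{print $2}'
fi
mv "$NEW" "$OLD"
```

The printed paths would be exactly the set of files Umegaya (or UDD)
needs to refresh, so only actual changes cause any further work.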
 
> If the package is not yet in the Umegaya database, it attempts to discover its
> VCS URL using debcheckout.

So how exactly will a package be registered in the Umegaya database?
(BTW, I keep cut-and-pasting even the short name - could we call the
database the same as the file and name it upstream database? ;-))

> If this fails, no information is loaded.  But of
> course there is a plan B.
> 
> Umegaya has a command-line interface, through which it is possible to indicate
> a VCS URL for a package with the --register option.

I wonder how this --register option is triggered - I guess a random
Debian Med team member cannot do this.  IMHO it is important to
have a trigger which is based on the status of the data in the
upstream files rather than on manual intervention.

> And this one will be
> remembered when refreshing.  This is enough for the tasks pages:
> 
>  - If the package is in our archives, we know its URL through debcheckout.
>  - If it is not, the task file provides a Vcs field.

ACK.

> Here is how to register all Svn-managed packages in the 'bio' task file, for
> instance.
> 
>   for url in $(grep 'Vcs-[SG]' bio | cut -f2 -d' ') ; do  umegaya-adm --register $url ; done
> 
> Unfortunately, I do not have time to finish today, but I hope I am convincing
> you that I am almost done.

Well, it is not about convincing me of something, but rather about what
is needed to implement some kind of unattended setup.  Even the step
above needs to be automated somehow, because it needs to detect changes
inside the tasks files.  And even this depends on manual editing - not
everybody who enters a new package adds it to the tasks files (rather
the contrary - I need to watch such new additions very carefully to
move the information into the tasks files :-().

But the lack of an entry in the tasks files will not do much harm in
this respect, because even for a package that is missing from the tasks
files we need to know its references anyway.

I have not dived into PET, but as far as I know it is more what I
consider an automatic update driven by the data inside the Vcs, and I
wonder whether we should not rather tweak the debian/upstream files
into the PET mechanism somehow.  Did you consider this?
 
> The next steps:
> 
>  - Install the umegaya debian package on debian-med.debian.net.
>  - Give proper permissions to the members of the blends group.
>  - Register the packages in our tasks.
>  - Create the bibref tables.
>  - Point upstream-metadata.debian.net to debian-med.debian.net.

While I have no problem with moving the upstream database to a
different host, I do not see how this will change the issues I
mentioned above.  For me it is not important on what host the code is
running, but it is very important that any new or changed
debian/upstream file is picked up after, say, 24 hours.
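
One way to get that 24-hour guarantee without inventing new machinery
would be a daily cron job that simply re-runs the registration loop you
quoted over all tasks files.  A sketch (the tasks checkout path is an
assumption of mine, and it presumes --register is idempotent):

```shell
#!/bin/sh
# /etc/cron.daily/umegaya-register (hypothetical): re-register all
# Vcs URLs found in the tasks files so new packages are picked up daily.
TASKSDIR=/srv/blends/tasks    # assumed location of the tasks checkout

for task in "$TASKSDIR"/* ; do
    for url in $(grep 'Vcs-[SG]' "$task" | cut -f2 -d' ') ; do
        umegaya-adm --register "$url"
    done
done
```

Whether re-registering an already-known package is cheap enough to do
daily is something only you can answer, of course.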

> Post-scriptum:
>  
> > IMHO a
> > 
> >     find <SVN-checkout> -type f -name upstream
> > 
> > as well as
> > 
> >     find /git/debian-med -type f -name upstream
> 
> for repo in /git/debian-med/*.git ; do (cd $repo ; git ls-tree master debian/ | grep 'debian/upstream$' > /dev/null && echo "$repo"); done | sed -e 's|/git/debian-med/||' -e 's|.git$||'

I used this snippet in my script above.
 
> For debian-science, the search revealed only rasmol, where the file is still
> called upstream-metadata.yaml

I can confirm that Debian Science is also not very actively using
debian/upstream files, and the DebiChem people are hesitating as well,
because it is just not clear what the profit of using them might be.
IMHO having the bibliographic information displayed (modulo some delay)
right on the tasks pages is a visible profit.  It might also be a
profit to see the information rendered straight into a BibTeX file or
whatever.
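
To illustrate the second point, rendering one upstream file to BibTeX
needs little more than a few sed calls.  A rough sketch (the field
names follow the Reference-* convention; the @article type and the
citation-key derivation are my own assumptions):

```shell
#!/bin/sh
# Turn the Reference-* fields of one debian/upstream file into a
# minimal BibTeX entry on stdout.
f=${1:-debian/upstream}

# use the first author's surname as the citation key
key=$(sed -n 's/^Reference-Author:[[:space:]]*//p' "$f" | cut -d' ' -f1 | cut -d, -f1)
printf '@article{%s,\n' "${key:-unknown}"
for field in Author Title Journal Year DOI ; do
    val=$(sed -n "s/^Reference-$field:[[:space:]]*//p" "$f")
    [ -n "$val" ] && printf '  %s = {%s},\n' "$(printf %s "$field" | tr 'A-Z' 'a-z')" "$val"
done
printf '}\n'
```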

Currently the flow of data is not transparent enough.

Moreover, I was looking into the file biblio-for-UDD.sh, which seems to
be responsible for preparing the input file for the UDD import.  BTW,
the UDD importer failed today since the URL gave a 404 - so I fixed the
UDD importer to simply do nothing in these cases instead of deleting
the table.

When I looked into biblio-for-UDD.sh I realised that it is just a call
to some web service, mangling data into a different format and throwing
out the reformatted data.  I admit I personally find the usage of these
web services on top of the Berkeley DB somewhat confusing.

For example I checked:

   http://upstream-metadata.debian.net/table/DOI
and
   http://upstream-metadata.debian.net/table/Reference-DOI

which seems to reveal that some upstream files still use the old DOI
key instead of Reference-DOI.
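
Finding those leftovers locally is easy once a checkout like the one
from my test script above exists; a grep for the old bare key does it:

```shell
# List upstream files that still use the old bare "DOI:" key instead
# of the current "Reference-DOI:" (run inside a packaging checkout)
find . -type f -name upstream | grep -v tags \
    | xargs grep -l '^DOI:' 2>/dev/null
```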

Thinking twice about it: what is the point of having this Berkeley DB
at all if we have UDD?  Why not import the content of the upstream
files straight into UDD?  For me this somehow looks like a detour, but
as I said, I might be a bit narrow-minded about the usage on the tasks
pages.
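
For comparison, a direct import would only need something that flattens
the upstream files into a table UDD can COPY in.  A sketch, under the
assumption that one DOI per package is enough (the column layout and
table are invented for illustration):

```shell
#!/bin/sh
# Flatten all debian/upstream files into "package<TAB>doi" lines,
# ready for a PostgreSQL COPY into a hypothetical UDD bibref table.
find . -type f -name upstream | grep -v tags | while read -r f ; do
    # the package name is the directory above debian/
    pkg=$(basename "$(dirname "$(dirname "$f")")")
    doi=$(sed -n 's/^Reference-DOI:[[:space:]]*//p' "$f" | head -n1)
    [ -n "$doi" ] && printf '%s\t%s\n' "$pkg" "$doi"
done > bibref.tsv
```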

Kind regards and thanks for pushing debian/upstream anyway

    Andreas.

-- 
http://fam-tille.de

