Re: Public UDD mirror

To: debian-qa@lists.debian.org
Subject: Re: Public UDD mirror
From: Andreas Tille <andreas@an3as.eu>
Date: Fri, 12 Jul 2013 11:01:44 +0200
Message-id: <20130712090144.GC26210@an3as.eu>
In-reply-to: <alpine.DEB.2.02.1307111727060.17291@rose.makesad.us>
References: <alpine.DEB.2.02.1307072210020.29691@rose.makesad.us> <alpine.DEB.2.02.1307111727060.17291@rose.makesad.us>

Hi Asheesh,

On Thu, Jul 11, 2013 at 05:35:12PM -0400, Asheesh Laroia wrote:
> On Sun, 7 Jul 2013, Asheesh Laroia wrote:
> 
> >Hey all,
> >
> >I started running a public UDD mirror here:
> >http://public-udd-mirror.xvm.mit.edu/
> >
> >I'll move it to be on .debian.net as the canonical URL in the near
> >future.
> >
> >Other to-do items tracked here for now:
> >https://github.com/paulproteus/public-udd-mirror/issues
> >
> >Happy to take questions or so on here as well. Yay,
> 
> As a collection of updates:
> 
> * It now auto-updates (hourly) from udd.debian.org/udd.sql.gz
> (checking timestamps before downloading)

Hmmmm, I wonder whether it would not be more sensible to rely on
PostgreSQL's mirroring feature.  I do not have any experience with
PostgreSQL mirroring myself (though there is certainly a lot of
experience inside Debian), but IMHO your approach has two drawbacks:

  1. The mirror is unavailable while the update is running.  Given
     that the database import takes several minutes, this means a
     couple of minutes of downtime every hour.

  2. IMHO the solution does not perform as well as PostgreSQL
     mirroring would (simply because I guess that PostgreSQL is
     optimized to synchronise only the changes rather than the whole
     bunch of data).
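
Just to illustrate what the "checking timestamps before downloading"
step quoted above boils down to: an HTTP conditional GET.  This is a
minimal Python sketch, assuming the server honours If-Modified-Since;
the https scheme, the local file name and the function names are
hypothetical and not taken from update_udd.sh:

```python
import email.utils
import os
import shutil
import urllib.error
import urllib.request

# URL from the quoted message (https scheme assumed); the rest is hypothetical.
DUMP_URL = "https://udd.debian.org/udd.sql.gz"
LOCAL_DUMP = "udd.sql.gz"


def build_request(url, local_path):
    """Return a GET request carrying If-Modified-Since if a local copy exists."""
    req = urllib.request.Request(url)
    if os.path.exists(local_path):
        mtime = os.path.getmtime(local_path)
        # HTTP dates are GMT; usegmt=True renders the zone as "GMT".
        req.add_header("If-Modified-Since",
                       email.utils.formatdate(mtime, usegmt=True))
    return req


def fetch_if_newer(url=DUMP_URL, local_path=LOCAL_DUMP):
    """Download the dump only if the server copy is newer; True if fetched."""
    try:
        resp = urllib.request.urlopen(build_request(url, local_path))
    except urllib.error.HTTPError as err:
        if err.code == 304:  # Not Modified: keep the old dump, skip the import
            return False
        raise
    tmp_path = local_path + ".part"
    with resp, open(tmp_path, "wb") as out:
        shutil.copyfileobj(resp, out)  # stream to disk; the dump can be large
        last_modified = resp.headers.get("Last-Modified")
    os.replace(tmp_path, local_path)  # never leave a half-written dump behind
    if last_modified:
        # Store the server's timestamp so the next If-Modified-Since matches it.
        ts = email.utils.parsedate_to_datetime(last_modified).timestamp()
        os.utime(local_path, (ts, ts))
    return True
```

Note that this only avoids useless downloads; the full re-import (and
the downtime from point 1) still happens whenever the dump changed.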

> * You can see that script here https://github.com/paulproteus/public-udd-mirror/blob/master/scripts/update_udd.sh
> 
> * The page has some Javascript-based "self-monitoring" where the
> stamp file's Last-Modified header is displayed. Right now, it's Jan
> 1 1970 because the first auto-import is in progress, but presumably
> within 1 hour it'll be today's date.
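
The same self-monitoring could also run outside the browser, e.g. from
a cron job.  A minimal Python sketch, assuming the stamp file is served
with a Last-Modified header at a hypothetical STAMP_URL (the real path
is not given here) and that "older than two hours" is a sensible
staleness threshold for an hourly update:

```python
import email.utils
import urllib.request
from datetime import datetime, timedelta, timezone

STAMP_URL = "http://public-udd-mirror.xvm.mit.edu/stamp"  # hypothetical path
EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def stamp_status(last_modified, now, max_age=timedelta(hours=2)):
    """Classify the stamp file's Last-Modified header value."""
    when = email.utils.parsedate_to_datetime(last_modified)
    if when.tzinfo is None:  # "-0000" dates parse as naive; treat them as UTC
        when = when.replace(tzinfo=timezone.utc)
    if when <= EPOCH:  # Jan 1 1970: no import has finished yet
        return "no import finished yet"
    if now - when > max_age:
        return "stale"
    return "fresh"


def check_mirror(url=STAMP_URL):
    """HEAD the stamp file and classify its Last-Modified header."""
    req = urllib.request.Request(url, method="HEAD")
    with urllib.request.urlopen(req) as resp:
        header = resp.headers["Last-Modified"]
    return stamp_status(header, datetime.now(timezone.utc))
```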
> 
> * As a reminder, the full code (including maintenance scripts) is
> here: http://github.com/paulproteus/public-udd-mirror
> 
> Question:
> 
> * Is there a way to import a UDD dump that does not require erasing
> the whole current DB?

As long as you are relying on a dump, I do not think so.

> * Is this useful to you? If so, please make me feel good by saying so. (-;

IMHO a UDD mirror is pretty useful and I really welcome your attempt.

I'm personally running a "not so 1:1 mirror" on blends.debian.net by
simply running the same code as UDD to update the tables.  I made some
simplifications.  For instance, I stopped trying to import the bugs
tables because that involved rsync-ing the whole BTS data (at the time
I stopped, it was 60GB, which ate all the disk space of the rented
server).  Instead, for the bugs tables I use

   scripts/clone_udd_bugs_{fetch,inject}.sh

I have the feeling that this does not work fully reliably; I just have
not found enough time to track this down.  It works "reliably enough"
for my purpose, which is to have a playground for developing Blends
tools and for creating the new UDD importers needed in this scope.  I
usually run new stuff on this blends.d.n playground before I submit it
to the official UDD.  For that, it seems sufficient to accept some lag
in the updates (I run the importers only once per day) and potentially
some differences in the bugs tables (and probably in other tables).
Having an exact mirror would be nicer, though.

However, creating the UDD from a dump is not acceptable for my
purpose: as I said, I'm testing new features, and I would have to
re-apply them after every import of the dump, which is not really
helpful.

Kind regards and thanks for your support of UDD

      Andreas.

-- 
http://fam-tille.de

References:
- Public UDD mirror
  - From: Asheesh Laroia <asheesh@asheesh.org>
- Re: Public UDD mirror
  - From: Asheesh Laroia <asheesh@asheesh.org>