
Re: sf.net redirector reports 500 Internal Server Error

On Tue, 2009-05-12 at 20:33 -0500, Raphael Geissert wrote:
> Bart Martens wrote:
> > On Tue, May 12, 2009 at 02:47:58PM -0500, Raphael Geissert wrote:
> >> Daniel Leidert wrote:
> [...]
> >> > Using uscan on the command line works.
> >> 
> >> Only if it is from devscripts 2.10.31 or greater (lenny was released
> >> with .35), which is smarter when redirected (required by the only
> >> sourceforge mirror that keeps the redirector working.)
> > 
> > I understand that using one mirror directly is a temporary approach that
> > should be replaced by something going via the sourceforge mirror-selection
> > system.
> That's not possible unless you reach an agreement with SF.

It is possible without such agreement, although probably easier with.

> > This can be solved for debian/watch files using 
> > http://sf.net/<projectname>/<filenamebase>-(.*)\.tar\.gz and similar, by
> > enhancing the sf.net redirector with the following algorithm:
> > 
> Patches welcome
> > - Access http://sourceforge.net/projects/<projectname>
> >   Parse the html to find the value of the group_id.
> > - Access http://sourceforge.net/project/showfiles.php?group_id=...
> >   Parse the html to find the values of the package_id's.
> > - For each package_id:
> >   - Access
> >   http://sourceforge.net/project/showfiles.php?group_id=...&package_id=...
> >     Parse the html to verify whether it contains files matching
> >     the pattern <filenamebase>-(.*)\.tar\.gz.
> So you want merkel to download three html pages every time the redirector is
> called?

Yes, three or more.

> DEHS currently has 1564 watch files that use the redirector, UEHS (Ubuntu's
> DEHS) also got some (no way for me to tell how many), any maintainer, DD,
> automated system might be using it.
> Sticking with only the number of watch files in DEHS, and since the watch
> files are checked at least every four days it would mean the redirector
> would have to download at least 8211 pages every week, 183MBs (120KBs for
> the three pages).

No need to check all files every four days.  Results from anyone using
the redirector can be fed back to DEHS.  Also, checking should slow down
when multiple consecutive checks produce identical results, so the
frequency can drop from every four days to once a month or even slower.
The load caused by anyone using the redirector can be reduced by caching
results: for example, consecutive queries within 12 hours can return the
same result without actually checking every time.
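
A minimal sketch of the caching and slow-down ideas in Python, for
illustration only.  The names CachedChecker, fetch_latest_version and
next_interval are hypothetical and not part of the redirector or DEHS, and
the doubling rule is just one possible way to slow down:

```python
import time

CACHE_TTL = 12 * 3600          # reuse a cached result for 12 hours
MIN_INTERVAL = 4 * 24 * 3600   # start by checking every four days
MAX_INTERVAL = 30 * 24 * 3600  # slow down to at most once a month


class CachedChecker:
    """Cache upstream lookups so repeated queries do not hit sf.net."""

    def __init__(self, fetch_latest_version, clock=time.time):
        # fetch_latest_version is a hypothetical callable: project -> version
        self._fetch = fetch_latest_version
        self._clock = clock
        self._cache = {}  # project -> (timestamp, version)

    def latest(self, project):
        now = self._clock()
        hit = self._cache.get(project)
        if hit is not None and now - hit[0] < CACHE_TTL:
            return hit[1]  # fresh enough, no upstream access
        version = self._fetch(project)
        self._cache[project] = (now, version)
        return version


def next_interval(current, changed):
    """Double the check interval after an unchanged result, reset on change."""
    if changed:
        return MIN_INTERVAL
    return min(current * 2, MAX_INTERVAL)
```

With this, two queries for the same project within 12 hours cause only one
upstream check, and a project whose result never changes goes from four
days to eight, to sixteen, and then stays at thirty.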

>  Only to provide a feature that most people don't need,

It's about addressing the issue of "the only remaining sf mirror that
keeps the redirector currently working".

> not to mention that it would be extremely easy to break?

Why would it?

> >   - If it does contain files matching, then find the filename with the
> >     highest version number.  This is a preliminary result.
> > - After processing all package_id's, select the preliminary result with
> >   the highest version number.  This is the final result.
> > 
> > So far what to do to make the existing debian/watch files continue to work
> > without depending on the only sf.net mirror that keeps the redirector
> > currently working.
> > 
> > Later on, a nice-to-have would be support for specifying a package_id in
> > debian/watch, so that the searching for the newest upstream release can be
> > limited to only one sf.net package within the sf.net group, instead of all
> > packages in the group.
> > 
> If you really want that then please provide the necessary patches and stay
> tuned on the watch files failures so that you fix the redirector every time
> it breaks.

See above: why would the described approach break so easily?
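
To make the selection part concrete, here is a minimal Python sketch.  It
covers only the matching and the "highest version" steps and leaves out
fetching and parsing the sf.net pages.  The function names and the input
(a dict mapping each package_id to its list of filenames) are assumptions
for illustration, and version_key is a simplified ordering, not the
version comparison that uscan itself uses:

```python
import re


def version_key(version):
    """Split a version into numeric and text chunks for comparison."""
    # Simplified ordering, assumed for this sketch only.
    parts = re.findall(r"\d+|[^\d.]+", version)
    return [(1, int(p), "") if p.isdigit() else (0, 0, p) for p in parts]


def newest(filenames, filenamebase):
    """Return (version, filename) with the highest version, or None."""
    # Same pattern as in debian/watch: <filenamebase>-(.*)\.tar\.gz
    pattern = re.compile(re.escape(filenamebase) + r"-(.*)\.tar\.gz")
    matches = []
    for name in filenames:
        m = pattern.fullmatch(name)
        if m:
            matches.append((m.group(1), name))
    if not matches:
        return None
    return max(matches, key=lambda item: version_key(item[0]))


def final_result(files_per_package, filenamebase):
    """Pick the best preliminary result across all package_id's."""
    preliminary = [newest(files, filenamebase)
                   for files in files_per_package.values()]
    preliminary = [p for p in preliminary if p is not None]
    if not preliminary:
        return None
    return max(preliminary, key=lambda item: version_key(item[0]))
```

A package_id without matching files simply contributes no preliminary
result, so one odd package within the group does not break the lookup.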

>  And if you are to do that then why don't you simply take over
> DEHS? oh, and write the watch files four spec and implement it.

I prefer to join the team and to enjoy fixing DEHS together as peers
instead of taking over DEHS.

> Look, I hate being sarcastic, but you are obviously talking and assuming
> without knowing the real situation.

I'm sure that I don't know the entire real situation.  But I'm confident
that I know enough to defend the design I described.

> If it were as simple as you are
> putting it, somebody or I would have done it a long time ago.

The design is quite simple.  I don't know whether someone else has
thought of this before.

> I once wrote a script to let watch files obtain the version information from
> freshmeat and the kde-apps (and similar) sites which only required one web
> page fetch, and nobody ever replied in spite of sending a couple of pings
> on the ML and on IRC, poking people, and ... nobody ever replied.

Is this still a problem today?  I'm not sure why you mention this here.

> Feel free
> to do whatever you want, if you actually do anything.

I guess that everyone does more than everyone else knows. :)

> P.S. I read the mailing list, so please respect the CoC and stop sending me
> copies of your replies.

OK, I'll try to remember that.

>  And don't expect me to reply.


Bart Martens
