
Bug#752384: HEAnet sourceforge mirror is outdated



Control: tags -1 + patch

On 21/07/14 14:58, Paul Wise wrote:
> On Mon, 2014-07-21 at 15:39 +0200, Abou Al Montacir wrote:
> 
>> Are we really consuming so much bandwidth for that feature? I assume
>> this will happen each time a user or a daemon wants to check a
>> particular package. I'm not convinced this is worth especially they ask
>> for a cache of 1 hour, do we expect that per package we do a check more
>> than twice per day (daily daemon + random user)
> 
> I told them the average usage based on the stats from qa.d.o Apache logs
> (up to 30K requests per day) and said that was a bit high and asked us
> to implement a cache.
> 

That doesn't surprise me in the least! GetDeb actually switched to using
my test redirector and in 5 days I logged nearly 32000 hits at my
server... each of which would have been passed to sf.net (this was
quickly resolved though).

>> What about compressing the files? This can reduce the size dramatically.
>> Can you please check for the file you used as example?
> 
> Seems pointless to store the raw RSS, best extract the filenames and
> store them in a database instead.
> 

Okay... It took a bit of thinking about how to make it work, but I've
come up with a working solution that caches the file list for each
requested project.
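
As a rough illustration of the "extract the filenames" step, here is a
minimal sketch. extract_filenames() is a hypothetical helper name, and the
sample feed is made up for the example rather than real sf.net output; it
only assumes, as the patch does, that each item title is a path whose last
component is the file name.

```php
<?php
// Sketch only: extract_filenames() is a hypothetical helper, and the sample
// feed below is invented for illustration, not real sf.net output.
function extract_filenames(string $rss): array
{
    // Suppress parser warnings; a false return signals an unusable feed.
    $xml = @simplexml_load_string($rss, 'SimpleXMLElement', LIBXML_NOCDATA);
    if ($xml === false) {
        return [];
    }
    $names = [];
    foreach ($xml->channel->item as $item) {
        // Titles are assumed to look like paths; keep only the last component.
        $names[] = basename((string) $item->title);
    }
    return $names;
}

$sample = <<<XML
<?xml version="1.0"?>
<rss version="2.0"><channel>
<item><title><![CDATA[/demo/1.0/demo-1.0.tar.gz]]></title></item>
<item><title><![CDATA[/demo/0.9/demo-0.9.tar.gz]]></title></item>
</channel></rss>
XML;

print_r(extract_filenames($sample));
```

Only the bare file names end up in the list, which is what gets written to
the per-project database.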

I am storing each project's file list in a separate Berkeley DB file, so
we can check the file's modification time and only update it when the
file is older than the cache limit ($cache_time, currently 3600
seconds).
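
The freshness test boils down to something like the sketch below.
cache_is_fresh() is a hypothetical helper name (the script does the
comparison inline), and the temporary file merely stands in for
"$cache_dir/$project.db".

```php
<?php
// Sketch only: cache_is_fresh() is a hypothetical helper name; $cache_time
// mirrors the variable used in the patch.
function cache_is_fresh(string $db_file, int $cache_time, ?int $now = null): bool
{
    $now = $now ?? time();
    // A missing file is never fresh; otherwise compare its age to the limit.
    return file_exists($db_file) && ($now - filemtime($db_file)) < $cache_time;
}

$cache_time = 3600;
$db_file = tempnam(sys_get_temp_dir(), 'sfcache'); // stand-in for the real db file

touch($db_file, time() - 60);    // pretend the cache was written a minute ago
clearstatcache();                // PHP caches stat results within a request
$fresh_recent = cache_is_fresh($db_file, $cache_time);

touch($db_file, time() - 7200);  // pretend the cache was written two hours ago
clearstatcache();
$fresh_old = cache_is_fresh($db_file, $cache_time);

var_dump($fresh_recent, $fresh_old);
unlink($db_file);
```

With the one hour limit, the first check should treat the file as fresh
and the second should trigger a refetch from sf.net.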

Currently it is configured to store these files in a subdirectory named
cache ($cache_dir), which will need to be writeable by the web server.
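
If it helps when deploying, a check along these lines would catch a
missing or read-only cache directory early. This is only a sketch:
ensure_cache_dir() is a hypothetical helper that is not in the patch, and
the demo path under the system temp directory is an assumption made for
the example.

```php
<?php
// Sketch only: ensure_cache_dir() is a hypothetical helper, not part of the patch.
function ensure_cache_dir(string $cache_dir): bool
{
    // Create the directory on first use, then confirm this process can write to it.
    if (!is_dir($cache_dir) && !@mkdir($cache_dir, 0755, true)) {
        return false;
    }
    return is_writable($cache_dir);
}

// Demo location (an assumption); the script itself uses './cache'.
$cache_dir = sys_get_temp_dir() . '/sf-cache-demo-' . getmypid();
$cache_ok = ensure_cache_dir($cache_dir);

if (!$cache_ok) {
    header('HTTP/1.0 500 Internal Server Error');
    die("Cache directory $cache_dir is not writable by the web server.");
}
```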

Otherwise I don't think there is anything else particularly special to
report.

I have updated my test server and it is now running the latest version
of the script.

Regards,

Daniel
--- ../sf-redirect-old/sf.wml	2014-07-21 19:24:00.835216162 +0100
+++ sf.wml	2014-07-21 19:45:21.683113723 +0100
@@ -1,21 +1,12 @@
 <?php
-
-$data_dir = '/srv/qa.debian.org/data/watch';
-
 // need to strip leading slash, sf.net doesn't like double slashes
 $project=ltrim($_SERVER['PATH_INFO'], '/');
+$cache_dir = './cache';
+$cache_time = 3600;
 
 if (!$project) {
-    header('Location: http://manpages.debian.net/cgi-bin/man.cgi?query=uscan');
-    exit;
-}
-
-$fdb = $data_dir . '/sf-list.db';
-
-if (!file_exists($fdb)) {
-    header('HTTP/1.0 500 Internal Server Error');
-    die('The files database is not available. Please report this message to'.
-	' debian-qa@lists.debian.org');
+	header('Location: http://manpages.debian.net/cgi-bin/man.cgi?query=uscan');
+	exit;
 }
 
 // $project is not a file and doesn't have trailing slash
@@ -29,40 +20,60 @@
     exit;
 }
 
-$db = dba_open($fdb, 'r', 'db4');
+$db_file = "$cache_dir/$project.db";
 
-if (!dba_exists($project, $db)) {
-    header('HTTP/1.0 404 File Not Found');
-    die('There is no information about the '.$project.' project.');
+if (file_exists($db_file) and time() - filemtime($db_file) < $cache_time ) {
+	# Open the db_file for reading
+	$db = dba_open($db_file, 'r', 'db4');
+} else {
+    $xml_url = "https://sourceforge.net/projects/$project/rss";
+	# Update/create the db_file, then read its contents
+	# Load the rss feed using simplexml
+	$xml = @simplexml_load_file($xml_url, 'SimpleXMLElement', LIBXML_NOCDATA);
+	if ($xml === false) {
+		echo "No project named " . htmlspecialchars($project) . " could be found, check the project name and try again";
+		exit;
+	} else {
+		# Get an array of files from the XML		
+		$files = $xml->channel[0]->item;
+		# Create a new db file
+		$db = dba_open($db_file . '-new', 'c', 'db4');
+		# Add the file list to the db
+		$i = 0;
+		foreach ($files as $item) {
+			dba_insert($i, basename($item->title),$db);
+			$i++;
+		}
+		dba_close($db);
+		rename($db_file . '-new', $db_file);
+		$db = dba_open($db_file, 'r', 'db4');
+	}
 }
-
-?><html>
+?>
+<html>
 <head>
 <title>File listing for project <?php echo htmlspecialchars($project); ?></title>
 </head>
 <body>
 <p>
 <h1>File listing for project <?php echo htmlspecialchars($project); ?></h1>
-Visit <a href="http://sf.net/projects/<?php echo htmlspecialchars($project); ?>"><?php echo
-htmlspecialchars($project); ?>'s project page</a>.<br/><br/>
+Visit <a href="http://sf.net/projects/<?php echo htmlspecialchars($project); ?>"><?php 
+echo htmlspecialchars($project); ?>'s project page</a>.<br><br>
 <?php
-echo dba_fetch($project, $db);
+$key = dba_firstkey($db);
+while ($key !== False) {
+	$file = dba_fetch($key, $db);
+	$link = $_SERVER['SCRIPT_NAME'] . "/$project/$file";
+	echo "<a href='" . htmlspecialchars($link, ENT_QUOTES) . "'>" . htmlspecialchars($file) . "</a><br>\n";
+	$key = dba_nextkey($db);
+}
 ?>
 </p>
-<p>
-Thanks to <a href="http://ftp.heanet.ie/">HEAnet's mirror service</a>
-for being the source of data for this service.
-</p>
+<p>Last database update: <?php echo date(DATE_RFC822, filemtime($db_file)); ?></p>
 <p>
 Get the source code: <a href="svn://anonscm.debian.org/svn/qa/trunk/wml/watch">checkout SVN repository</a> &#124;
 <a href="http://anonscm.debian.org/viewvc/qa/trunk/wml/watch/">browse SVN repository</a>
 </p>
-<p> Last database update:
-<?php echo date(DATE_RFC822, filemtime($fdb)); ?>
-</p>
 </body>
-</html><?php
-
-dba_close($db);
-
-?>
+</html>
+<?php dba_close($db); ?>
