Control: -1 + patch

On 21/07/14 14:58, Paul Wise wrote:
> On Mon, 2014-07-21 at 15:39 +0200, Abou Al Montacir wrote:
>
>> Are we really consuming so much bandwidth for that feature? I assume
>> this will happen each time a user or a daemon wants to check a
>> particular package. I'm not convinced this is worth especially they ask
>> for a cache of 1 hour, do we expect that per package we do a check more
>> than twice per day (daily daemon + random user)
>
> I told them the average usage based on the stats from qa.d.o Apache logs
> (up to 30K requests per day) and said that was a bit high and asked us
> to implement a cache.
>

That doesn't surprise me in the least! GetDeb actually switched to using
my test redirector, and in 5 days I logged nearly 32000 hits at my
server, each of which would have been passed on to sf.net (this was
quickly resolved though).

>> What about compressing the files? This can reduce the size dramatically.
>> Can you please check for the file you used as example?
>
> Seems pointless to store the raw RSS, best extract the filenames and
> store them in a database instead.
>

Okay... It took a bit of thinking about how to make it work, but I've
come up with a working solution that caches the file list for each
project requested.

I am storing each project's file list in a separate Berkeley DB file, so
we can check the file's modification time and only update it when it is
older than the cache limit ($cache_time, in seconds; currently 3600).

Currently it is configured to store these files in a cache subdirectory
($cache_dir), which will need to be writeable by the web server.

Otherwise I don't think there is anything else particularly special to
report. I have updated my test server and it is now running the latest
version of the script.

Regards,
Daniel
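As a rough illustration of the caching rule described above (reuse a project's cache file while it is younger than $cache_time, otherwise rebuild it and swap it into place), here is a minimal standalone sketch. It is not part of the patch: cache_is_fresh() and write_cache_atomically() are hypothetical helper names, the file list is a placeholder rather than data from the RSS feed, and a plain text file stands in for the Berkeley DB file so the example runs without the dba extension.

```php
<?php
// Illustrative sketch only; these helpers are hypothetical and do not
// appear in the patch below.

// True when $path exists and was modified less than $max_age seconds ago.
// $now can be injected to make the check testable.
function cache_is_fresh(string $path, int $max_age, ?int $now = null): bool
{
    $now = $now ?? time();
    clearstatcache(true, $path); // avoid a stale cached mtime
    return file_exists($path) && ($now - filemtime($path)) < $max_age;
}

// Write the new contents next to the target, then rename() over it, so a
// concurrent reader sees either the old file or the complete new one.
function write_cache_atomically(string $path, array $files): void
{
    $tmp = $path . '-new';
    if (file_put_contents($tmp, implode("\n", $files) . "\n") === false) {
        throw new RuntimeException("cannot write $tmp");
    }
    if (!rename($tmp, $path)) {
        throw new RuntimeException("cannot rename $tmp to $path");
    }
}

$cache_time = 3600; // seconds, same value as in the patch
$db_file = sys_get_temp_dir() . '/sf-cache-demo-' . getmypid() . '.txt';

if (!cache_is_fresh($db_file, $cache_time)) {
    // Placeholder list; the real script takes the names from the RSS feed.
    write_cache_atomically($db_file, ['example-1.0.tar.gz', 'example-1.1.tar.gz']);
}
echo file_get_contents($db_file);
```

Passing the current time in as a parameter is only a convenience for testing; the patch itself calls time() directly.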
--- ../sf-redirect-old/sf.wml 2014-07-21 19:24:00.835216162 +0100
+++ sf.wml 2014-07-21 19:45:21.683113723 +0100
@@ -1,21 +1,12 @@
<?php
-
-$data_dir = '/srv/qa.debian.org/data/watch';
-
// need to strip leading slash, sf.net doesn't like double slashes
$project=ltrim($_SERVER['PATH_INFO'], '/');
+$cache_dir = './cache';
+$cache_time = 3600;
if (!$project) {
- header('Location: http://manpages.debian.net/cgi-bin/man.cgi?query=uscan');
- exit;
-}
-
-$fdb = $data_dir . '/sf-list.db';
-
-if (!file_exists($fdb)) {
- header('HTTP/1.0 500 Internal Server Error');
- die('The files database is not available. Please report this message to'.
- ' debian-qa@lists.debian.org');
+ header('Location: http://manpages.debian.net/cgi-bin/man.cgi?query=uscan');
+ exit;
}
// $project is not a file and doesn't have trailing slash
@@ -29,40 +20,60 @@
exit;
}
-$db = dba_open($fdb, 'r', 'db4');
+$db_file = "$cache_dir/$project.db";
-if (!dba_exists($project, $db)) {
- header('HTTP/1.0 404 File Not Found');
- die('There is no information about the '.$project.' project.');
+if (file_exists($db_file) and time() - filemtime($db_file) < $cache_time ) {
+ # Open the db_file for reading
+ $db = dba_open($db_file, 'r', 'db4');
+} else {
+ $xml_url = "https://sourceforge.net/projects/$project/rss";
+ # Update/create the db_file, then read its contents
+ # Load the rss feed using simplexml
+ $xml = @simplexml_load_file($xml_url, 'SimpleXMLElement', LIBXML_NOCDATA);
+ if ($xml === false) {
+ echo "No project named " . htmlspecialchars($project) . " could be found; check the project name and try again.";
+ exit;
+ } else {
+ # Get an array of files from the XML
+ $files = $xml->channel[0]->item;
+ # Create a new db file
+ $db = dba_open($db_file . '-new', 'c', 'db4');
+ # Add the file list to the db
+ $i = 0;
+ foreach ($files as $item) {
+ dba_insert($i, basename($item->title),$db);
+ $i++;
+ }
+ dba_close($db);
+ rename($db_file . '-new', $db_file);
+ $db = dba_open($db_file, 'r', 'db4');
+ }
}
-
-?><html>
+?>
+<html>
<head>
<title>File listing for project <?php echo htmlspecialchars($project); ?></title>
</head>
<body>
<p>
<h1>File listing for project <?php echo htmlspecialchars($project); ?></h1>
-Visit <a href="http://sf.net/projects/<?php echo htmlspecialchars($project); ?>"><?php echo
-htmlspecialchars($project); ?>'s project page</a>.<br/><br/>
+Visit <a href="http://sf.net/projects/<?php echo htmlspecialchars($project); ?>"><?php
+echo htmlspecialchars($project); ?>'s project page</a>.<br><br>
<?php
-echo dba_fetch($project, $db);
+$key = dba_firstkey($db);
+while ($key !== False) {
+ $file = dba_fetch($key, $db);
+ $link = htmlspecialchars($_SERVER['SCRIPT_NAME'] . "/$project/$file", ENT_QUOTES);
+ echo "<a href='$link'>" . htmlspecialchars($file) . "</a><br>\n";
+ $key = dba_nextkey($db);
+}
?>
</p>
-<p>
-Thanks to <a href="http://ftp.heanet.ie/">HEAnet's mirror service</a>
-for being the source of data for this service.
-</p>
+<p>Last database update: <?php echo date(DATE_RFC822, filemtime($db_file)); ?></p>
<p>
Get the source code: <a href="svn://anonscm.debian.org/svn/qa/trunk/wml/watch">checkout SVN repository</a> |
<a href="http://anonscm.debian.org/viewvc/qa/trunk/wml/watch/">browse SVN repository</a>
</p>
-<p> Last database update:
-<?php echo date(DATE_RFC822, filemtime($fdb)); ?>
-</p>
</body>
-</html><?php
-
-dba_close($db);
-
-?>
+</html>
+<?php dba_close($db); ?>