
Re: Bug#969553: urlcheck.py script tries to parse compressed GIMP image files



Hi Laura!

I've written the attached patch (untested).
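
The patch assumes that urlcheck.py treats each `--ignore` value as a regular expression searched within the URL. I haven't verified that against the script, so treat it as an assumption. Under that assumption, the pattern used in the patch would behave like this (the helper name `is_ignored` is hypothetical, not taken from urlcheck.py):

```python
import re

# Pattern from the patch: GIMP images compressed with bzip2, gzip or xz.
XCF_PATTERN = r'\.xcf\.(?:bz2|gz|xz)$'


def is_ignored(url, patterns):
    """Return True if any ignore pattern matches somewhere in the URL.

    Hypothetical helper: assumes --ignore values are regular expressions
    applied with re.search; the real urlcheck.py may match them differently.
    """
    return any(re.search(pattern, url) for pattern in patterns)


urls = [
    "http://www.debian.org/logos/openlogo.xcf.gz",
    "http://www.debian.org/logos/officiallogo-nd.xcf.gz",
    "http://www.debian.org/logos/openlogo.svg",
    "http://www.debian.org/logos/",
]
for url in urls:
    print(url, "skipped" if is_ignored(url, [XCF_PATTERN]) else "checked")
```

Because of the `$` anchor, only URLs that end in `.xcf.bz2`, `.xcf.gz` or `.xcf.xz` are skipped; a plain `.xcf` file or an `.xcf.gz` URL followed by a query string would still be checked.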

I'm placing my changes under the following licence (a CC0 copyright disclaimer):

https://github.com/shlomif/shlomif-computer-settings/blob/master/shlomif-settings/git/commit-messages/cc0-copyright-disclaimer.txt

Note that the program is still written in Python 2.x, which has reached its end
of life. It should be ported to Python 3.x.
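
Much of such a port is mechanical, for example turning `print` statements into function calls. As a hypothetical sketch only: the body of `append_from()` is not shown in the patch, so this assumes it reads one entry per line from a file into a list and exits with the "Can't open" message on failure. The parameter is renamed from `list` to `items` so it no longer shadows the built-in, and the UTF-8 encoding is also an assumption.

```python
import sys


def append_from(path, items):
    """Hypothetical Python 3 version of urlcheck.py's append_from().

    Assumes the file holds one entry per line; blank lines are skipped.
    """
    try:
        with open(path, encoding="utf-8") as handle:
            for line in handle:
                line = line.strip()
                if line:
                    items.append(line)
    except OSError:
        # In Python 3, print is a function and IOError is an alias of OSError.
        print("Can't open " + path)
        sys.exit(1)
```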

On Fri, 4 Sep 2020 21:41:40 +0200 Laura Arjona Reina <larjona@debian.org> wrote:

> Package: www.debian.org
> User: www.debian.org@packages.debian.org
> Usertag: scripts
> Severity: normal
> 
> Hi
> 
> the "urlcheck" scripts generate this log in the /logos folder:
> 
> Looking into http://www.debian.org/logos/openlogo.xcf.gz
>   Error reading page: http://www.debian.org/logos/openlogo.xcf.gz
> Looking into http://www.debian.org/logos/officiallogo.xcf.gz
>   Error reading page: http://www.debian.org/logos/officiallogo.xcf.gz
> Looking into http://www.debian.org/logos/officiallogo-nd.xcf.gz
>   Error reading page: http://www.debian.org/logos/officiallogo-nd.xcf.gz
> 
> I guess this means it tries to parse the .xcf.gz files, and we probably
> need to update the script to skip such files (compressed images).
> 
> Is anybody familiar with Python who can help?
> 
> The code of the script is here:
> 
> https://salsa.debian.org/webmaster-team/cron/-/tree/master/urlcheck
> 
> (I guess the main script, urlcheck.py, is where maybe the fix should be
> made).
> 
> The script is called by 3 cron jobs:
> 
> 17  3 * * *     cd /srv/www.debian.org/cron/urlcheck && ./run.urlcheck
> 36 12 * * *     cd /srv/www.debian.org/cron/urlcheck && ./make.bad_link.pages
> 5  13 * * *     cd /srv/www.debian.org/cron/urlcheck && ./cleanup.logs
> 
> and the daily logs are here:
> https://www-master.debian.org/build-logs/urlcheck/
> (check logos folder).
> 
> Kind regards
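
A more general alternative, which is not what the attached patch does, would be to parse only responses whose Content-Type header says they are HTML, so that new binary formats don't each need their own ignore pattern. This is a minimal sketch under that assumption; the function names and the list of accepted media types are hypothetical, and it relies on whatever type the web server actually reports for each file.

```python
import urllib.request

# Media types worth handing to the HTML parser (assumed list).
HTML_TYPES = ("text/html", "application/xhtml+xml")


def should_parse(content_type):
    """Return True if a Content-Type header value denotes an HTML page."""
    if not content_type:
        return False
    # Drop parameters such as "; charset=UTF-8" before comparing.
    media_type = content_type.split(";", 1)[0].strip().lower()
    return media_type in HTML_TYPES


def fetch_content_type(url, timeout=30):
    """Ask the server for a URL's Content-Type with a HEAD request."""
    request = urllib.request.Request(url, method="HEAD")
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return response.headers.get("Content-Type")
```

A HEAD request costs an extra round trip per link; checking the header of the existing GET response before reading its body would avoid that.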



-- 

Shlomi Fish       https://www.shlomifish.org/
diff --git a/urlcheck/run.urlcheck b/urlcheck/run.urlcheck
index b532eac..4f5749b 100755
--- a/urlcheck/run.urlcheck
+++ b/urlcheck/run.urlcheck
@@ -7,6 +7,7 @@ date=`date +%Y%m%d`
 	--ignore debian.org/fom --ignore /releases/ --ignore /international/ --ignore /security/ \
 	--ignore /devel/ --ignore /News/ --ignore /doc/ --ignore /distrib/ \
    --ignore /ports/ --ignore /intl/ \
+   --ignore '\.xcf\.(?:bz2|gz|xz)$' \
 	http://www.debian.org/ >& logs/web.$date &
 ./urlcheck.py --require www.debian.org/international http://www.debian.org/international/ \
 	>& logs/web.$date.intl &
diff --git a/urlcheck/urlcheck.py b/urlcheck/urlcheck.py
index a5c3909..e60aa78 100755
--- a/urlcheck/urlcheck.py
+++ b/urlcheck/urlcheck.py
@@ -229,6 +229,7 @@ def append_from(path, list):
 		print "Can't open " + path
 		sys.exit(1)
 
+ignore.append(r'\.xcf\.(?:bz2|gz|xz)$')
 options, args = getopt.getopt(sys.argv[1:], "", ["require=", "ignore=", "requirefrom=", "ignorefrom=", "non-compliant", "non-compliant-from="])
 for option in options:
 	if option[0] == '--require':
