Re: Bug#969553: urlcheck.py script tries to parse compressed GIMP image files
Hi Laura!
I've written the attached patch (untested).
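The patch adds the pattern `\.xcf\.(?:bz2|gz|xz)$` to urlcheck's ignore list, both in run.urlcheck and as a default in urlcheck.py. Below is a minimal Python 3 sketch of the intended effect. It assumes, without having checked urlcheck.py, that each ignore entry is a regular expression tested against the URL with `re.search`. The names `should_skip` and `XCF_PATTERN` are hypothetical. If urlcheck does plain substring matching instead, the pattern would need to change.

```python
import re

# Hypothetical helper. ASSUMPTION: urlcheck treats every --ignore value as a
# regular expression and tests it against the URL with re.search.
XCF_PATTERN = r'\.xcf\.(?:bz2|gz|xz)$'


def should_skip(url, ignore_patterns):
    """Return True if any ignore pattern matches somewhere in the URL."""
    return any(re.search(pattern, url) for pattern in ignore_patterns)


urls = [
    "http://www.debian.org/logos/openlogo.xcf.gz",
    "http://www.debian.org/logos/officiallogo-nd.xcf.gz",
    "http://www.debian.org/logos/",
    "http://www.debian.org/logos/openlogo.xcf",
]
for url in urls:
    # The trailing $ only matches URLs that end in a compressed XCF suffix.
    print(url, "skipped" if should_skip(url, [XCF_PATTERN]) else "checked")
```

Because of the trailing `$`, only URLs ending in `.xcf.gz`, `.xcf.bz2` or `.xcf.xz` are skipped. An uncompressed `.xcf` file and the /logos/ page itself are still checked.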
I'm placing my changes under the following licence:
https://github.com/shlomif/shlomif-computer-settings/blob/master/shlomif-settings/git/commit-messages/cc0-copyright-disclaimer.txt
Note that the program is still written in Python 2.x, which reached its end of
life in January 2020. It should be ported to Python 3.x.
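For the Python 3 port, here is a hedged sketch of how the option handling touched by the patch might look. The option names mirror the `getopt.getopt` call in urlcheck.py. `DEFAULT_IGNORE` and `parse_options` are hypothetical names. The `--requirefrom`, `--ignorefrom` and non-compliant options are left out.

```python
import getopt

# Hypothetical names: DEFAULT_IGNORE and parse_options are illustrative and
# not taken from urlcheck.py. Only the option names mirror its getopt call.
DEFAULT_IGNORE = [r'\.xcf\.(?:bz2|gz|xz)$']

LONG_OPTIONS = ["require=", "ignore=", "requirefrom=", "ignorefrom=",
                "non-compliant", "non-compliant-from="]


def parse_options(argv):
    """Parse urlcheck-style options; return (require, ignore, urls)."""
    require = []
    # Copy the defaults so each call starts from a fresh list.
    ignore = list(DEFAULT_IGNORE)
    options, urls = getopt.getopt(argv, "", LONG_OPTIONS)
    for option, value in options:
        if option == "--require":
            require.append(value)
        elif option == "--ignore":
            # Command-line patterns are added after the built-in XCF pattern.
            ignore.append(value)
        # --requirefrom, --ignorefrom and the non-compliant options are
        # deliberately left out of this sketch.
    return require, ignore, urls


# Python 3: print is a function, unlike the print statements in urlcheck.py.
print(parse_options(["--ignore", "/News/", "http://www.debian.org/"]))
```

The sketch seeds the list before parsing and appends each `--ignore` value. Command-line patterns are therefore added on top of the built-in XCF pattern rather than replacing it.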
On Fri, 4 Sep 2020 21:41:40 +0200 Laura Arjona Reina <larjona@debian.org> wrote:
> Package: www.debian.org
> User: www.debian.org@packages.debian.org
> Usertag: scripts
> Severity: normal
>
> Hi
>
> the "urlcheck" scripts generate this log for the /logos folder:
>
> Looking into http://www.debian.org/logos/openlogo.xcf.gz
> Error reading page: http://www.debian.org/logos/openlogo.xcf.gz
> Looking into http://www.debian.org/logos/officiallogo.xcf.gz
> Error reading page: http://www.debian.org/logos/officiallogo.xcf.gz
> Looking into http://www.debian.org/logos/officiallogo-nd.xcf.gz
> Error reading page: http://www.debian.org/logos/officiallogo-nd.xcf.gz
>
> I guess this means it tries to parse the xcf.gz files and probably we
> need to update the script to skip such files (compressed images).
>
> Is anybody familiar with Python who can help?
>
> The code of the script is here:
>
> https://salsa.debian.org/webmaster-team/cron/-/tree/master/urlcheck
>
> (I guess the main script, urlcheck.py, is where maybe the fix should be
> made).
>
> The script is called by 3 cron jobs:
>
> 17 3 * * * cd /srv/www.debian.org/cron/urlcheck && ./run.urlcheck
> 36 12 * * * cd /srv/www.debian.org/cron/urlcheck && ./make.bad_link.pages
> 5 13 * * * cd /srv/www.debian.org/cron/urlcheck && ./cleanup.logs
>
> and the daily logs are here:
> https://www-master.debian.org/build-logs/urlcheck/
> (check logos folder).
>
> Kind regards
--
Shlomi Fish https://www.shlomifish.org/
diff --git a/urlcheck/run.urlcheck b/urlcheck/run.urlcheck
index b532eac..4f5749b 100755
--- a/urlcheck/run.urlcheck
+++ b/urlcheck/run.urlcheck
@@ -7,6 +7,7 @@ date=`date +%Y%m%d`
  --ignore debian.org/fom --ignore /releases/ --ignore /international/ --ignore /security/ \
  --ignore /devel/ --ignore /News/ --ignore /doc/ --ignore /distrib/ \
  --ignore /ports/ --ignore /intl/ \
+ --ignore '\.xcf\.(?:bz2|gz|xz)$' \
  http://www.debian.org/ >& logs/web.$date &
 ./urlcheck.py --require www.debian.org/international http://www.debian.org/international/ \
  >& logs/web.$date.intl &
diff --git a/urlcheck/urlcheck.py b/urlcheck/urlcheck.py
index a5c3909..e60aa78 100755
--- a/urlcheck/urlcheck.py
+++ b/urlcheck/urlcheck.py
@@ -229,6 +229,7 @@ def append_from(path, list):
         print "Can't open " + path
         sys.exit(1)
 
+ignore.append('\\.xcf\\.(?:bz2|gz|xz)$')
 options, args = getopt.getopt(sys.argv[1:], "", ["require=", "ignore=", "requirefrom=", "ignorefrom=", "non-compliant", "non-compliant-from="])
 for option in options:
     if option[0] == '--require':