
Bug#884778: Validation complains about HTML files with zero size; deleted, but they reappear



Package: www.debian.org
User: www.debian.org@packages.debian.org
Usertag: scripts
Severity: normal

Hello all

(This is a long bug report; I suspect several different issues converge here.
Sorry for the long mail.)

For some weeks now, we have occasionally been receiving a "new" kind of
"Validation error" mail, e.g. this one from 7 Dec 2017:

*** Errors validating
	/srv/www.debian.org/www/devel/wnpp/being_packaged_byactivity.en.html: ***
Line 1, character 1:  missing document type declaration; assuming HTML 4.01
	Transitional

This happens when our "validate" script tries to analyze an HTML file that is
actually empty (zero size).
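
As a minimal sketch of how such files could be told apart before validation
(the function name should_validate is hypothetical, and I am not claiming this
is how our validation script works), the shell test "-s" is true only for
files that exist and have a size greater than zero:

```shell
# Hypothetical helper, not part of the real website scripts.
# Succeeds (exit status 0) only if the file exists and is non-empty,
# so zero-size HTML files can be reported separately instead of being
# passed to the validator.
should_validate() {
    [ -s "$1" ]
}

# Possible use inside a loop over HTML files (illustration only):
#   if should_validate "$f"; then
#       echo "would validate: $f"
#   else
#       echo "empty or missing, skipped: $f" >&2
#   fi
```

This would not fix the cause of the empty files, it would only make the report
say "empty file" instead of "missing document type declaration".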

I don't know:

* Why the build script (or the validation tool) sometimes produces these
zero-size files (let's call this ISSUE_A). I couldn't reproduce the issue locally.

* Whether this was already happening before, but the validate program in jessie
did not complain while the one in stretch (more modern, yeah!) does, which would
explain why we only notice the issue now.

* Whether somebody acted on the files reported on 7 Dec 2017 (I didn't). All the
zero-size HTML files were under /devel/wnpp, so I guess the files were
automatically rebuilt not long after the issue, and that time the build was
correct. The files are displayed correctly on the website and we didn't get
more validation errors about them.

On 11 Dec 2017 I again received similar "Validation error" mails, in particular
about these 6 files:

*** Errors validating
	/srv/www.debian.org/www/News/weekly/2013/19/index.it.html: ***
Line 1, character 1:  missing document type declaration; assuming HTML 4.01
	Transitional
*** Errors validating
	/srv/www.debian.org/www/News/weekly/2004/42/index.pt.html: ***
Line 1, character 1:  missing document type declaration; assuming HTML 4.01
	Transitional
*** Errors validating
	/srv/www.debian.org/www/News/weekly/2005/index.pt.html: ***
Line 1, character 1:  missing document type declaration; assuming HTML 4.01
	Transitional
*** Errors validating /srv/www.debian.org/www/News/2003/20030728.sv.html:
	***
Line 1, character 1:  missing document type declaration; assuming HTML 4.01
	Transitional
*** Errors validating /srv/www.debian.org/www/News/press/2001.sv.html: ***
Line 1, character 1:  missing document type declaration; assuming HTML 4.01
	Transitional
*** Errors validating
	/srv/www.debian.org/www/News/weekly/2000/22/index.sv.html: ***
Line 1, character 1:  missing document type declaration; assuming HTML 4.01
	Transitional

This time I was able to look at the build server, and saw that all those files
had zero size and were created on the evening of 9 Dec 2017 (during the website
rebuild for a point release).

I looked for zero-size files on the website build machine:

larjona@wolkenstein:/srv/www.debian.org/www$ sudo -u debwww find . -size 0

and found the files that generated the validation errors, and some other files
(.err, .log and other temp files).
I removed the HTML files with zero size, with

sudo -u debwww rm /srv/www.debian.org/News/path-to/file.XX.html

and expected that the next website build would regenerate the files and
everybody would be happy.
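
For reference, a narrower variant of the find command above, as a sketch only:
it lists just the zero-size .html files, leaving out the .err, .log and other
temp files. The function name list_empty_html is hypothetical.

```shell
# Hypothetical helper, not part of the real website scripts.
# Lists regular files named *.html with zero size under the given directory,
# sorted so that two runs can be compared easily.
list_empty_html() {
    find "$1" -type f -name '*.html' -size 0 | sort
}

# Possible use on the build machine (illustration only):
#   list_empty_html /srv/www.debian.org/www
# To also see the modification dates of the matches:
#   find /srv/www.debian.org/www -type f -name '*.html' -size 0 -exec ls -l {} +
```

Saving the output of two runs and comparing them with diff would show whether
the same files come back after a later build.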

We didn't receive any more "Validation errors", so I thought the problem was
solved (well, ISSUE_A, why these zero-size files are generated, still stands,
but we can investigate further if/when the problem reappears).

Yesterday, 18 Dec 2017, we again received "validation errors" related to
zero-size HTML files:

*** Errors validating
	/srv/www.debian.org/www/News/weekly/2013/19/index.it.html: ***
Line 1, character 1:  missing document type declaration; assuming HTML 4.01
	Transitional
*** Errors validating /srv/www.debian.org/www/News/2003/20030728.sv.html:
	***
Line 1, character 1:  missing document type declaration; assuming HTML 4.01
	Transitional
*** Errors validating /srv/www.debian.org/www/News/press/2001.sv.html: ***
Line 1, character 1:  missing document type declaration; assuming HTML 4.01
	Transitional
*** Errors validating
	/srv/www.debian.org/www/News/weekly/2000/22/index.sv.html: ***
Line 1, character 1:  missing document type declaration; assuming HTML 4.01
	Transitional
*** Errors validating
	/srv/www.debian.org/www/News/weekly/2004/42/index.pt.html: ***
Line 1, character 1:  missing document type declaration; assuming HTML 4.01
	Transitional
*** Errors validating
	/srv/www.debian.org/www/News/weekly/2005/index.pt.html: ***
Line 1, character 1:  missing document type declaration; assuming HTML 4.01
	Transitional

Again the same files as last week! And I've checked that the date of those
files is again 9 Dec 2017 (!)

So I guess one of these things happened:
* My "rm" command is not working and the files are not actually deleted (maybe
I'm deleting them in the wrong folder/machine?). If this is the case, I'd like
to know how I should proceed to actually remove the files (let's call it
ISSUE_B) and why we didn't receive validation error mails every day since 11
Dec 2017 (let's call it ISSUE_C).
* My "rm" command worked, but some process put the files back in their folders
after my removal (maybe that process ran only yesterday, hence the lack of
validation error mails until yesterday?)
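
To tell these two cases apart next time, something like the following sketch
could help. The function name remove_and_verify and its messages are
hypothetical. It reports separately when the path does not exist in the first
place (wrong folder or machine), when the file survives the removal, and when
it is really gone:

```shell
# Hypothetical helper, not part of the real website scripts.
# Prints one status line and returns:
#   0 if the file existed and is gone after rm,
#   1 if rm failed or the file is still present,
#   2 if the path did not exist before the removal attempt.
remove_and_verify() {
    if [ ! -e "$1" ]; then
        echo "NOT FOUND: $1"
        return 2
    fi
    rm -- "$1" || return 1
    if [ -e "$1" ]; then
        echo "STILL PRESENT: $1"
        return 1
    fi
    echo "REMOVED: $1"
}
```

If the file shows up as REMOVED and is back with a date of 9 Dec 2017 after a
later build, that would point to the second case: some process is copying the
old files back.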

I'm not sure what I can do to try to solve the issues.

A *workaround* that comes to mind is to make a dummy commit to the 6 files, so
that they are truly rebuilt and we get rid of the validation mails for now.

Another thing that we can do is to remove only one or two of the HTML files and
see what happens. But we only keep logs of the last two website builds so I'll
try to do it when I can be sure that I have time to see the logs of the
following build.

Meanwhile, I'll leave this bug open in case it rings a bell with somebody.

Best regards
-- 
Laura Arjona Reina
https://wiki.debian.org/LauraArjona

