
[Popcon-developers] Bug#730620: popularity-contest: Please improve machine readability of the raw data



Package: popularity-contest
Severity: normal

Hi,

I'm trying to parse the raw statistics available on
http://popcon.debian.org/ and ran into some problems.

Firstly, while it is fine for the fields to be separated by multiple
spaces, field values should then not contain spaces themselves.
Unfortunately they do, for example in the package name "Not in Sid".
This is similar to the request in bug report #574743, which asks for
package names to be sanitized before they are put into the statistics.
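
To illustrate the problem, here is a minimal sketch of a naive
whitespace-splitting parser. The sample rows and the three-column layout
(rank, name, count) are invented for illustration and are not copied
from the real popcon files; only the name "Not in Sid" comes from the
actual data.

```python
# Hypothetical sample data: the column layout is assumed for illustration.
SAMPLE = """\
#rank name        inst
1     dpkg        100
2     Not in Sid  40
"""


def parse_naive(text):
    """Split every non-comment line on runs of whitespace."""
    rows = []
    for line in text.splitlines():
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines and '#' comments
        rows.append(line.split())
    return rows


rows = parse_naive(SAMPLE)
# A name containing spaces yields more fields than the header promises.
field_counts = [len(row) for row in rows]
```

With such input a parser can no longer tell where the name ends and the
next column begins without extra heuristics.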

Secondly, and related to package name sanitization (the lack of which,
as the above example shows, can make the data unparsable), some
obviously bogus entries such as "Not in Sid" could be removed entirely.
No such package exists. If you want to keep the information, it would
be better to put it in a commented line, as you already do for the
header of the file, where # is the comment character.

Thirdly, at the end of the file there is one long line consisting only
of minus characters. Could this line also be commented out with a #?

The same goes for the very last line, which presents a total. For one,
it is not necessary to put a "rank" on this line (it has the same rank
as the last entry). For another, the line is not necessary at all,
because any machine parsing the rest of the file can easily generate it.
If you want this line for human consumption, you can simply prefix it
with a # to make it a comment.
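
As a rough sketch of what this would buy a consumer: if the separator
and total lines were commented out, a single "skip lines starting with
#" rule would be enough, and the total could be recomputed from the
entries. The sample data and the three-column layout below are again
invented for illustration, not taken from the real files.

```python
# Hypothetical sample in the proposed format: the separator and the total
# are '#' comments, and no field value contains spaces.
SAMPLE = """\
#rank name   inst
1     dpkg   100
2     bash   90
#-----------------
#     Total  190
"""


def parse(text):
    """Return (rank, name, inst) tuples, ignoring comments and blank lines."""
    entries = []
    for line in text.splitlines():
        if not line.strip() or line.startswith("#"):
            continue
        rank, name, inst = line.split()
        entries.append((int(rank), name, int(inst)))
    return entries


entries = parse(SAMPLE)
# The total does not need to be in the file; it follows from the entries.
total = sum(inst for _rank, _name, inst in entries)
```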

Would you welcome a patch fixing these issues?

cheers, josch


