
Bug#893418: Corrupted package names in by_vote.gz



On Fri, Mar 23, 2018 at 02:10:07PM +0100, Bill Allombert wrote:

> > Probably. Is the format of that file documented somewhere?
> This is a list of key/value pairs in RFC822 style.
> See /usr/share/doc/popularity-contest/examples/bin/README.examples
> for the format of the Package line.

I have a few questions:

How is the package name separated from the integer fields? It does not
look like a fixed-width field:

Package: abev-form-obhgepi-fpk-nav          0     0     0     2
Package: abev-form-obhgepi-fpk-nav-egyeb     0     0     0     2

If it is space-separated instead: so far I have not seen any package
names that contain spaces, but is there a guarantee that a package name
will never contain spaces?

Alternatively, should the parsing instead be done by splitting on \s+
from the right with a maximum of 4 splits?
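To make the right-split idea concrete, here is a minimal sketch in
Python. It assumes every Package line ends in exactly four integer
fields, as in the samples above; the function name parse_package_line
is made up for illustration and is not part of popularity-contest.

```python
def parse_package_line(line):
    """Split a 'Package:' line into (name, [four integer fields]).

    Assumption: the line ends in exactly four whitespace-separated
    integers, and everything between 'Package:' and those integers is
    the package name (which may then contain spaces).
    """
    key, sep, value = line.partition(":")
    if not sep or key.strip() != "Package":
        raise ValueError(f"not a Package line: {line!r}")

    # Split on runs of whitespace from the right, at most 4 times, so
    # any whitespace inside the name stays in the first element.
    parts = value.rsplit(None, 4)
    if len(parts) != 5:
        raise ValueError(f"expected a name and 4 fields: {line!r}")

    name = parts[0].strip()
    counts = [int(p) for p in parts[1:]]  # ValueError if not integers
    return name, counts
```

With this approach a name containing spaces would still parse, as long
as the last four tokens are integers. It cannot tell whether a name was
truncated upstream.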

Some package names seem to be truncated, like this one:

Package: apache-openoffice-4.1.4-linux-x86-install-rpm-de     0     0     0     1

Is the character set guaranteed to be UTF-8, or should I parse the file
as binary and drop all lines that do not decode as UTF-8, or even all
lines that are not strictly 7-bit ASCII, like this one?

Package: li37sp©y                           0     0     0     1
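The "parse as binary and drop what does not decode" option could look
roughly like the sketch below. It assumes the file is read line by line
as bytes; decode_lines and its ascii_only flag are hypothetical names
used only for this example.

```python
def decode_lines(raw_lines, ascii_only=False):
    """Yield decoded text lines, silently skipping undecodable ones.

    raw_lines is any iterable of bytes objects, for example a file
    opened in binary mode. With ascii_only=True, lines containing any
    byte outside 7-bit ASCII are dropped as well.
    """
    encoding = "ascii" if ascii_only else "utf-8"
    for raw in raw_lines:
        try:
            yield raw.decode(encoding)
        except UnicodeDecodeError:
            # The line is not valid in the chosen encoding: drop it.
            continue
```

For example, iterating over gzip.open("by_vote.gz", "rb") yields bytes
lines that can be passed straight to decode_lines. Whether dropping such
lines is the right policy is exactly the open question here.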


Enrico

-- 
GPG key: 4096R/634F4BD1E7AD5568 2009-05-08 Enrico Zini <enrico@enricozini.org>

