
Re: Bug#999677: RFP: popcon-stats-data -- Debian's Popularity Contest statistics



On Wed, Nov 17, 2021 at 08:05:38AM +0800, Paul Wise wrote:
> On Tue, 2021-11-16 at 13:38 +0100, Bill Allombert wrote:
> 
> > What is the idea exactly ?
> 
> Bálint's idea was to ship popcon data in a popcon-stats-data package in
> the Debian archive. I suggested to instead ship that in the apt
> metadata present in the Packages files.
> 
> > How often the popcon data are going to be refreshed ?
> 
> I would assume with the same frequency as the existing data on the
> popcon.d.o website is refreshed. Anything faster than that would just
> be refreshing unchanged data. Anything slower than that would be
> providing outdated data. Outdated data is fine though, so maybe weekly.
> 
> > Which exact set of data are going to be used ?
> 
> Initially I thought similar to the QA per-package popcon data:
> 
> https://qa.debian.org/popcon.php?package=iotop
> 
> Package: iotop
> Popcon: 30314 7962 21197 1143 12
> 
> If I massage the by_inst file into the same format as this, I calculate
> that the extra Popcon fields would add 3.7 MB to the Packages files and
> that data would change often, making the apt updating process slower.
> So probably the data should go into new files instead and there should
> be a config file snippet to enable downloading them, a tool to query
> and index them and a way for apt clients to get that data.
> 
> Since the Debian repository splits the metadata by suite and component,
> these new statistics should probably do the same. So the raw popcon
> submissions would need to be individually mapped to a suite based on
> the popcon version in the submission, and then each item in the
> submission attributed to that suite/component. For popcon versions that
> don't match a suite, if they match a known Debian version, attribute
> them to the next highest suite and discard submissions with popcon
> versions that were never in Debian, or maybe attribute them to the
> relevant vendor separately. popcon submissions that don't have Debian
> as the vendor probably should be discarded, or maybe attribute them to
> the relevant vendor separately.

So the idea is to have a Popcon file for each suite?
Let's say bookworm is released today. What will bookworm/Popcon contain?
We release a new popularity-contest package. What will sid/Popcon contain?
The package migrates to testing. What will testing/Popcon contain?
As I understand it, the metadata for stable is only updated with point
releases. Would that be the same for stable/Popcon?

I still do not quite see how this would work...
We do not want to provide data generated from a very small subset of
reports, for both accuracy and privacy reasons.
The current all-popcon-result.gz/stable-popcon-result.gz split is
a middle ground between these competing constraints.

Why not instead write a tool to download all-popcon-result.gz or
stable-popcon-result.gz when needed, and cache them?
This can then be processed by a tool that makes suggestions.
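
To make the suggestion concrete, here is a rough sketch of the
download-and-cache half, not a finished tool. Everything named in it is
an assumption for illustration: the function names, the cache layout,
and the one-week refresh interval (borrowed from the "maybe weekly"
remark above). The URL is deliberately left to the caller, and the
sketch does not assume anything about the line format inside the file.

```python
import gzip
import time
import urllib.request
from pathlib import Path

# Assumption: refresh at most once a week; tune to match how often
# the published results actually change.
MAX_AGE = 7 * 24 * 3600


def http_fetch(url):
    """Download url and return the raw bytes."""
    with urllib.request.urlopen(url) as resp:
        return resp.read()


def cached_popcon(url, cache_dir, max_age=MAX_AGE, fetch=http_fetch,
                  now=time.time):
    """Return the path of a local copy of url, downloading only when the
    cached copy is missing or older than max_age seconds."""
    cache_dir = Path(cache_dir)
    cache_dir.mkdir(parents=True, exist_ok=True)
    path = cache_dir / url.rsplit("/", 1)[-1]
    if not path.exists() or now() - path.stat().st_mtime > max_age:
        data = fetch(url)
        # Write to a temporary name first so that an interrupted
        # download never leaves a truncated file in the cache.
        tmp = path.with_name(path.name + ".tmp")
        tmp.write_bytes(data)
        tmp.replace(path)
    return path


def read_lines(path):
    """Yield the decompressed text lines of a cached .gz file."""
    with gzip.open(path, "rt", encoding="utf-8", errors="replace") as fh:
        for line in fh:
            yield line.rstrip("\n")
```

The suggestion tool would then call cached_popcon() with the URL of
whichever of the two files it needs and iterate over read_lines() on
the result, so the network is only touched when the cache is stale.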

Cheers,
-- 
Bill. <ballombe@debian.org>

Imagine a large red swirl here. 

