Re: Bug#999677: RFP: popcon-stats-data -- Debian's Popularity Contest statistics
On Wed, Nov 17, 2021 at 08:05:38AM +0800, Paul Wise wrote:
> On Tue, 2021-11-16 at 13:38 +0100, Bill Allombert wrote:
>
> > What is the idea exactly ?
>
> Bálint's idea was to ship popcon data in a popcon-stats-data package in
> the Debian archive. I suggested to instead ship that in the apt
> metadata present in the Packages files.
>
> > How often the popcon data are going to be refreshed ?
>
> I would assume with the same frequency as the existing data on the
> popcon.d.o website is refreshed. Anything faster than that would just
> be refreshing unchanged data. Anything slower than that would be
> providing outdated data. Outdated data is fine though, so maybe weekly.
>
> > Which exact set of data are going to be used ?
>
> Initially I thought similar to the QA per-package popcon data:
>
> https://qa.debian.org/popcon.php?package=iotop
>
> Package: iotop
> Popcon: 30314 7962 21197 1143 12
>
> If I massage the by_inst file into the same format as this, I calculate
> that the extra Popcon fields would add 3.7 MB to the Packages files and
> that data would change often, making the apt updating process slower.
> So probably the data should go into new files instead and there should
> be a config file snippet to enable downloading them, a tool to query
> and index them and a way for apt clients to get that data.
>
> Since the Debian repository splits the metadata by suite and component,
> these new statistics should probably do the same. So the raw popcon
> submissions would need to be individually mapped to a suite based on
> the popcon version in the submission, and then each item in the
> submission attributed to that suite/component. For popcon versions that
> don't match a suite, if they match a known Debian version, attribute
> them to the next highest suite and discard submissions with popcon
> versions that were never in Debian, or maybe attribute them to the
> relevant vendor separately. popcon submissions that don't have Debian
> as the vendor probably should be discarded, or maybe attribute them to
> the relevant vendor separately.
So the idea is to have a Popcon file for each suite?

Let's say bookworm is released today. What will bookworm/Popcon contain?
We release a new popularity-contest package. What will sid/Popcon contain?
The package migrates to testing. What will testing/Popcon contain?

As I understand it, the metadata for stable is only updated with point
releases. Would that be the same for stable/Popcon?

I still do not quite see how this would work...

We do not want to provide data generated from a very small subset of
reports, for accuracy and privacy reasons.
The current all-popcon-result.gz/stable-popcon-result.gz split is
a middle ground between competing constraints.

Why not instead write a tool to download all-popcon-result.gz or
stable-popcon-result.gz when needed, and cache them?
The cached file can then be processed by a tool that makes suggestions.
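
A minimal sketch of such a download-and-cache helper, to make the
suggestion concrete. Everything here is an assumption for illustration:
the base URL, the one-week maximum age, and the function names are
hypothetical, and the file name is simply the one used in this mail.

```python
import shutil
import time
import urllib.request
from pathlib import Path

# Assumptions (not confirmed anywhere in this thread): the base URL and
# a weekly refresh interval. Adjust both to match the real service.
BASE_URL = "https://popcon.debian.org/"
MAX_AGE = 7 * 24 * 3600  # seconds


def fetch_url(url, dest):
    """Download url to dest, going through a temporary file so that an
    interrupted download never replaces a good cached copy."""
    tmp = dest.with_name(dest.name + ".part")
    with urllib.request.urlopen(url) as resp, open(tmp, "wb") as out:
        shutil.copyfileobj(resp, out)
    tmp.replace(dest)


def cached_results(name, cache_dir, max_age=MAX_AGE, fetch=fetch_url, now=None):
    """Return the path of a cached copy of `name`, downloading it only
    when the cache is missing or older than max_age seconds."""
    cache_dir = Path(cache_dir)
    cache_dir.mkdir(parents=True, exist_ok=True)
    dest = cache_dir / name
    now = time.time() if now is None else now
    if not dest.exists() or now - dest.stat().st_mtime > max_age:
        fetch(BASE_URL + name, dest)
    return dest
```

A suggestion tool would call cached_results("all-popcon-result.gz",
cache_dir) and read the returned path, so the network is only touched
when the cached copy is stale.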
Cheers,
--
Bill. <ballombe@debian.org>
Imagine a large red swirl here.