
[Popcon-developers] Bug#255000: popularity-contest: popcon stats should be broken down numbers by popcon versions



Package: popularity-contest
Version: 1.43
Followup-For: Bug #255000

I have reviewed the wishlist bugs filed against the popcon package in
the BTS.  I think this one needs to be reevaluated and seriously
considered, since it addresses a real user need.  (I know the fix is
mostly on the web page side.)

* Action requested:
Publish popcon-results sub-data for each significant popcon agent
version in use.  (1.28, 1.41, 1.42, 1.43)

* Problem:
The current raw stats published on the web as
  http://popcon.debian.org/all-popcon-results.txt.gz
are a mixture of many distributions and may give a wrong impression of
the popularity of a package.  For example, by the time lenny is about to
be released, the ghostscript package should show up as the most popular
package for lenny.  Currently, the etch data on gs-esp outnumbers
ghostscript and hides its popularity.  So discussions such as what goes
on the first CD of the next release have to rely solely on manual
analysis of the situation.

* Why now:
When this bug was filed against 1.20, popcon version information was
not widely available in the statistics.  But popcon agent packages newer
than 1.20 have been reporting their version in the submitted data.
As long as the popcon agent package is updated after every stable
release, release-specific stats can be gathered and published.

* Current situation: (What can be done now)
According to the current data of all-popcon-results.txt.gz (slightly
edited/reordered):

Release: unknown                          324
Release: 1.18.woody.19                      2
Release: 1.20                               4
Release: 1.22                               6
Release: 1.23                               2
Release: 1.24                               1
Release: 1.25                              14
Release: 1.26                              17
Release: 1.27                              25
Release: 1.28                            1107 <== sarge(oldstable)
Release: 1.29                               7
Release: 1.30                               8
Release: 1.30bpo1                           1
Release: 1.31                             100
Release: 1.31ubuntu2                        1
Release: 1.32-0.32bpo1                      1
Release: 1.32.0bpo1                         6
Release: 1.32                             126
Release: 1.33                             333
Release: 1.34                             127
Release: 1.35                               1
Release: 1.36                              78
Release: 1.38                             131
Release: 1.39                             640
Release: 1.40~bpo1                         13
Release: 1.40                             366
Release: 1.41                           50125 <== etch(stable)
Release: 1.42~bpo40+1                       3
Release: 1.42                           17790 <== lenny(testing)
Release: 1.43                             759 <== sid(unstable)
Release: 1.43+pb1                           1
===================================================================
Release up to sarge:(->1.28)             1502
Release up to etch: (->1.41)            52064 (about 4% more than just 1.41)
Release up to lenny:(->1.42)            17793
Release up to sid:  (->1.43)              759
===================================================================
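
As a rough illustration of how the range totals above could be computed
from the "Release:" lines, here is a minimal Python sketch.  It is not
part of popcon; the function names and the bucketing rule are
assumptions made for illustration.  Each version goes into the first
release whose agent version is not lower than its leading "major.minor"
number, and "unknown" is counted with the oldest range, which is what
the 1502 figure above appears to do.

```python
import re

# Highest popcon agent version belonging to each release, from the table above.
RELEASE_BOUNDS = [
    ("sarge", (1, 28)),
    ("etch", (1, 41)),
    ("lenny", (1, 42)),
    ("sid", (1, 43)),
]


def base_version(version):
    """Return (major, minor) for strings like '1.40~bpo1', or None if absent."""
    m = re.match(r"(\d+)\.(\d+)", version)
    return (int(m.group(1)), int(m.group(2))) if m else None


def bucket_counts(lines):
    """Sum 'Release: <version> <count>' lines into per-release ranges."""
    totals = {name: 0 for name, _ in RELEASE_BOUNDS}
    for line in lines:
        fields = line.split()
        if len(fields) < 3 or fields[0] != "Release:":
            continue  # not a Release line
        version, count = fields[1], int(fields[2])
        base = base_version(version)
        if base is None:
            # Assumption: versions such as "unknown" join the oldest range.
            totals[RELEASE_BOUNDS[0][0]] += count
            continue
        for name, bound in RELEASE_BOUNDS:
            if base <= bound:
                totals[name] += count
                break
        # Versions newer than the last bound are ignored in this sketch.
    return totals


sample = [
    "Release: unknown 324",
    "Release: 1.28 1107 <== sarge(oldstable)",
    "Release: 1.40~bpo1 13",
    "Release: 1.41 50125 <== etch(stable)",
    "Release: 1.42~bpo40+1 3",
    "Release: 1.42 17790 <== lenny(testing)",
    "Release: 1.43 759 <== sid(unstable)",
]
print(bucket_counts(sample))
```

On these sample lines the lenny range comes to 17793, the same as in the
table above.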

Unlike the per-machine-architecture stats requested in Bug #395926,
which would bloat publication CPU time and database size for small gains
with little statistical significance due to the lack of reports, stats
based on the popcon agent "Release" add little bloat but provide real
gains.  Actually, since each sub-data set will be smaller, not just in
size but also in the number of package entries because of removed
packages, it should not take too much extra CPU time.

For now, we should publish at least the "Raw popularity-contest results"
for each well-identified release.  (Having compared the range data with
the specific releases above, I see little reason to publish sub-data for
non-official popcon agent versions.)  Namely, data submitted with 1.28
(oldstable) and with popcon versions without a suffix from 1.41 (stable)
onward need to be analyzed.  This should be only 4 now.  If that is too
much, stats for just stable and testing would already be a great gain.

As long as we bump the package version after each release, we should
get a fairly good idea of the popularity of packages.

For example, my attempts to use popcon values in the user documentation
at
  http://wiki.debian.org/DebianReference
which produces package lists with popcon vote % values such as the one
at
  http://people.debian.org/~osamu/pub/getwiki/html/ch12.en.html
would get a more realistic and current slice of reality by using
lenny-specific popcon data.  Use of the raw stats by other activities
like this can address requests such as Bug #73603, which sought package
install guidance from popcon.

Regards,

Osamu

PS: I think there is no practical security or privacy issue, since we
have more than several hundred submissions for each sub-data set.

-- System Information:
Debian Release: lenny/sid
  APT prefers unstable
  APT policy: (500, 'unstable')
Architecture: amd64 (x86_64)

Kernel: Linux 2.6.22-3-amd64 (SMP w/2 CPU cores)
Locale: LANG=en_US.UTF-8, LC_CTYPE=en_US.UTF-8 (charmap=UTF-8)
Shell: /bin/sh linked to /bin/bash

Versions of packages popularity-contest depends on:
ii  debconf [debconf-2.0]         1.5.18     Debian configuration management sy
ii  dpkg                          1.14.15    package maintenance system for Deb

Versions of packages popularity-contest recommends:
ii  cron                          3.0pl1-103 management of regular background p
ii  exim4                         4.68-2     meta-package to ease Exim MTA (v4)
ii  exim4-daemon-light [mail-tran 4.68-2     lightweight Exim MTA (v4) daemon
pn  mime-construct                <none>     (no description available)

-- debconf information:
  popularity-contest/submiturls:
* popularity-contest/participate: true





