Re: Are popcon stats per package and arch possible?
On Sat, May 05, 2007 at 11:23:54AM -0700, Steve Langasek wrote:
> On Sat, May 05, 2007 at 06:23:36PM +0200, Petter Reinholdtsen wrote:
> > [Michael Hanke]
> > > To me it looks like stats for the major architectures up to (and
> > > including) powerpc are ok wrt privacy concerns. Do you agree?
> > I'm not sure if that would be the correct cutoff point, or if only
> > amd64 and i386 have enough submissions to ignore the privacy issue.
> Well, at that point what use is a per-arch stat anyway?
To illustrate that there are no arches any more?
It brings out an interesting detail.
Presumably the boxes of amd64 users are faster on average.
Maybe users of faster hardware prefer different software.
But at that point, arch is a very rough proxy for 'fast', for which
there could be more interesting numbers available.
I would be surprised if there weren't some users who would happily
provide more detailed info about the host hardware, and similarly
I would imagine that different users would have different thresholds
for how that information might be used.
A really good trick would be to come up with a more general system
by which people could publish information about themselves in
confidence, expressing constraints on how that information could be
used to create statistics that would then be made public. Although
I am not aware of any, I would be surprised if work had not already
been done in this area (and, no, I don't mean DRM).
You probably also want policy on the other side.
FWIW, I would think you could get interesting numbers out down at least
as far as the sparc group, if not further. Techniques might include:
 - suppressing particularly sensitive packages (e.g. social-outcast-1.1),
 - suppressing small numbers that might identify particular individuals,
 - publishing finished comparisons, rather than rawer data.
   For example, "MTA(s) installed".
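
To make the first two techniques a bit more concrete, here is a rough
sketch in Python. It is not how popcon works today; the function name,
the (arch, package) input shape and the threshold of 5 are all made up
for illustration.

```python
from collections import Counter

def publishable_counts(submissions, sensitive=frozenset(), min_count=5):
    """Count installs per (arch, package) and drop anything risky.

    submissions: iterable of (arch, package) pairs, one per reported
                 install (hypothetical input shape, not popcon's format).
    sensitive:   package names that are never published.
    min_count:   cells with fewer reports than this are suppressed,
                 since they might identify particular individuals.
    """
    counts = Counter(submissions)
    return {
        (arch, pkg): n
        for (arch, pkg), n in counts.items()
        if pkg not in sensitive and n >= min_count
    }

# Illustrative data only.
reports = (
    [("i386", "exim4")] * 6
    + [("sparc", "exim4")] * 2
    + [("i386", "social-outcast")] * 10
)
stats = publishable_counts(reports, sensitive={"social-outcast"})
```

With the data above, only the i386 exim4 count survives: the sparc
cell is below the threshold and the sensitive package is dropped
regardless of how many reports it has. Where the threshold should sit
is exactly the policy question raised earlier in the thread.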