Re: Mining popcon data
Hi,
On Wed, Nov 21, 2007 at 10:57:47PM +0900, Charles Plessy wrote:
> On Wed, Nov 21, 2007 at 05:29:04AM -0800, Rudi Cilibrasi wrote:
> > We would need:
> >
> > 1) The total number of machines, M, using popcon.
> > 2) A count of machines that have package i installed F(i) for all packages i
> > 3) A count of machines that have both package i and j installed F(i,j)
> > for all packages i, j
>
> From reading popcon.debian.org, I have the impression that one has to be
> a DD to access the full database. I am also very interested in mining
> popcon, especially because it may involve tools similar to those my
> research needs these days. Shall we enquire whether a recommendation
> plus GPG-signing of a non-disclosure agreement would allow us to access
> the data? But maybe before this, we should read what is promised to the
> participants of popcon. If it is written "DD only", whatever the
> goodwill of the popcon admins, we cannot ask… (And were I a popcon
> developer, this is the kind of thing I would promise to the
> participants.)
In case you're not aware of the discussion starting here:
http://lists.debian.org/debian-devel/2007/05/msg00105.html
Summary: The problem with public stats of special interest packages and
architectures is that they _might_ disclose _who_ is using the packages.
I'm not sure whether the concerns cited above fully apply here, though.
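For reference, the three quantities Rudi lists (M, F(i) and F(i,j)) could
be computed from per-machine package lists roughly as follows. This is
only a sketch: the input format (one set of package names per machine),
the function name and the sample data are assumptions for illustration,
not the actual layout of the popcon database.

```python
from collections import Counter
from itertools import combinations


def cooccurrence_counts(machines):
    """Return (M, F_single, F_pair) for an iterable of per-machine package sets.

    Hypothetical helper: assumes each machine is given as a collection of
    package names.
    """
    total = 0            # M: number of machines seen
    single = Counter()   # F(i): machines with package i installed
    pair = Counter()     # F(i, j): machines with both i and j, key sorted so i < j
    for packages in machines:
        total += 1
        pkgs = sorted(set(packages))        # de-duplicate, fix a canonical order
        single.update(pkgs)
        pair.update(combinations(pkgs, 2))  # every unordered pair exactly once
    return total, single, pair


# Made-up sample data, three machines
machines = [
    {"bash", "vim", "gcc"},
    {"bash", "vim"},
    {"bash", "emacs"},
]
M, F, F2 = cooccurrence_counts(machines)
```

Note that the pair table grows quadratically with the number of packages
installed per machine, so for the full archive one would probably want to
restrict it to a subset of packages.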
Cheers,
Michael
--
GPG key: 1024D/3144BE0F Michael Hanke
http://apsy.gse.uni-magdeburg.de/hanke
ICQ: 48230050