
Re: Mining popocon data



On Wed, Nov 21, 2007 at 07:18:54AM -0800, Rudi Cilibrasi wrote:
 
> Any DD may upload one new package p.  This package, at first, may have
> only one user.  That one user may be easy to guess from a number of
> other factors; e.g. perhaps it's the maintainer of p that has p
> installed.  In any case, providing the two-place function F(i,j)
> allows us to fully reconstruct exactly which packages the one user of
> p has installed, by simply running through all other packages and
> sampling F along the row or column p.
 
> Personally, I am a bigtime privacy advocate and have even gone on
> youtube to promote privacy already.  But in my opinion, the
> statistical information that could be gained far outweighs the minor
> cost of imperfect privacy here in the case of Debian package
> statistical analysis.   I have been studying Debian for six years and
> I still feel like I have no idea about most packages.  I guess it must
> be that much more confusing for the majority of our users and I would
> love to make some nice automatic graphs of how different packages
> relate according to usage, bdeps, deps, recs, etc.

Hi Rudi, Hi all,

Maybe what will help us is that we are not so much interested in
individual corner cases anyway. As a start, we could focus on
Field::Biology packages only. Accidental de-anonymisation would then
mean that one could guess who installed, say, mummer, emboss,
gnumed-client and bioperl at the same time for a given version number
and architecture. I think that is an acceptable risk, because this kind
of information is not very valuable.
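
To make the concern quoted above concrete, here is a rough Python sketch
of how sampling F along the row of a single-user package gives back that
user's package list, and how restricting the reports to a chosen set
limits what can leak. Everything in it is an assumption for illustration:
the reports are made up, "newpkg" is a hypothetical freshly uploaded
package, the function names are mine, and the "biology" set is just the
packages named above rather than the real debtags selection. It is not
how popcon stores or publishes its data.

```python
from collections import Counter
from itertools import combinations


def cooccurrence(reports):
    """Build F(i, j): number of reporting hosts that have both i and j."""
    counts = Counter()
    for packages in reports:
        for i in packages:
            counts[(i, i)] += 1  # diagonal: plain install count
        for i, j in combinations(sorted(packages), 2):
            counts[(i, j)] += 1
            counts[(j, i)] += 1  # keep F symmetric
    return counts


def reconstruct_row(F, p, all_packages):
    """Sample F along row p. Exact for one host only if F(p, p) == 1."""
    return {q for q in all_packages if q != p and F[(p, q)] > 0}


def restrict(reports, allowed):
    """Drop every package outside the allowed set before counting."""
    return [packages & allowed for packages in reports]


# Made-up reports; the first host is the only user of "newpkg".
reports = [
    {"newpkg", "mummer", "emboss", "bioperl", "gnupg"},
    {"emboss", "gnumed-client"},
    {"bioperl", "emboss"},
]
all_packages = set().union(*reports)

F = cooccurrence(reports)
leaked = reconstruct_row(F, "newpkg", all_packages)

# Assumed stand-in for a Field::Biology selection.
biology = {"newpkg", "mummer", "emboss", "gnumed-client", "bioperl"}
F_bio = cooccurrence(restrict(reports, biology))
leaked_bio = reconstruct_row(F_bio, "newpkg", biology)
```

With the full table, the row for "newpkg" lists every other package on
that one host. With the restricted table, it can list at most the
packages inside the chosen set, which is the reduced exposure I am
arguing we could accept.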

Have a nice day,

-- 
Charles Plessy
http://charles.plessy.org
Wakō, Saitama, Japan


