Re: Mining popcon data

On Wed, Nov 21, 2007 at 03:13:58PM +0100, Michael Hanke wrote:
> In case you're not aware of the discussion starting here:
> 
> http://lists.debian.org/debian-devel/2007/05/msg00105.html
> 
> Summary: The problem with public stats of special interest packages and
> architectures is that they _might_ disclose _who_ is using the packages.
> 
> I'm not sure if the above cited concerns also fully apply here, though.

Actually, these are very important concerns. I do not remember which
high-profile journal it was in, but this was also recently noted for
anonymized medical data: if the numbers are not big enough, one can
cross-reference anonymized data with external data and identify some of
the persons. This is a particular concern for Debian-Med, as our number
of users is low.

Therefore, we had better explore these issues and demonstrate that the
analysis we want to make is safe before bothering the popcon people. For
instance, knowing how many people using t-coffee also use clustalw seems
harmless. Add dozens of such correlations, however, and I admit that I
have no solid background to guarantee that the anonymisation still
holds.
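To make the concern about accumulating correlations concrete, here is a
minimal sketch in Python. It assumes access to per-submitter package
lists (public popcon statistics only publish per-package totals, so this
stands for the raw data a correlation analysis would need), and the
submissions below are purely hypothetical toy data. It counts how many
submitters share a given combination of packages and flags combinations
whose group is non-empty but smaller than a threshold k, a simple
small-cell rule in the spirit of k-anonymity. The function names and the
value of k are illustrative assumptions, not anything popcon provides.

```python
from itertools import combinations

# Hypothetical toy data: each submitter is represented only by the set of
# packages they report. These are NOT real popcon submissions.
SUBMISSIONS = [
    {"t-coffee", "clustalw", "emboss"},
    {"t-coffee", "clustalw"},
    {"t-coffee", "emboss", "hmmer"},
    {"clustalw", "hmmer"},
    {"t-coffee", "clustalw", "hmmer"},
]


def group_size(submissions, packages):
    """Number of submitters whose package set contains all of `packages`."""
    wanted = set(packages)
    return sum(1 for installed in submissions if wanted <= installed)


def risky_combinations(submissions, packages, max_size, k):
    """Return (combination, count) pairs, for combinations of up to
    `max_size` packages, whose matching group is non-empty but smaller
    than k, i.e. too small for a submitter to hide in."""
    risky = []
    for size in range(1, max_size + 1):
        for combo in combinations(sorted(packages), size):
            n = group_size(submissions, combo)
            if 0 < n < k:
                risky.append((combo, n))
    return risky


if __name__ == "__main__":
    # Each extra package in the combination shrinks the matching group.
    print(group_size(SUBMISSIONS, ["t-coffee"]))                       # 4
    print(group_size(SUBMISSIONS, ["t-coffee", "clustalw"]))           # 3
    print(group_size(SUBMISSIONS, ["t-coffee", "clustalw", "hmmer"]))  # 1
    all_packages = set().union(*SUBMISSIONS)
    for combo, n in risky_combinations(SUBMISSIONS, all_packages, 2, k=3):
        print(combo, n)
```

With only a handful of users, each added correlation narrows the crowd,
and three packages are already enough here to single out one submitter.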

Obviously, if you have some training in the field, further input from
you is most welcome :)

Have a nice day,

-- 
Charles Plessy
http://charles.plessy.org
Wakō, Saitama, Japan