Re: Mining popocon data

To: debian-med@lists.debian.org
Subject: Re: Mining popocon data
From: Charles Plessy <charles-debian-nospam@plessy.org>
Date: Thu, 22 Nov 2007 10:38:00 +0900
Message-id: <20071122013800.GA5244@kunpuu.plessy.org>
Reply-to: charles-debian-nospam@plessy.org
In-reply-to: <bd3cb4550711210718v40256066o5eb3af384831d2a8@mail.gmail.com>
References: <20071121044137.GH8808@kunpuu.plessy.org> <Pine.LNX.4.64.0711210701330.4388@wr-linux02> <bd3cb4550711210529g752a3470v9fd443c2306db8b2@mail.gmail.com> <20071121135747.GA24517@kunpuu.plessy.org> <20071121141358.GA12434@gloin> <20071121143335.GB24517@kunpuu.plessy.org> <bd3cb4550711210718v40256066o5eb3af384831d2a8@mail.gmail.com>

On Wed, Nov 21, 2007 at 07:18:54AM -0800, Rudi Cilibrasi wrote:
 
> Any DD may upload one new package p.  This package, at first, may have
> only one user.  That one user may be easy to guess from a number of
> other factors; e.g. perhaps it's the maintainer of p that has p
> installed.  In any case, providing the two-place function F(i,j)
> allows us to fully reconstruct exactly which packages the one user of
> p has installed, by simply running through all other packages and
> sampling F along the row or column p.
 
> Personally, I am a big-time privacy advocate and have even gone on
> YouTube to promote privacy already.  But in my opinion, the
> statistical information that could be gained far outweighs the minor
> cost of imperfect privacy here in the case of Debian package
> statistical analysis.  I have been studying Debian for six years and
> I still feel like I have no idea about most packages.  I guess it must
> be that much more confusing for the majority of our users and I would
> love to make some nice automatic graphs of how different packages
> relate according to usage, bdeps, deps, recs, etc.

Hi Rudi, Hi all,

Maybe what will help us is that we are not so much interested in
individual corner cases anyway. As a start, we could focus on
Field::Biology packages only. Accidental de-anonymisation would then
mean that one could guess who installed, say, mummer, emboss,
gnumed-client and bioperl at the same time for a given version number
and architecture. I think that is an acceptable risk, because
information of this kind is not very valuable.
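
To make the trade-off concrete, here is a minimal sketch of the
reconstruction Rudi describes and of what restricting the data to a
subset of packages would change. Everything in it is assumed for
illustration: the three submissions are invented, `newpkg` is a
hypothetical freshly uploaded package with a single user, and
`co_install_counts` and `row` are made-up helper names, not an existing
popcon interface.

```python
from collections import Counter
from itertools import combinations


def co_install_counts(submissions, keep=None):
    """Build F(i, j): the number of hosts that have both i and j installed.

    `submissions` is an iterable of package-name sets, one per host.
    If `keep` is given, packages outside it are dropped before counting.
    F(i, i) is the plain install count of package i.
    """
    counts = Counter()
    for pkgs in submissions:
        visible = set(pkgs) if keep is None else set(pkgs) & set(keep)
        for i in visible:
            counts[(i, i)] += 1
        for i, j in combinations(sorted(visible), 2):
            counts[(i, j)] += 1
            counts[(j, i)] += 1
    return counts


def row(counts, p):
    """Sample F along row p: every other package co-installed with p."""
    return {j for (i, j), n in counts.items() if i == p and j != p and n > 0}


# Invented data: only the first host has the hypothetical package "newpkg".
submissions = [
    {"mummer", "emboss", "bioperl", "gnumed-client", "newpkg", "mutt"},
    {"emboss", "bioperl", "vim"},
    {"mummer", "vim"},
]

# Hypothetical stand-in for the set of Field::Biology packages.
bio = {"mummer", "emboss", "bioperl", "gnumed-client", "newpkg"}

full = co_install_counts(submissions)
restricted = co_install_counts(submissions, keep=bio)

# With one user of "newpkg", its row is that user's whole package list.
print(sorted(row(full, "newpkg")))
# After restriction, only the biology packages of that user are exposed.
print(sorted(row(restricted, "newpkg")))
```

In this toy data the unrestricted row for `newpkg` also exposes `mutt`,
while the restricted one only shows the biology packages, which is the
kind of leak I would consider acceptable.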

Have a nice day,

-- 
Charles Plessy
http://charles.plessy.org
Wakō, Saitama, Japan

References:
- Anything wrong with med-bio?
  - From: Charles Plessy <charles-debian-nospam@plessy.org>
- Re: Anything wrong with med-bio?
  - From: Andreas Tille <tillea@rki.de>
- Re: Anything wrong with med-bio?
  - From: "Rudi Cilibrasi" <cilibrar@gmail.com>
- Mining popocon data
  - From: Charles Plessy <charles-debian-nospam@plessy.org>
- Re: Mining popocon data
  - From: Michael Hanke <michael.hanke@gmail.com>
- Re: Mining popocon data
  - From: Charles Plessy <charles-debian-nospam@plessy.org>
- Re: Mining popocon data
  - From: "Rudi Cilibrasi" <cilibrar@gmail.com>
