
[Popcon-developers] Clustering popcon data for recommendations



On Tue, Feb 07, 2017 at 03:58:10PM -0800, Todd Troxell wrote:
> Hi there @popcon!
> 
> I am curious about using the popcon data to provide package
> recommendations - for example, my personal use case: what are the packages
> that are most common with users of xmonad to discover nice software that I
> may not know exists.
> 
> Chatting briefly with apenwarr he mentioned that there is some prior work
> for instance to assign related packages to the same DVD. I thought I would
> try to get an update from the current maintainers as to the state of this.
> 
> I build recommenders for a living and I would have just made a prototype
> but the data is (understandably!) private.  I would need access at the
> report level but of course with scrubbed emails, headers, IP addresses, etc.

There have been several projects to do just that.
The latest thread starts here:
<https://lists.alioth.debian.org/pipermail/popcon-developers/2016-May/002721.html>

My main concern is that any recommendation system requires access to the
anonymised reports, which we do not publish because they can potentially
be linked to the submitter.

So any recommendation system should be carefully designed to avoid
leaking too much data about the anonymised reports.

For example, let's say the maintainer of foobar is a popcon submitter, and
foobar has a very low popcon score.

You can ask the recommender "what have users of foobar also installed?"
and receive a chunk of the foobar maintainer's package selection.

If the foobar maintainer also maintains foobaz, you can restart the query
with foobaz and compute the intersection.

If you know someone is a Debian maintainer and a popcon submitter, it is
not very hard to look at their list of packages to carry out this attack.
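
To make the attack concrete, here is a minimal sketch in Python. Everything
in it is hypothetical: the reports are made-up data, and co_installed() is a
deliberately naive stand-in for a recommender, not any existing popcon code.
It only shows that intersecting the answers for two rare packages can
recover one submitter's full package list.

```python
from collections import Counter

# Hypothetical, made-up reports: each set is one submitter's installed packages.
REPORTS = [
    {"foobar", "foobaz", "xmonad", "mutt", "git"},  # the foobar/foobaz maintainer
    {"foobar", "vim", "git"},
    {"foobaz", "emacs", "git"},
    {"xmonad", "dmenu", "git"},
    {"vim", "git", "mutt"},
]


def co_installed(package, reports=REPORTS):
    """Naive recommender: count packages installed alongside `package`."""
    counts = Counter()
    for report in reports:
        if package in report:
            counts.update(report - {package})
    return counts


def intersection_attack(packages, reports=REPORTS):
    """Intersect the answers for packages one person is known to have installed."""
    candidate = None
    for package in packages:
        # Everything the recommender reveals for this query, plus the query itself.
        answer = set(co_installed(package, reports)) | {package}
        candidate = answer if candidate is None else candidate & answer
    return candidate


if __name__ == "__main__":
    print(sorted(intersection_attack(["foobar", "foobaz"])))
```

With this toy data the intersection is exactly the first report. In general
it is a superset of the target's selection, and it shrinks towards that
selection as the queried packages get rarer.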

This is only the most obvious way this could be abused.

Cheers,
-- 
Bill. <ballombe at debian.org>

Imagine a large red swirl here. 


