
Re: DEP 8: Gathering Django usage analytics



On Fri, Nov 18, 2016 at 08:54:08AM +1100, Brian May wrote:
> Lars Wirzenius <liw@liw.fi> writes:
> 
> > If I understand this correctly, Django wants to gather usage
> > statistics from installed Django instances, in a way that they say
> > respects user privacy (though I failed to understand how, given a
> > quick read). They claim this information gathering is necessary for
> > them to sustainably get funding for Django development.
> 
> ... there was a response to this email here:
> https://github.com/django/deps/pull/31#issuecomment-261181821
> 
> Probably better to followup on this pull request as opposed to here,
> where upstream will read it.
> -- 
> Brian May <bam@debian.org>
>
I've replied in the thread with my views; here is a copy for convenience:



Hi aaugustin,

    The first response
(https://lists.debian.org/debian-devel/2016/11/msg00253.html) misunderstands
the proposal widely (or trolls naively, but I'll assume good faith). Since I'm
not subscribed to the mailing-lists I'll reply here instead. Even if I
subscribe now I won't get earlier mail so I won't be able to answer in the
thread.

You can still reply in the thread: just put the Message-Id of the message you
are answering into the In-Reply-To header of your mail
(http://webapps.stackexchange.com/questions/23197/reply-to-mailman-archived-message).
You'll probably get a better discussion if you do.
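For illustration, a minimal sketch of such a threaded reply built with Python's
standard email package. The function name, addresses and Message-Id below are
placeholders (hypothetical); also setting References alongside In-Reply-To is
common practice so that mail clients and archives can thread the reply.

```python
from email.message import EmailMessage


def build_reply(original_msgid: str, subject: str, body: str,
                sender: str, list_addr: str) -> EmailMessage:
    """Build a reply that threads under an already archived message.

    original_msgid: Message-Id of the message being answered, including
    the angle brackets (placeholder values only in this sketch).
    """
    msg = EmailMessage()
    msg["From"] = sender
    msg["To"] = list_addr
    # Prefix "Re: " only if it is not already there.
    if subject.lower().startswith("re:"):
        msg["Subject"] = subject
    else:
        msg["Subject"] = "Re: " + subject
    # In-Reply-To names the direct parent message; References helps
    # threading clients and list archives place the reply correctly.
    msg["In-Reply-To"] = original_msgid
    msg["References"] = original_msgid
    msg.set_content(body)
    return msg
```

The resulting message can then be sent with any mail client or with smtplib;
only the headers matter for threading.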

    The scheme proposed here specifically excludes "installed Django
instances". It only targets development setups.

This might be a misconception by Lars*, but whether it is production or
development actually makes no difference: developers have a right to privacy
too.

    Metrics don't carry any information about which site is being developed. It
can't reveal the "a site uses any particular software". There are privacy
considerations -- especially around IP addresses, which might reveal which
organization the metrics emanate from -- but they're discussed above and the
response doesn't build upon that discussion.

IP addresses can be traced back to people or organizations, and that is not
particularly hard. The other data, such as OS version, Django version and
Python version, is also sensitive information, as it reveals that a given user
has specific versions of specific software installed.

Let me digress a bit about the Debian Free Software Guidelines [dfsg]. These
guidelines must be followed by any software included in Debian (main). They are
closely related to the Open Source Definition, but the project also considers
it important that, for example, privacy is respected. To help people assess
whether a piece of software complies with the guidelines, a few tests have been
developed [dfsg-faq, §9]. While mostly hypothetical, they get to the core of
the matter and expose certain deficiencies in, e.g., licenses. Only licenses
passing these tests are considered (DFSG-)free. The test that applies here is
the "Dissident Test"; let me briefly quote it:

_The Dissident test.

Consider a dissident in a totalitarian state who wishes to share a modified bit
of software with fellow dissidents, but does not wish to reveal the identity of
the modifier, or directly reveal the modifications themselves, or even
possession of the program, to the government. Any requirement for sending
source modifications to anyone other than the recipient of the modified
binary---in fact any forced distribution at all, beyond giving source to those
who receive a copy of the binary---would put the dissident in danger. For
Debian to consider software free it must not require any such "excess"
distribution._

Of course, we're not talking about a licensing issue here, but the DFSG
captures the spirit of the project quite well, and that spirit carries over to
the topic at hand: phone-home functionality that is on by default will not be
acceptable for Debian. Another data point is the set of lintian errors related
to privacy breaches: see lintian-tags and search for "privacy-".

Errors are the most severe category of tags that lintian emits.

    The author goes on to say the proposal would be acceptable if:

    - "it's opt-in by sysadmins installing Django": I proposed to make it
      opt-in by requiring explicit confirmation (which may or may not be
      reflected in the DEP); since sysadmins typically don't run
      startproject, startapp or runserver, that situation seems highly
      theoretical anyway
    - "also each user using a site built on top of Django whose use is going
      to be reported": apparently the author didn't read why we aren't
      planning to do things like version checks from the admin (or they're
      just attempting to set an impossibly high bar to kill the proposal, in
      which case, they picked the wrong bar)

I'm quite sure your intentions are honest and only for the good of Django, but
in Debian's view it does not matter who runs the commands or what the reasons
behind them are. What matters is the act of phoning home itself, which (when
done without the user's explicit consent) is deemed unacceptable.

    To sum up, it appears that @jacobian correctly anticipated the criticism
that metrics may draw and designed a scheme that meets privacy requirements,
pending clarifications on the UX before sending metrics so it's considered
"opt-in".

I agree: an explicit opt-in is very likely acceptable. Personally, I think that
if you are very clear, at the point of opting in, about what you are about to do
and what you believe the benefits for your users are, many will do you the
favour of opting in. (Just look at popcon; they did it very well.)
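To make "explicit opt-in" concrete, a minimal sketch of a default-off consent
check in Python. The function name, its parameters and the prompt wording are
hypothetical, not Django's API and not what DEP 8 specifies; the point is only
that an empty answer, anything other than an explicit "yes", or a
non-interactive run (no answer at all) all mean "no".

```python
from typing import Callable, Optional


def ask_metrics_consent(prompt_fn: Callable[[str], str],
                        stored_choice: Optional[bool] = None) -> bool:
    """Hypothetical helper: True only if the user explicitly opted in."""
    if stored_choice is not None:
        # Respect an earlier, recorded decision instead of asking again.
        return stored_choice
    try:
        answer = prompt_fn(
            "Send usage metrics (OS, Python and Django versions) "
            "to the Django project? [y/N] "
        )
    except EOFError:
        # No interactive user (e.g. stdin closed): treat as "no".
        return False
    # Only an explicit "y"/"yes" counts; just pressing Enter means "no".
    return answer.strip().lower() in ("y", "yes")
```

In real use prompt_fn would be the built-in input; passing it in keeps the
default-off logic easy to check without a terminal.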

* Lars is a Debian project member (aka Debian Developer).

Note: I am also a Debian Developer, but I'm not involved in the Django
packaging, and while I think my opinion would be largely shared by people in
the Debian project, I am explicitly speaking only for myself here.

[dfsg] https://www.debian.org/social_contract#guidelines 
[dfsg-faq] https://people.debian.org/~bap/dfsg-faq.html
 
--
tobi
