
Bits from /me: A humble draft policy on "deep learning vs. freedom"



Hi people,

A year ago I raised a topic on -devel, pointing out the
"deep learning vs. software freedom" issue. We drew no
conclusion at that time, and Linux distros that care about
software freedom may still have doubts about some fundamental
questions, e.g. "is this piece of deep learning software
really free?"

People have been doing lazy evaluation on this problem.
Now that a related package has entered my packaging radar,
I think I'd better write a draft and shed some light on
this area. Here is the first humble attempt:

  https://salsa.debian.org/lumin/deeplearning-policy
  (issue tracker is enabled)

This draft is conservative and perhaps overkill, and it
currently focuses only on software freedom. That's exactly
where we should start, right?

Specifically, I define three types of pre-trained machine
learning / deep learning models:

  Free Model, ToxicCandy Model, Non-free Model

Developers who'd like to touch DL software should be
cautious about the "ToxicCandy" models. Details can be
found in my draft.

Apart from that, I point out in the draft that software
associated with any critical task should be considered
carefully, as deep neural networks introduce a new kind
of vulnerability: a network's response can be disrupted
or even controlled by carefully designed perturbations
added to the network's input (adversarial examples).
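
To make that threat concrete, here is a minimal sketch of
the well-known fast gradient sign method (FGSM). It assumes
PyTorch and uses a toy linear model in place of a real
network, so all names below are purely illustrative:

  import torch
  import torch.nn as nn

  # A toy linear "classifier" stands in for a real network (hypothetical).
  model = nn.Linear(4, 3)

  def fgsm_perturb(model, x, label, epsilon=0.25):
      """Return a copy of `x` perturbed to increase the classification loss."""
      x = x.clone().detach().requires_grad_(True)
      loss = nn.functional.cross_entropy(model(x), label)
      loss.backward()
      # Step in the direction (gradient sign) that increases the loss most.
      return (x + epsilon * x.grad.sign()).detach()

  x = torch.randn(1, 4)           # a benign input
  label = model(x).argmax(dim=1)  # the model's current prediction
  x_adv = fgsm_perturb(model, x, label)
  print(model(x).argmax(dim=1), model(x_adv).argmax(dim=1))  # may differ now

Even though epsilon keeps the perturbation small, the step
is aimed exactly where the model is most sensitive, which is
why such inputs can flip a classifier's decision.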

Hence, I suggest that packaging any intelligent software
must be discussed on -devel if that piece of software is
associated with any kind of critical task, including but
not limited to:

  * authentication (e.g. login via face verification or
    identification)
  * program execution (e.g. intelligent voice assistants:
    "Hey, Siri! sudo rm -rf / --no-preserve-root")
  * physical object manipulation (e.g. mechanical
    arms in non-educational settings, or cars,
    i.e. autopilot), etc.

See my draft for details.

The package that entered my packaging radar is nltk_data:
https://github.com/nltk/nltk_data
The two most widely used Python-based computational
linguistics toolkits, NLTK and spaCy, require this
data (datasets + models) to enable most of their
functionality.
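
For illustration, here is a rough sketch of how NLTK's
functionality depends on those downloaded data packages
(assuming the standard nltk.download() / word_tokenize() /
pos_tag() API; exact package names vary between NLTK versions):

  import nltk

  # Both calls fetch pre-trained models/data from the nltk_data
  # repository; without them the functions below raise LookupError.
  nltk.download('punkt')                       # tokenizer model
  nltk.download('averaged_perceptron_tagger')  # POS tagger model

  tokens = nltk.word_tokenize("Debian cares about software freedom.")
  print(nltk.pos_tag(tokens))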

Best,
Mo.

