
Re: Bits from /me: A humble draft policy on "deep learning v.s. freedom"



Hi,

On 2019-05-22 12:43, Sam Hartman wrote:
> So, I think it's problematic to apply old assumptions to new areas.  The
> reproducible builds world has gotten a lot further with bit-for-bit
> identical builds than I ever imagined they would.

I overhauled the reproducibility section and lowered the reproducibility
standard from "bit-by-bit" to "numerically reproducible", which is the
most practical choice for now. We can always raise the bar in the future
if the state of reproducibility improves.

> However, what's actually needed in the deep learning context is weaker
> than bit-for-bit identical.  What we need is a way to validate that two
> models are identical for some equality predicate that meets our security
> and safety (and freedom) concerns.  Parallel computation in the
> training, the sort of floating point issues you point to, and a lot of
> other things may make bit-for-bit identical models hard to come by.

Indeed. I call this "numerically reproducible":
https://salsa.debian.org/lumin/deeplearning-policy#neural-network-reproducibility

> Obviously we need to validate the correctness of whatever comparison
> function we use.  The checksums match is relatively easy to validate.
> Something that for example understood floating point numbers would have
> a greater potential for bugs than an implementation of say sha256.
>
> So, yeah, bit-for-bit identical is great if we can get it.  But
> validating these models is important enough that if we need to use a
> different equality predicate it's still worth doing.

For now, we just need to compare the numbers and the curves: train twice
without any modification, and check whether the resulting curves and
numbers are the same. Further measures, I think, depend on how this
field evolves.
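
As a rough illustration of what such a check could look like (a minimal
sketch only; the file names, tolerances, and use of NumPy are my own
assumptions here, not part of the draft policy):

    # Compare two training runs for "numerical" reproducibility.
    # Assumes each run dumped its per-epoch loss curve to a plain-text
    # file, one floating point value per line (hypothetical file names).
    import numpy as np

    run_a = np.loadtxt("run_a_loss.txt")   # loss curve from the 1st run
    run_b = np.loadtxt("run_b_loss.txt")   # loss curve from the 2nd run

    # Bit-for-bit equality would be np.array_equal(run_a, run_b);
    # "numerically reproducible" only asks for agreement within a
    # floating point tolerance.
    if run_a.shape == run_b.shape and \
            np.allclose(run_a, run_b, rtol=1e-5, atol=1e-8):
        print("curves match within tolerance")
    else:
        print("curves differ -- not numerically reproducible")

The same kind of tolerance-based comparison could also be applied to the
final model weights, not just the training curves.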

