Re: Preferred form of modification for binary data used in unit testing?

To: Christian Kastner <ckk@debian.org>, debian-devel@lists.debian.org
Subject: Re: Preferred form of modification for binary data used in unit testing?
From: Johannes Schauer <josch@debian.org>
Date: Thu, 16 Jul 2020 17:00:59 +0200
Message-id: <159491165979.1819.1700455705488568264@localhost>
Mail-followup-to: Christian Kastner <ckk@debian.org>, debian-devel@lists.debian.org
In-reply-to: <71190c85-d4f7-2828-d3c8-17835dbdb141@debian.org>
References: <a39eaca0-ab6d-f35c-b5b9-b5d01b280f92@pmhahn.de> <c050432d-96c5-212e-70f0-ef3722827f29@debian.org> <1594896828.9960.0@onenetbeyond.org> <71190c85-d4f7-2828-d3c8-17835dbdb141@debian.org>

Hi,

Quoting Christian Kastner (2020-07-16 14:08:34)
> On 2020-07-16 12:53, Pirate Praveen wrote:
> >> Generally speaking, I think it's a mistake to apply the question of
> >> "preferred form for modification" to unit test payloads. Unit tests are
> >> purely about functionality. The original source to a payload is an
> >> arbitrary choice (possibly even randomly generated), and could be
> >> replaced with any other appropriate arbitrary choice at no detriment to
> >> the software or the user.
> > I think this needs to be clearly documented in policy. I don't think
> > this interpretation is generally accepted. I have seen many cases where
> > tests are disabled for this reason.
> Perhaps I spoke too generally. For example, I can see, as one of
> probably many counter-examples, the case where the input is not
> completely arbitrary (eg: input is a captured stream).
> 
> But to take the other extreme, using completely arbitrary data, as an
> example: say my code implements a ROT13 function and I create a test for
> it using a blob of random data as well as the expected output.
> 
> That random data was generated somehow, eg: using Python's random
> module, and could therefore be regenerated given the correct program and
> seed. However, I did not include the code to generate that data.
> 
> Would we really reasonably expect anyone to act upon that random blob in
> any way?
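
To make the regeneration argument above concrete, here is a minimal sketch of
how such a ROT13 test payload could be reproduced from a seed. The names
make_payload and rot13, the seed, and the length are all hypothetical and not
taken from any real package. Note also that Python's random module documents
only limited reproducibility guarantees across versions, so this is an
illustration rather than a guaranteed bit-identical generator.

```python
import random

def make_payload(seed: int, length: int) -> bytes:
    """Regenerate a pseudo-random test payload from a seed (hypothetical helper)."""
    rng = random.Random(seed)  # private generator, independent of global state
    return bytes(rng.getrandbits(8) for _ in range(length))

_UPPER = b"ABCDEFGHIJKLMNOPQRSTUVWXYZ"
_LOWER = b"abcdefghijklmnopqrstuvwxyz"
# Translation table: rotate ASCII letters by 13, leave all other bytes alone.
_ROT13 = bytes.maketrans(
    _UPPER + _LOWER,
    _UPPER[13:] + _UPPER[:13] + _LOWER[13:] + _LOWER[:13],
)

def rot13(data: bytes) -> bytes:
    """Apply ROT13 to the ASCII letters in data."""
    return data.translate(_ROT13)

# Assumed seed and length, chosen only for illustration.
payload = make_payload(seed=20200716, length=64)
expected = rot13(payload)
```

With the seed and the generator code shipped next to the test, the blob itself
would no longer need to be stored.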

I have another data point from one of my packages (genext2fs), to which I
contributed upstream. Their unit tests execute the program with some input
and a given set of parameters and then check that the md5sum of the created
ext2 filesystem image matches the expected value. Without thinking, I added the
following to their test script:

H4sIAAAAAAAAA+3WTW6DMBAF4Fn3FD6B8fj3PKAqahQSSwSk9vY1uKssGiJliFretzECJAYeY1s3JM4UKYRlLG7H5ZhdTIHZGevK+ZTYkgrypRFN17EdlKIh5/G3++5d/6N004qbA47er8/fWVduV2aLD7D7/A85C88Ba/ufA/sQIhk25VdA/2+h5t+1gx4/pd7vfv+Hm/ytmfNH/8vr+ql7e3UR8DK6uUx9L/uMtev/3P8p+KX/oyHlZMuqntX/9T34Z9yk9Gco8//xkGWf8Uj+Mbpl/Y+JVJQtq9r5/K+bj3Z474+Xk9wG4JH86/rvyzxAirfYnOw+/+vXWTb+uv9PaV3+JfiSv/WOlJVPf/f5AwAAAAAAAAAAAMD/9A0cPbO/ACgAAA==

This is a base64-encoded, gzipped tarball with a few test files in it. I
generated it using GNU tar. I found it likely that a past or future GNU tar
version would produce a slightly different tarball, and I needed fixed input
that yields the same output on systems without GNU tar (like the BSDs or
macOS), on older systems, and on future systems. So I just dumped that binary
blob into the upstream software. In the meantime, that binary blob has also
ended up in the Debian package:

https://sources.debian.org/src/genext2fs/1.5.0-1/test.sh/#L89
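
For anyone who wants to inspect such a blob rather than trust it, a minimal
sketch follows. It only assumes what is stated above, namely that the string
is a base64-encoded gzipped tarball. The function name list_members is
hypothetical.

```python
import base64
import io
import tarfile

def list_members(blob_b64: str):
    """Decode a base64 string holding a gzipped tarball and list its members.

    Returns a list of (name, size) tuples without extracting anything to disk.
    """
    raw = base64.b64decode(blob_b64)
    with tarfile.open(fileobj=io.BytesIO(raw), mode="r:gz") as tar:
        return [(member.name, member.size) for member in tar.getmembers()]
```

Passing the base64 string quoted in this mail to list_members should show
which test files the tarball contains.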

The curious thing for me personally is that I didn't feel bad about this at
all. At no point, from writing the code to packaging and uploading the Debian
package containing the blob, did I think twice about whether this is DFSG
compliant or not. Only now, after having read this thread, do I start wondering
whether I have actually created an RC bug myself. Did I? I love the principles
of the DFSG, and it really surprises me that despite my love for these freedoms
I didn't think twice about including that binary blob instead of generating it
on the fly. Was my mind fooled by how short the blob is? A perl script
generating the tarball such that it's bit-by-bit identical across all platforms
would be longer than this blob.
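
As a rough idea of what generating the tarball on the fly might involve, here
is a minimal sketch in Python rather than perl. It is not the script used for
genext2fs, and it would not reproduce the exact blob above, which came from
GNU tar. The function name build_fixture and the file names in the usage
comment are hypothetical. The tar layer is made deterministic by fixing every
metadata field. The gzip layer pins the header timestamp, but the compressed
bytes can still differ between zlib implementations, so a test would be safer
comparing the uncompressed tar stream.

```python
import gzip
import io
import tarfile
from typing import Mapping

def build_fixture(files: Mapping[str, bytes]) -> bytes:
    """Return a gzipped tarball whose tar stream depends only on `files`."""
    tar_buf = io.BytesIO()
    with tarfile.open(fileobj=tar_buf, mode="w",
                      format=tarfile.USTAR_FORMAT) as tar:
        for name in sorted(files):        # fixed member order
            data = files[name]
            info = tarfile.TarInfo(name)
            info.size = len(data)
            info.mtime = 0                # no timestamps from the build host
            info.mode = 0o644
            info.uid = info.gid = 0       # no owner ids from the build host
            info.uname = info.gname = ""
            tar.addfile(info, io.BytesIO(data))
    gz_buf = io.BytesIO()
    # mtime=0 and an empty filename keep the gzip header constant.
    with gzip.GzipFile(filename="", fileobj=gz_buf, mode="wb", mtime=0) as gz:
        gz.write(tar_buf.getvalue())
    return gz_buf.getvalue()

# Hypothetical usage:
# blob = build_fixture({"file.txt": b"content\n", "dir/other": b""})
```

This is indeed longer than the blob, which illustrates the trade-off described
above.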

What do you guys think? Should I put work into writing a script which produces
the above binary blob as part of the test suite, so that my package is not RC
buggy? I would love to get some guidance.

Thanks!

cheers, josch

