
Re: Preferred form of modification for binary data used in unit testing?



On 2020-07-15 09:45, Philipp Hahn wrote:
> if a *previous* version of a software generated a *buggy* binary
> database, that bug got fixed in a *newer* version and also some
> *recovery* mechanism was added to allow reading that broken format
> *once*, but there is no code to write the *broken* file again. For
> *unit testing* the upstream developers added an *example* of such a
> broken database to their test data.
> What's the preferred form of modification for that data set?
> 
> * Should I include a copy of the *broken code* to generate that data?
> * Declare that there is no preferred form for modification, as an
> "open-save" cycle with the current code will not re-create the
> bit-identical file again.
> * Remove the test data because it is not DFSG conformant and hope the
> Debian build will never break the recovery code.
> * Include instructions on how to re-build the broken version and give
> instructions on how to maybe rebuild a similar broken file.

Personally, I would do nothing at all. At most, I would choose the last
of the above options (include instructions).

This is about the payload to a particular decoding unit test. It's a
common pattern to generate such payloads without storing the original
source or even intermediate steps -- which, unless I'm mistaken, would
imply that the final result has become the preferred form for
modification. The expectation is simply for a particular chunk of data
to produce a particular output.
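
To illustrate the pattern, here is a minimal sketch in Python. The
record format, the byte-order bug and the names `decode` and
`BROKEN_PAYLOAD` are all made up for illustration; none of them are
taken from the package in question. The payload is stored verbatim
next to the test, and the test only checks that these bytes decode to
the expected value.

```python
import struct

# Hypothetical payload, captured once from an old, buggy writer and kept
# verbatim with the tests. The made-up bug: the 4-byte length prefix was
# written big-endian instead of little-endian.
BROKEN_PAYLOAD = bytes.fromhex("00000003616263")


def decode(blob: bytes) -> bytes:
    """Decode one length-prefixed record, recovering from the legacy bug."""
    (length,) = struct.unpack_from("<I", blob, 0)
    if length != len(blob) - 4:
        # Recovery path: reinterpret the length prefix as big-endian.
        (length,) = struct.unpack_from(">I", blob, 0)
    if length != len(blob) - 4:
        raise ValueError("corrupt record")
    return blob[4:4 + length]


def test_recovers_legacy_record():
    # The test depends only on the stored bytes, not on how they were made.
    assert decode(BROKEN_PAYLOAD) == b"abc"
```

Nothing in such a test refers to the code that originally wrote the
payload, so the stored bytes are the only thing anyone would edit.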

I think it is reasonable to assume that upstream generated the broken
file with the old code, implemented the unit test, and discarded the
broken code. So given the current (shipped) version of the software,
even upstream couldn't recreate the broken file.

Generally speaking, I think it's a mistake to apply the question of
"preferred form for modification" to unit test payloads. Unit tests are
purely about functionality. The original source to a payload is an
arbitrary choice (possibly even randomly generated), and could be
replaced with any other appropriate arbitrary choice at no detriment to
the software or the user.

