Re: Suggestion needed on test failures due to double arithmetics

To: Andrey Rahmatullin <wrar@debian.org>
Cc: Debian Mentors List <debian-mentors@lists.debian.org>
Subject: Re: Suggestion needed on test failures due to double arithmetics
From: Giulio Paci <giuliopaci@gmail.com>
Date: Thu, 25 Nov 2021 13:13:20 +0100
Message-id: <CA+zRt5Hxdthw_U82fCYJptOAGUdG1qoX6-BpQF21QdEiizNRDw@mail.gmail.com>
In-reply-to: <YZ9BISQDVPNqCH+W@belkar.wrar.name>
References: <CA+zRt5GY6EvcU_KSKrcK3f4fvZKTxcQvC2bXv=3Bg92_yZpT2w@mail.gmail.com> <YZ9BISQDVPNqCH+W@belkar.wrar.name>

On Thu, Nov 25, 2021 at 8:54 AM Andrey Rahmatullin <wrar@debian.org> wrote:
>
> On Wed, Nov 24, 2021 at 06:38:07PM +0100, Giulio Paci wrote:
> > Dear mentors,
> >   while updating SCTK package I enabled the execution of the test suite
> > which was previously disabled. The tests are working fine on x86_64
> > architecture, but a couple of them are failing on i386.
> > After investigation [1] I found out that tests are failing because they
> > rely on the assumptions that, when a and b have the same double value:
> > 1) "a < b" is false;
> > 2) "a - b" is 0.0.
> What do they actually test, why do they use these assumptions?

SCTK is a toolkit for evaluating the performance of speech recognition
tools (and tools for related tasks).
These tools usually read audio streams and produce simple text files
containing the transcriptions plus time information (relative to the
stream) that synchronizes the transcription with the stream. These files
are very similar to video subtitle files.
SCTK compares two such text files (usually one created manually and
the other created by an automatic tool) and scores how different they
are.
The tests check that SCTK produces the same score reports when
provided with the same input files.

The double values hold timing information. The specific format,
known as CTM, stores times as decimal seconds (e.g. "30.66"
seconds) from the beginning of the stream.
The failing tool reads these values into double variables and, to
simplify, advances for as long as the timing in one file is less than
the timing in the other file. Once it is equal or greater, the tool
checks the difference between the two.
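
To make the two assumptions concrete, here is a minimal sketch of a
tolerance-based variant of that comparison. It is not SCTK code: the
names time_before, time_diff and TIME_EPSILON are hypothetical, and the
tolerance is an assumption chosen to sit well below the two-decimal
resolution of the "30.66" example.

```c
#include <assert.h>

/* Hypothetical tolerance (assumption): half a millisecond, far below
 * the 0.01 s resolution of a timing such as "30.66". */
#define TIME_EPSILON 0.0005

/* Returns 1 only when a precedes b by more than the tolerance, so two
 * timings that differ by rounding noise are never reported as ordered. */
static int time_before(double a, double b)
{
    return (b - a) > TIME_EPSILON;
}

/* Returns a - b, snapped to exactly 0.0 when the two timings are
 * within the tolerance of each other. */
static double time_diff(double a, double b)
{
    assert(a == a && b == b); /* NaN timings would be a caller bug */
    double d = a - b;
    double magnitude = d < 0.0 ? -d : d;
    return magnitude <= TIME_EPSILON ? 0.0 : d;
}
```

With helpers like these, time_before(x, y) is 0 and time_diff(x, y) is
exactly 0.0 whenever x and y came from the same decimal text, even if
one of them went through an extra addition or rounding step.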

In this kind of application you usually do not go beyond what can be
stored uncompressed on a filesystem as PCM. So, even assuming 1-byte
audio samples, int64 should be a reasonable type for storing timings
(in samples, rather than seconds). But I understand that doing so
would complicate the logic of the tool, especially since it is very
unlikely that floating-point approximation would be an issue. To be
honest, I did not expect the corner case above to fail, since it
compares a value against another value that should be exactly the same.

I have uploaded simplified code that showcases the issue and some of
the instabilities [1]. The code seems to behave as if the last value
were different from the other three, supposedly equal, values.

[1] https://pastebin.com/embed_js/T3g560UV

Bests,
Giulio
