
Re: Reproducibility



On Fri, Apr 30, 2010 at 07:07:21AM -0400, Michael Hanke wrote:
> > This nice abstract inspired me to think about reproducibility of
> > program runs. If one runs e.g. Debian unstable the OS code which can
> > potentially affect the results of calculations can change almost
> > daily. Reproducing results later can be close to impossible unless
> > versions of all the related libraries etc. are written down somewhere.
> 
> This is not just a potential problem -- we have seen it happen already.
> Part of the problem is that in Debian we prefer dynamic linking to
> up-to-date shared libs from separate packages -- instead of statically
> linking to ancient versions with known behavior (for good reasons of
> course).

I can confirm that this is actually the reason why plain Debian (and
specifically the Debian Med packages) is not used at the Sanger
Institute, even though three DDs work there.  The scientists require
very specific versions of the packages (not necessarily those that are
part of a stable Debian release), and some labs use different versions
than other labs.
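
Writing the versions down next to each result could look roughly like
the sketch below.  It is only an illustration, not something in use at
Sanger: the function names snapshot_versions and diff_packages are
made up for this example, and it only covers the Python packages
visible to the interpreter.  On a Debian system the same idea would
apply to the package list printed by `dpkg-query -W`.

```python
import json
import platform
import sys
from importlib import metadata


def snapshot_versions():
    """Describe the interpreter, the platform, and the installed Python packages."""
    packages = {}
    for dist in metadata.distributions():
        name = dist.metadata.get("Name")
        if name:  # skip distributions with broken metadata
            packages[name] = dist.version
    return {
        "python": sys.version.split()[0],
        "platform": platform.platform(),
        "packages": dict(sorted(packages.items())),
    }


def diff_packages(old, new):
    """Map each package whose version differs to (old, new); None means absent."""
    changed = {}
    for name in sorted(set(old) | set(new)):
        if old.get(name) != new.get(name):
            changed[name] = (old.get(name), new.get(name))
    return changed


if __name__ == "__main__":
    # Store this output alongside the results of a run.
    print(json.dumps(snapshot_versions(), indent=2))
```

Comparing the stored snapshot with a fresh one via diff_packages shows
which packages changed between the original run and the attempt to
reproduce it.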
 
> IMHO better than relying on a snapshot of OS and a particular software
> state to get constant results, projects should have comprehensive
> regression tests that ensure proper behavior.

In theory this is probably right, but in practice it needs extra manpower
which I doubt will be spent on problems like this.

> The problem is, however,
> that we cannot run them during package build time, since they tend to
> require large datasets and run for many hours. Therefore users need to
> do that, but nobody does it.

Yes, that's the problem.

Kind regards

        Andreas. 

-- 
http://fam-tille.de

