Re: We need a global decision about R data in binary format, and stick to it.



On 2013-08-05 14:13:15 +0100 (+0100), Ian Jackson wrote:
[...]
> The other is the assertion that this particular case involves a
> generated data table. If this is the case then the source package
> needs to contain the source code which generates the table - and,
> really, it should regenerate the table during the build.
[...]

No argument on the first, but the second sets a bad precedent if
interpreted strictly. For example, I have a program which relies on a
fairly large set of correlative data requiring hours of expensive
computation to generate. In the source package I include the
original data on which the resulting tables are based and provide a
means to regenerate them at package build time, but I disable that
by default so that it doesn't chew up build resources
unnecessarily.
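To make the "regenerate only on request" arrangement concrete, here is a minimal sketch of such a build helper. It assumes the opt-in is signalled through DEB_BUILD_OPTIONS, which Debian Policy defines as a whitespace-separated list of options. The "regen-data" tag, the function names and the directory layout are all hypothetical, and compute_tables() is only a placeholder for the real hours-long computation. This is not the author's actual tooling.

```python
import os
import shutil
from pathlib import Path


def regeneration_requested(env=None) -> bool:
    """True if the hypothetical 'regen-data' tag appears in DEB_BUILD_OPTIONS.

    DEB_BUILD_OPTIONS is a whitespace-separated list, so split it and look
    for an exact tag rather than a substring match.
    """
    env = os.environ if env is None else env
    return "regen-data" in env.get("DEB_BUILD_OPTIONS", "").split()


def compute_tables(raw_dir: Path, out_dir: Path) -> None:
    """Hypothetical placeholder for the expensive correlation computation."""
    raise NotImplementedError("expensive regeneration from raw_dir goes here")


def prepare_tables(raw_dir, shipped_dir, out_dir, env=None) -> str:
    """Regenerate the tables only when asked; otherwise reuse the shipped copies."""
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    if regeneration_requested(env):
        # Opt-in path: rebuild everything from the original data.
        compute_tables(Path(raw_dir), out_dir)
        return "regenerated"
    # Default path: copy the pregenerated tables from the source package.
    for table in sorted(Path(shipped_dir).iterdir()):
        if table.is_file():
            shutil.copy2(table, out_dir / table.name)
    return "copied"
```

A builder who wants the expensive path would opt in with something like `DEB_BUILD_OPTIONS=regen-data dpkg-buildpackage -us -uc`; every other build just copies the tables shipped in the source package.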

Since I need to generate the correlation data for other (non-Debian)
users of the software anyway, I ship the generated files in the
source package too and just include them in the binary package
(along with instructions and tooling for the end user to be able to
build datasets they can use to override the default ones provided).
While my example is Python rather than R, I expect it's
representative of the situation for many scientific tools. Perhaps
some guidance on when this tactic is or is not appropriate would be
beneficial.
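The "user datasets override the packaged defaults" arrangement described above usually comes down to a search path. A minimal sketch follows, assuming a per-user directory under the XDG data home (which defaults to ~/.local/share) that takes priority over a system-wide copy under /usr/share installed by the binary package. The application name "exampletool" and both function names are hypothetical.

```python
import os
from pathlib import Path
from typing import Iterable, List, Optional


def resolve_dataset(name: str, search_dirs: Iterable[Path]) -> Optional[Path]:
    """Return the first existing file called `name`, checking directories
    in priority order (user overrides first, packaged defaults last)."""
    for directory in search_dirs:
        candidate = Path(directory) / name
        if candidate.is_file():
            return candidate
    return None


def default_search_dirs(app: str = "exampletool") -> List[Path]:
    """Hypothetical layout: per-user override directory, then the
    system-wide data shipped in the binary package."""
    # XDG_DATA_HOME falls back to ~/.local/share when unset or empty.
    data_home = os.environ.get("XDG_DATA_HOME") or str(Path.home() / ".local" / "share")
    return [Path(data_home) / app, Path("/usr/share") / app]
```

Because the packaged copy is simply the last entry in the list, a user who drops a regenerated dataset into the per-user directory overrides it without touching anything the package manager owns.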
-- 
{ PGP( 48F9961143495829 ); FINGER( fungi@cthulhu.yuggoth.org );
WWW( http://fungi.yuggoth.org/ ); IRC( fungi@irc.yuggoth.org#ccl );
WHOIS( STANL3-ARIN ); MUD( kinrui@katarsis.mudpy.org:6669 ); }