Re: We need a global decision about R data in binary format, and stick to it.



Paul Tagliamonte writes ("Re: We need a global decision about R data in binary format, and stick to it."):
> On Mon, Aug 05, 2013 at 09:57:35AM +0900, Charles Plessy wrote:
> > it is the common practice in upstream R packages to store data in binary
> > objects.  Those objects can be modified with R, and exported into various
> > formats.  The Debian archive is full of them.
> 
> This is not unlike a Python pickle.
> 
> However, even more to the point, with *this* package, that was a
> *generated data table*. These *generated* values are clearly not preferred
> form of modification. I asked the uploader to point to where they came
> from. I don't think this is unfair.

We need to separate these two issues.

One is the file format question.  It doesn't seem to me that there is
anything wrong with a binary format as the preferred form for
modification, in principle.  For a file which is typically edited
using R, including by upstream when they want to edit it, there
is no problem.

The other is the assertion that this particular case involves a
generated data table.  If this is the case then the source package
needs to contain the source code which generates the table - and,
really, it should regenerate the table during the build.  (The source
might be in the form of another R binary object.)

(Of course there is a third issue: it is probably not the best
engineering decision to use a binary save format rather than text
source code.  But that's not something the Debian maintainer
necessarily gets to choose and it's not a reason for an ftpmaster
reject.)

> > The question asked by Paul is a recurrent question that comes each
> > time the FTP trainees rotate (basically once per release cycle,
> > because during the Freeze the FTP trainees find other exciting
> > tasks to do, and then do not seem to have much time to process NEW
> > anymore).
> 
> This must mean many people who care deeply about this topic see this as an
> issue.

I don't think this is a helpful response to someone who is raising
what they see as a systematic problem.

Paul, would it be possible to update the ftpmaster assistant reference
materials to discuss R's binary files?

Ian.

