
Re: Is tabular data in binary format acceptable for Debian ?



On Wed, 20 Jan 2010, Jean-Christophe Dubacq wrote:
> Charles Plessy wrote:
> > Is tabular data in a binary format that can be read, written,
> > modified and exported using free software acceptable for Debian,
> > or shall we contact the upstream author to check if he used an
> > intermediate format (be it text, or binary like .odt or .xls) and
> > require the addition of this file to the source, or shall we
> > provide a text export?

It depends on the precise nature of the data. It is quite easy to
produce Rdata files which are not the preferred form for modification.
For example, the following temp.Rdata would not be the preferred form
for modification:

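# the raw data is read from a text file that is not distributed
temp <- read.table(file="data_file_not_distributed.txt")
model.lm <- lm(foo~bar,temp)
# only the fitted model is saved, not the temp data frame
save(model.lm,file="temp.Rdata")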

but this might be:

save(temp,model.lm,file="temp.Rdata")

especially when coupled with the above code and code to regenerate
data_file_not_distributed.txt.
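
A minimal sketch of the second case, to show what the recipient gains
when the data frame travels with the model. The sample values and the
temporary file names are made up for illustration; only the foo and
bar column names come from the lm() call above.

```r
# Hypothetical stand-in for the raw text data (values are invented).
raw_file <- tempfile(fileext = ".txt")
write.table(data.frame(foo = c(1.1, 1.9, 3.2, 3.9, 5.1), bar = 1:5),
            file = raw_file, row.names = FALSE)

# Fit the model and save it together with the data it was fitted on.
temp <- read.table(file = raw_file, header = TRUE)
model.lm <- lm(foo ~ bar, data = temp)
rdata_file <- tempfile(fileext = ".Rdata")
save(temp, model.lm, file = rdata_file)

# A recipient loads the file into a fresh environment and can refit
# (or change) the model, because the data frame is in the file too.
env <- new.env()
load(rdata_file, envir = env)
refit <- lm(foo ~ bar, data = env$temp)
```

With only model.lm in the file, the last step would have no temp data
frame to start from.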

On a more practical note, I'm really surprised that upstream is
distributing the Rdata directly in the source, as it's really a pain
to track modifications to them in any kind of VCS. If it were an R
module that I was packaging, I would strongly suggest that upstream
distribute the code needed to generate the Rdata directly from text
files which are more easily tracked (and *patched*).

> I had a similar question about a program which comes bundled
> with a trained neural network. There are files with raw weights. It
> is possible to retrain when building the program, but it would take a
> very long time, and the resulting network wouldn't even be the same.
> What is the "source" in this case?

The training set used to generate the weights for the neural network
is the source.

You don't necessarily need to regenerate the weights, but it should be
possible for an end user to do so. [With obvious caveats about things
which involve RNGs and heuristic solutions, where even the original
developer isn't able to regenerate the exact same weights.] In fact,
the whole question of rebuilding things from source is just a red
herring.
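
To illustrate the RNG caveat, here is a toy sketch, not anything from
the program under discussion: a single weight "trained" by stochastic
gradient descent. The train() function and the data are hypothetical.
Two runs agree exactly only when the seed is the same; otherwise they
land near each other but not on the same value.

```r
# Toy "training" loop: fit one weight w so that w * x approximates y.
train <- function(x, y, steps = 200, rate = 0.01) {
  w <- rnorm(1)                      # random initial weight
  for (i in seq_len(steps)) {
    k <- sample(length(x), 1)        # pick a random training example
    w <- w - rate * 2 * (w * x[k] - y[k]) * x[k]
  }
  w
}

# Hypothetical training set.
x <- c(1, 2, 3, 4)
y <- c(2.1, 3.9, 6.2, 7.8)

set.seed(1); w1 <- train(x, y)
set.seed(1); w2 <- train(x, y)   # same seed as w1
set.seed(2); w3 <- train(x, y)   # different seed
```

Here identical(w1, w2) holds, while w3 differs slightly. With the
training set in hand anyone can produce an equally good set of
weights, even if not the same ones.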

All of these questions are pretty easy to answer if you think about
whether upstream is in a privileged position with regard to
modification by dint of information they have access to which could be
distributed digitally. If upstream is withholding information that is
in a digital form that gives them an advantage in modification,
they're often not providing the source.


Don Armstrong

-- 
Of course, there are cases where only a rare individual will have the
vision to perceive a system which governs many people's lives; a
system which had never before even been recognized as a system; then
such people often devote their lives to convincing other people that
the system really is there and that it ought to be exited from.
 -- Douglas R. Hofstadter _Gödel, Escher, Bach: an Eternal Golden Braid_

http://www.donarmstrong.com              http://rzlab.ucr.edu

