Bug#721801: [devteam-bioc] Precomputed results in GenomicRanges [Was: r-bioc-genomicranges_1.12.4-1_amd64.changes REJECTED]



On 10/16/2013 01:37 AM, Andreas Tille wrote:
Hi Martin,

On Sat, Oct 12, 2013 at 09:59:35AM -0700, Martin Morgan wrote:
On 10/12/2013 02:59 AM, Maintainer wrote:
Hi,

the Debian Med team tries to package several parts of BioConductor.
When trying to upload GenomicRanges our ftpmaster criticised that the
source contains some precomputed results inside the documentation which
is in conflict with our policy which requires the source for all binary
data.  There could be different solutions for this:

   1. If you consider the files
          GenomicRanges/inst/doc/precomputed_results/*.rda
      not very important for the user documentation, it might be
      sufficient to download the files from somewhere else.

   2. Provide a recipe to reproduce the precomputed results that we
      could use in the package building process to recreate the data.

Maybe there are other solutions, but these are the ones that come to
mind at the moment.

Any hint what we should do?

Andreas -- you've brought this topic up before; you've provided guidance at

   https://wiki.debian.org/GNU_R

Sure, I know, and I really hoped that this would be convincing
enough for our ftpmasters, but unfortunately it was not (see the link
to the ftpmaster decision on this page).  We kept on discussing the
issue with ftpmaster and they just came up with their
stronger-than-hoped requirement.

Basically, these are serialized R objects, so their content is
transparent to users in the same way that a binary image is visible
(and useful) to a user.

I'll try uploading with the other explanation given by Hervé Pagès and
hope this will pass.

Sorry for bothering you about this, and thanks for your patience.

For precomputed_results above, it looks like these could be generated by a script, but the specific results depend on a web service query, and the web service changes from time to time. So the script would become out of date, creating data that are no longer consistent with the illustrative purposes of the vignette. Also, the time cost of generating the data is not compatible with our (nightly) build process; we will not generate this data on the fly, and it would be a mistake for your release process to generate data different from the data used in our release. These (expense of computation, consistency of external data sources) are typical reasons for including precomputed results.

When the 'affy' maintainer receives one of these emails, and the email mentions three data sets, and the three data sets are documented in the man page as data sets from an experiment (e.g., ?SpikeIn), what is one supposed to do? Or rather, why is he being contacted in the first place?

From a non-technical perspective: (1) It's presumptuous to suggest that the data files are not important for user documentation; if they were not important, why would the author have gone to the trouble of including them in the first place? (2) If you are going to contact our maintainers, then please let me know about the extent of the contact and the intention; I would rather have a discussion on our developer mailing list than have each maintainer wondering how to react.

Martin


      Andreas.



--
Computational Biology / Fred Hutchinson Cancer Research Center
1100 Fairview Ave. N.
PO Box 19024 Seattle, WA 98109

Location: Arnold Building M1 B861
Phone: (206) 667-2793

