
Re: A common test-data package for genome assemblers



On Wed, Jul 13, 2016 at 08:24:23AM -0700, Afif Elghraoui wrote:
> 
> I've had a couple packages that indicate the availability of data
> outside of the source distribution that can be used to try out the
> software (and make sure that it actually runs). I didn't think it was a
> good idea to bundle the data in with the actual package since it doesn't
> change between releases and would take up too much space on the archive
> if it was bundled with every upstream tarball.
> 
> For example, at <http://canu.readthedocs.io/en/stable/quick-start.html>,
> there are a few reduced datasets that can be used to run assemblies for
> PacBio and Nanopore sequencing data. Those files can also be used for
> tests of the sprai package, and possibly also for other long-read genome
> assemblers. There's also the option of packaging the Assemblathon data
> for this purpose, or using simulators to generate datasets for testing.
> 
> Does anyone have suggestions or thoughts on this?

Hi Afif,

Yes, this is an excellent idea.  Discussions about data packages
come up from time to time (see https://wiki.debian.org/DataPackages
for instance), and it would be exciting to see this happen
one way or another!

Charles

-- 
Charles Plessy
Debian Med packaging team,
http://www.debian.org/devel/debian-med
Tsurumi, Kanagawa, Japan

