
Re: A common test-data package for genome assemblers



This is a great idea!

On Wed, Jul 13, 2016 at 11:24 AM, Afif Elghraoui <afif@debian.org> wrote:
Hi, all,

I've had a couple of packages whose documentation points to data outside
of the source distribution that can be used to try out the software (and
make sure that it actually runs). I didn't think it was a good idea to
bundle the data with the package itself, since the data doesn't change
between releases and would take up too much space in the archive if it
were bundled with every upstream tarball.

For example, at <http://canu.readthedocs.io/en/stable/quick-start.html>,
there are a few reduced datasets that can be used to run assemblies for
PacBio and Nanopore sequencing data. Those files can also be used for
tests of the sprai package, and possibly also for other long-read genome
assemblers. There's also the option of packaging the Assemblathon data
for this purpose, or using simulators to generate datasets for testing.

Does anyone have suggestions or thoughts on this?

regards
Afif

--
Afif Elghraoui | عفيف الغراوي
http://afif.ghraoui.name




--
Michael R. Crusoe
Community Engineer & Co-founder
Common Workflow Language project
https://impactstory.org/u/0000-0002-2961-9670
michael.crusoe@gmail.com
+32 (0) 2 808 25 58
+1 480 627 9108
