https://salsa.debian.org/med-team/catfishq
is ready for review+sponsoring.
Many thanks!
Steffen
On 23.05.21 at 14:26, Steffen Möller wrote:
On 23.05.21 at 00:02, Nilesh Patra wrote:
On 5/23/21 2:54 AM, Andreas Tille wrote:
On Sat, May 22, 2021 at 09:10:46AM +0200, Andreas Tille wrote:
On Fri, May 21, 2021 at 09:26:48PM +0200, Steffen Möller wrote:

If someone needs a stimulus to package something - cuteSV (https://github.com/tjiangHIT/cuteSV), please.

I gave it a kickstart while sitting on the train (which will be offline soon). Everybody can feel free to add their own ID to Uploaders and finalise. There is no build-time test running now and no autopkgtest. Data to test / benchmark are included - so this should be feasible.

I just packaged the precondition python3-cigar and uploaded it to NEW.

I wrote a sample autopkgtest for cigar (basically used the same thing as in the README) and made a few minor changes. I have no idea about autopkgtests for cutesv - I lack the prerequisites here and probably only Steffen can help.

PS: Please check and upload vbz-compression whenever you have time (after two days, as you wrote, would be fine anyway). I'll be inactive/away for a couple of days (wish to take a break :-))

Thank you both, you are amazing! CuteSV is part of https://github.com/nanoporetech/pipeline-structural-variation, which I plan to run when the first Nanopore reads surface in my inbox next week. To run this you compare against a reference genome, which we do not have in Debian. So, yes, we should think of some tests, but we should also find a way to perform such tests for other packages.

This leads to a follow-up question - we could have a "test package" that offers a fraction of the human genome, like the Y chromosome and a second one - chromosome 22, maybe. That would not be too big and we could test with it. It would also be a bit meaningless, though. And for testing we do not need anything to be human (or real) in the first place. We could generate our own mini-genome or instead (which I would prefer) go for something small that is real, like yeast (for eukaryotes) and E. coli (for bacteria); we ignore archaea. And then there is https://www.ncbi.nlm.nih.gov/nuccore/CP014940 , i.e. the data from C. Venter's https://www.jcvi.org/research/first-minimal-synthetic-bacterial-cell, which may be interesting to distribute with an Open Source distribution.

While something novel is always being found even for genomes whose genomic DNA has long been known, we do not do much harm by distributing such genomes. Professional researchers will update them anyway. The same holds for the human genome, but it is a bit larger and we should probably gain experience with the smaller genomes first. I'll let this sink in for a while and then likely extend getData to deal with these genomes and auto-generate native Debian packages with it.

Ok - back to some real work and I'll have a closer look at that pipeline.

I just went through their Snakemake file. To get this running, we need:
* catfishq https://github.com/philres/catfishq
* lra (long read aligner) https://github.com/ChaissonLab/LRA
* truvari https://github.com/spiralgenetics/truvari/
* the vcflib scripts, added either to libvcflib1 or to a new package vcflib-scripts
Catfishq looks straightforward, I'll just go and address that. LRA is a meson build with "subprojects" that wrap other bits. Truvari drags in a few Python packages, some of which we do not have yet. I have added that info to the Nanopore tab on https://docs.google.com/spreadsheets/d/1tApLhVqxRZ2VOuMH_aPUgFENQJfbLlB_PFH_Ah_q7hM/edit#gid=1806578173
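
For illustration only: the "generate our own mini-genome" option mentioned above could look roughly like the sketch below. It is not what getData does and not a proposal for the actual test package. The helper names (make_mini_genome, to_fasta), the sequence names (chrA, chrB) and the lengths are all hypothetical, and uniformly random bases are an assumption that ignores any real genome structure.

```python
import random


def make_mini_genome(lengths, seed=0):
    """Return {name: sequence} of random A/C/G/T strings (hypothetical helper)."""
    rng = random.Random(seed)  # fixed seed keeps the generated test data reproducible
    return {name: "".join(rng.choices("ACGT", k=n)) for name, n in lengths.items()}


def to_fasta(genome, width=60):
    """Format the sequences as FASTA text, wrapping sequence lines at `width`."""
    lines = []
    for name, seq in genome.items():
        lines.append(f">{name}")
        lines.extend(seq[i:i + width] for i in range(0, len(seq), width))
    return "\n".join(lines) + "\n"


# Sequence names and lengths are made up for this example.
genome = make_mini_genome({"chrA": 500, "chrB": 300})
fasta_text = to_fasta(genome)
```

Such a file would exercise file handling and indexing, but, as noted above, results on it would be fairly meaningless biologically, which is the argument for shipping a small real genome instead.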
Best,
Steffen