Re: CuteSV (Was: PyEnsembl - how does that help us?)

To: debian-med@lists.debian.org
Subject: Re: CuteSV (Was: PyEnsembl - how does that help us?)
From: Nilesh Patra <nilesh@nileshpatra.info>
Date: Sun, 23 May 2021 19:45:38 +0530
Message-id: <2ff5658c-96b4-f29c-6d7b-2f1257755e25@nileshpatra.info>
In-reply-to: <736ba102-768c-11e0-e459-88636722b418@gmx.de>
References: <58d83e85-81e1-f6a2-da0a-b7eaff743d9e@gmx.de> <CAJN1928EZRKmW-EOfaE8ejgLf-Cv11ux4mx_asZgYqkPX7pZMA@mail.gmail.com> <4b46c10b-d952-f37f-b113-25321d6efc06@gmx.de> <25d02178-979e-da62-2a74-9d6e40feaa5d@nileshpatra.info> <4382f7b3-9ce1-924b-0d7e-901a5feeb9c0@gmx.de> <d523d0d0-a272-5c09-e71b-0dd1bed243ea@nileshpatra.info> <db540cba-c2a8-8d45-fd48-9e321ded5945@gmx.de> <af1d274a-a0db-6e58-8ca8-c20b55d3e5cc@nileshpatra.info> <243e5b57-1c3e-8432-46bf-1a34381b3370@gmx.de> <20210522071046.GC8962@an3as.eu> <20210522212431.GF8962@an3as.eu> <fc9a8fa8-dc98-f422-210d-6dc2e1a93d75@nileshpatra.info> <736ba102-768c-11e0-e459-88636722b418@gmx.de>


On 5/23/21 5:56 PM, Steffen Möller wrote:
> 
> On 23.05.21 at 00:02, Nilesh Patra wrote:
>>
>> On 5/23/21 2:54 AM, Andreas Tille wrote:
>>> On Sat, May 22, 2021 at 09:10:46AM +0200, Andreas Tille wrote:
>>>> On Fri, May 21, 2021 at 09:26:48PM +0200, Steffen Möller wrote:
>>>>> If someone needs a stimulus to package something - cuteSV
>>>>> (https://github.com/tjiangHIT/cuteSV), please.
>>>> I gave it a kickstart while sitting in the train (which will be
>>>> offline soon).  Everybody can feel free to add own ID to Uploaders
>>>> and finalise.  There is no build time test running now and no
>>>> autopkgtest.  Data to test / benchmark are included - so this
>>>> should be feasible.
>>> I just packaged the precondition python3-cigar and uploaded to new.
>> I wrote a sample autopkgtest for cigar (basically used the same thingy in the readme)
>> and did a few minor changes.
>>
>> I have no idea about autopkgtests for cutesv - I lack the prerequisites here and probably only Steffen can help here.
>>
>> PS: Please check and upload vbz-compression whenever you have time (after two days as you wrote would be fine anyway)
>> I'll be inactive/be away for a couple of days (wish to take a break :-))
> 
> Thank you both, you are amazing!
> 
> CuteSV is part of the
> https://github.com/nanoporetech/pipeline-structural-variation that I
> plan to run when first Nanopore reads surface in my inbox next week. You
> compare against a reference genome to run this, which we do not have in
> Debian, so, yes, we should think of some tests, but we should also find
> a way to perform such tests for other packages.
> 
> This kind of leads to a follow-up question - we could have a "test
> package" that offers a fraction of the human genome, like the Y
> chromosome and a second - chromosome 22 maybe. That would not be too big
> and we can test with it. It would also be a bit meaningless, though. And
> for testing we do not need anything to be human (or real) in the first
> place. We could generate our own mini-genome or instead (which I would
> prefer) go for something small that is real, like yeast (for
> eukaryotes), E. coli (for bacteria), we ignore archaea, and then .. there
> is https://www.ncbi.nlm.nih.gov/nuccore/CP014940 , i.e. the data from C.
> Venter's
> https://www.jcvi.org/research/first-minimal-synthetic-bacterial-cell,
> which may be interesting to be distributed with an Open Source
> distribution.


Sounds good, but please take these factors into consideration:

* The debci machines typically have ~40 GiB of disk space. If the data you refer to here
is even a few GiB, all packages using it for tests will run into RC bugs

* If the size of the data is _not_ in RC-bug territory, but still *huge* and used in a large number of tests, it'll be a pain for us to maintain
it ourselves, and also not ideal for end users who might want to download the test data

I had listed more reasons in a previous mail when a discussion regarding "centralised test data" was going on,
please take a look here too:

https://lists.debian.org/debian-med/2020/09/msg00365.html
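
To make the size concern concrete, here is a minimal sketch of a check that could be run on a candidate test-data directory before it is shipped. The function names and the 100 MiB default budget are assumptions for illustration only; they are not a debci or Debian policy value.

```python
import os

MIB = 1024 * 1024


def dir_size_bytes(path):
    """Sum the sizes of all regular files below path (symlinks are skipped)."""
    total = 0
    for root, _dirs, files in os.walk(path):
        for name in files:
            full = os.path.join(root, name)
            if not os.path.islink(full):
                total += os.path.getsize(full)
    return total


def within_budget(path, budget_bytes=100 * MIB):
    """Return True if the data below path fits the (hypothetical) size budget."""
    return dir_size_bytes(path) <= budget_bytes
```

The budget would have to be chosen well below the ~40 GiB mentioned above, since several packages may pull in the same data on one machine.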

> While there is always something novel found also for these genomes for
> which the genomic DNA is long known, we do not much harm by distributing
> such genomes. Professional researchers will update them, anyway. The
> same holds for the human genome, but it is a bit larger and we should
> possibly make our experiences with the smaller genomes, first.

For smaller genomes, whose analysis yields output sequences that aren't too large, it can be done.
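
For the "generate our own mini-genome" option mentioned in the quoted mail, a rough sketch could look like the following. Everything here is an assumption for illustration: the function names, the uniform base composition and the fixed seed. A uniformly random sequence is biologically meaningless and only useful for smoke tests that need some FASTA input.

```python
import random


def random_genome(length, seed=0):
    """Return a reproducible pseudo-random DNA string of the given length."""
    rng = random.Random(seed)  # fixed seed keeps the test data stable across runs
    return "".join(rng.choice("ACGT") for _ in range(length))


def to_fasta(name, sequence, width=60):
    """Format one sequence as a FASTA record, wrapped at width columns."""
    lines = [">" + name]
    lines.extend(sequence[i:i + width] for i in range(0, len(sequence), width))
    return "\n".join(lines) + "\n"
```

Usage would be along the lines of writing to_fasta("mini", random_genome(5000)) to a file during the test, so nothing large has to be shipped in the package at all.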

> I'll let this sink in for another while and then likely extend getData
> to deal with these genomes and auto-generate native Debian packages with it.
>
> Ok - back to some real work and I'll have a closer look at that pipeline.

* Thumbs up *

Nilesh

Attachment: signature.asc
Description: OpenPGP digital signature


