[Date Prev][Date Next] [Thread Prev][Thread Next] [Date Index] [Thread Index]

Re: RoSA needs sambamba for testing which needs a patch to htslib to build



On 4 Jul 2020, at 16:13, Andreas Tille <andreas@an3as.eu> wrote:
On Sat, Jul 04, 2020 at 12:53:00PM +0200, Steffen Möller wrote:
please allow that I bring some attention to this bug report. Are there
any objections to adopt the patch in guix to htslib also for Debian?
This just removes the "static" from cram_to_bam and adds its declaration
to a header file.

if you think that's helpful and upstream has no problems with this
I'm fine with that patch.  I've kept John Marshall who once helped
to have a clean htslib - comments are welcome.

As you have asked, here are some thoughts. I see this has also been opened for discussion upstream at <https://github.com/samtools/htslib/issues/1098>. I think it's fair to say that upstream is expressing that they have problems with this patch :-)

The suggested patch does not add the function to the public CRAM API; to do that would require a suitable declaration in htslib/cram.h instead. It would be appropriate to make changes like this in conjunction with the upstream maintainers, so that Debian does not get itself into a position of making promises to users that are not supported by upstream HTSlib (i.e., where the upstream maintainers provide an equivalent solution using different API functions and do not support e.g. this cram_to_bam() solution).

The cram_to_bam() function is used in one place in sambamba, in CramSliceReader, which appears to be about reading CRAM files. It would be useful if someone familiar with sambamba could say what this code is all about. Could it be recoded to use sam_read1() or iterators, or does it provide access patterns that cannot currently be expressed using the general purpose HTSlib API? This comment from the sambamba maintainer [1] suggests that in fact sambamba could be recoded without using this function and so this issue would disappear in due course.

OTOH if there is a problem needing to be solved here, then cram_get_bam_seq() (which is a simpler wrapper around cram_to_bam()) may be a better candidate for adding to the public API. Would that function suffice for sambamba's needs? At present, htslib/cram.h does not really provide an interface for reading records from CRAM files, so there would be some work for the HTSlib maintainers to do to decide how best to provide such abilities. (If in fact doing so is warranted, as most use cases are satisfied by the general sam_read1()/etc API rather than needing CRAM-specific functions.)

    John

[1] https://github.com/biod/sambamba/issues/425#issuecomment-571213429 (hat tip Rob Davies)

Reply to: