As you have asked, here are some thoughts. I see this has also been opened for discussion upstream at <https://github.com/samtools/htslib/issues/1098>. I think it's fair to
say that upstream is expressing that they have problems with this patch :-)
The suggested patch does not add the function to the public CRAM API; to do that would require a suitable declaration in htslib/cram.h instead. It would be appropriate to make changes like this in conjunction with the upstream maintainers, so
that Debian does not get itself into a position of making promises to users that are not supported by upstream HTSlib (i.e., where the upstream maintainers provide an equivalent solution using different API functions and do not support e.g. this cram_to_bam()
solution).
The cram_to_bam() function is used in one place in sambamba, in CramSliceReader, which appears to be about reading CRAM files. It would be useful if someone familiar with sambamba could say what this code is all about. Could it be recoded to use
sam_read1() or iterators, or does it provide access patterns that cannot currently be expressed using the general purpose HTSlib API? This comment from the sambamba maintainer [1] suggests that in fact sambamba could be recoded without using this function
and so this issue would disappear in due course.
OTOH if there is a problem needing to be solved here, then cram_get_bam_seq() (which is a simpler wrapper around cram_to_bam()) may be a better candidate for adding to the public API. Would that function suffice for sambamba's needs? At present,
htslib/cram.h does not really provide an interface for reading records from CRAM files, so there would be some work for the HTSlib maintainers to do to decide how best to provide such abilities. (If in fact doing so is warranted, as most use cases are satisfied
by the general sam_read1()/etc API rather than needing CRAM-specific functions.)
John
[1] https://github.com/biod/sambamba/issues/425#issuecomment-571213429 (hat tip Rob Davies)
|