
bwa Segmentation fault issue



Hello All, 

I've been investigating bwa issue #108 [1]. Running bwa under valgrind, I found two memory issues:

1) heap block overrun 
==17130== Invalid write of size 4
==17130==    at 0x13753B: ksw_extend2 (ksw.c:395)
==17130==    by 0x137BC5: ksw_extend (ksw.c:483)
==17130==    by 0x124FFE: bsw2_extend_left (bwtsw2_aux.c:133)
==17130==    by 0x125C3A: bsw2_aln1_core (bwtsw2_aux.c:283)
==17130==    by 0x1278BC: bsw2_aln_core (bwtsw2_aux.c:598)
==17130==    by 0x127E90: worker (bwtsw2_aux.c:660)
==17130==    by 0x535D493: start_thread (pthread_create.c:333)

It was an easy fix: I just allocated more memory for eh. Since we always initialize eh[0] and eh[1], it seemed sensible to allocate at least 2 elements for eh. This issue didn't cause the segmentation fault,
but it still seemed right to fix it, so I created a patch, fix_heap_block_overrun.
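
The idea behind that fix can be sketched as follows. This is only an illustration, not the actual patch: the eh_t layout, the "qlen + 1" sizing, and the helper names eh_count and alloc_eh are assumptions made for the example. The only point it shows is that the buffer never has fewer than 2 elements, so the unconditional writes to eh[0] and eh[1] stay inside the block.

```c
#include <stdint.h>
#include <stdlib.h>

/* Assumed layout: two 32-bit scores per cell (hypothetical, for illustration). */
typedef struct {
    int32_t h, e;
} eh_t;

/* Number of eh cells to allocate: qlen + 1 (assumed sizing),
 * but never fewer than 2, because eh[0] and eh[1] are always written. */
static size_t eh_count(int qlen)
{
    size_t n = (qlen > 0) ? (size_t)qlen + 1 : 1;
    return (n < 2) ? 2 : n;
}

/* Allocate a zero-initialized eh buffer; the caller must free() it. */
static eh_t *alloc_eh(int qlen)
{
    return calloc(eh_count(qlen), sizeof(eh_t));
}
```

With this sizing, a call such as alloc_eh(0) still returns a buffer in which eh[1] is valid, which is the case where a one-element allocation would be written past its end.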

2) negative argument for malloc size
==10017== Thread 13:
==10017== Argument 'size' of function malloc has a fishy (possibly negative) value: -9223372036854775657
==10017==    at 0x4C2BBAF: malloc (vg_replace_malloc.c:299)
==10017==    by 0x12E34A: bsw2_pair1 (bwtsw2_pair.c:123)
==10017==    by 0x12EC37: bsw2_pair (bwtsw2_pair.c:193)
==10017==    by 0x127B4E: bsw2_aln_core (bwtsw2_aux.c:621)
==10017==    by 0x127E90: worker (bwtsw2_aux.c:660)
==10017==    by 0x535D493: start_thread (pthread_create.c:333)

Thanks to debug output, I know that this happens when l_mseq = 149, end = 0, beg = 1.

Consider line 120 in bwtsw2_pair.c:

if(end - beg < l_mseq) return;

Given this check, the malloc on line 123 shouldn't happen at all when end = 0, beg = 1, l_mseq = 149.

Further investigation showed that some incorrect implicit type conversion happens at line 120.
It seems that the left-hand side of the comparison converts -1 to an unsigned type, although I don't have a clue why, as l_mseq is int and both end and beg are int64_t.

For now I created a patch with a workaround for this issue:

if(end < beg || end - beg < l_mseq) return;

which resolved upstream issue #108, but I'm still eager to find out why such a type conversion is happening. If someone with more experience with implicit conversions in C has any ideas, please feel free to share.
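
For reference, here is a small sketch of how C's usual arithmetic conversions treat this kind of comparison. It is not bwa code and does not claim to explain the behaviour seen above; the helper names are hypothetical. It shows that comparing an int64_t difference with an int is a signed comparison, that the same comparison turns unsigned only when one operand has an unsigned 64-bit type, and what the guarded check from the workaround returns for the values from the debug output.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical helper mirroring the guarded check from the workaround:
 * true means "return early, the window is too short". */
static bool window_too_short(int64_t beg, int64_t end, int l_mseq)
{
    return end < beg || end - beg < l_mseq;
}

/* Signed comparison: len is converted to int64_t, so -1 < 149 holds. */
static bool less_signed(int64_t diff, int len)
{
    return diff < len;
}

/* Unsigned comparison: with a uint64_t operand the other side is converted
 * to uint64_t, so a value that was -1 compares as UINT64_MAX. */
static bool less_unsigned(uint64_t diff, int len)
{
    return diff < (uint64_t)len;
}
```

Under these definitions, less_signed(-1, 149) is true while less_unsigned((uint64_t)-1, 149) is false, so an unsigned operand somewhere in the expression would be one way for the original check to be skipped. Whether that is what happens at line 120 is not established here.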

Regards, Nadiya Sitdykova

[1] https://github.com/lh3/bwa/issues/108
