[Date Prev][Date Next] [Thread Prev][Thread Next] [Date Index] [Thread Index]

Re: Bug#956324: Clustalo bus error on mipsel (Was: Bug#956324: python-biopython: FTBFS on mipsel)




On Apr 17, 2020, at 22:39, Andreas Tille <andreas@fam-tille.de> wrote:

Hi Matthew,

thanks a lot for your detailed investigation.

On Fri, Apr 17, 2020 at 04:28:23PM -0700, Matthew Fernandez wrote:
Program received signal SIGBUS, Bus error.
0x5556a1b8 in PairDistances (distmat=0x7fff278c, mseq=0x55692a30, pairdist_type=<optimized out>, bPercID=<optimized out>, istart=0, iend=3, jstart=0, jend=3, fdist_in=0x0, 
  fdist_out=0x0) at pair_dist.c:346
346                 NewProgress(&prProgress, LogGetFP(&rLog, LOG_INFO),

OK, let me try a little harder :)

   $ # enable debugging symbols and Address Sanitizer
   $ CFLAGS="-g -fsanitize=address" CXXFLAGS="-g -fsanitize=address" ./configure
   …
   $ make clean && make
   …
   $ ./src/clustalo -i debian/tests/biopython_testdata/f002 --guidetree-out temp_test.dnd -o temp_test.aln --outfmt clustal --force
   =================================================================
   ==30264==ERROR: AddressSanitizer: dynamic-stack-buffer-overflow on address 0x7ffcfcbf5784 at pc 0x5620f0aa478c bp 0x7ffcfcbf56c0 sp 0x7ffcfcbf56b8
   WRITE of size 4 at 0x7ffcfcbf5784 thread T0
       #0 0x5620f0aa478b in PairDistances /home/matthew/clustal-omega-1.2.4/src/clustal/pair_dist.c:336
       #1 0x5620f0a91d9f in AlignmentOrder /home/matthew/clustal-omega-1.2.4/src/clustal-omega.c:835
       #2 0x5620f0a95c04 in Align /home/matthew/clustal-omega-1.2.4/src/clustal-omega.c:1221
       #3 0x5620f0a90d76 in MyMain /home/matthew/clustal-omega-1.2.4/src/mymain.c:1192
       #4 0x5620f0a88ca2 in main /home/matthew/clustal-omega-1.2.4/src/main.cpp:469
       #5 0x7f3773d9009a in __libc_start_main ../csu/libc-start.c:308
       #6 0x5620f0a89ad9 in _start (/home/matthew/clustal-omega-1.2.4/src/clustalo+0x2dad9)

   Address 0x7ffcfcbf5784 is located in stack of thread T0
   SUMMARY: AddressSanitizer: dynamic-stack-buffer-overflow /home/matthew/clustal-omega-1.2.4/src/clustal/pair_dist.c:336 in PairDistances
   Shadow bytes around the buggy address:
     0x10001f976aa0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
     0x10001f976ab0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
     0x10001f976ac0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
     0x10001f976ad0: 00 00 00 00 00 00 00 00 00 00 00 00 ca ca ca ca
     0x10001f976ae0: 04 cb cb cb cb cb cb cb 00 00 00 00 ca ca ca ca
   =>0x10001f976af0:[04]cb cb cb cb cb cb cb 00 00 00 00 00 00 00 00
     0x10001f976b00: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
     0x10001f976b10: f1 f1 f1 f1 00 f2 f2 f2 f2 f2 f2 f2 00 f2 f2 f2
     0x10001f976b20: f2 f2 f2 f2 00 00 00 00 00 00 00 00 00 f2 f2 f2
     0x10001f976b30: f3 f3 f3 f3 00 00 00 00 00 00 00 00 00 00 00 00
     0x10001f976b40: 00 00 00 00 00 00 f1 f1 f1 f1 00 f2 f2 f2 f2 f2
   Shadow byte legend (one shadow byte represents 8 application bytes):
     Addressable:           00
     Partially addressable: 01 02 03 04 05 06 07
     Heap left redzone:       fa
     Freed heap region:       fd
     Stack left redzone:      f1
     Stack mid redzone:       f2
     Stack right redzone:     f3
     Stack after return:      f5
     Stack use after scope:   f8
     Global redzone:          f9
     Global init order:       f6
     Poisoned by user:        f7
     Container overflow:      fc
     Array cookie:            ac
     Intra object redzone:    bb
     ASan internal:           fe
     Left alloca redzone:     ca
     Right alloca redzone:    cb
   ==30264==ABORTING

Looking at line 336 of pair_dist.c, it looks like the bound on the containing loop is wrong. So let’s try adjusting that:

   $ vim src/clustal/pair_dist.c
   $ git diff src/clustal/pair_dist.c
   diff --git a/src/clustal/pair_dist.c b/src/clustal/pair_dist.c
   index e6dbdc3..bb79e61 100644
   --- a/src/clustal/pair_dist.c
   +++ b/src/clustal/pair_dist.c
   @@ -321,7 +321,7 @@ PairDistances(symmatrix_t **distmat, mseq_t *mseq, int pairdist_type, bool bPerc

            /* FIXME: can get rid of iChunkStart, iChunkEnd now that we're using the arrays */
            iChunkStart = iend;
   -        for(iChunk = 0; iChunk <= iNumberOfThreads; iChunk++)
   +        for(iChunk = 0; iChunk < iNumberOfThreads; iChunk++)
            {
                iChunkEnd = iChunkStart;
                if (iChunk == iNumberOfThreads - 1){
   $ make
   …
   $ ./src/clustalo -i debian/tests/biopython_testdata/f002 --guidetree-out temp_test.dnd -o temp_test.aln --outfmt clustal --force
   =================================================================
   ==30601==ERROR: AddressSanitizer: global-buffer-overflow on address 0x561188847864 at pc 0x5611886da6e7 bp 0x7fffe6d77ef0 sp 0x7fffe6d77ee8
   READ of size 4 at 0x561188847864 thread T0
       #0 0x5611886da6e6 in FullAlignment::Build(HMM&, Hit&, char*) /home/matthew/clustal-omega-1.2.4/src/hhalign/hhfullalignment-C.h:250
       #1 0x5611886df3eb in HitList::PrintAlignments(char**, char**, char*, char*, HMM&, char*, char) /home/matthew/clustal-omega-1.2.4/src/hhalign/hhhitlist-C.h:197
       #2 0x5611886f379b in hhalign /home/matthew/clustal-omega-1.2.4/src/hhalign/hhalign.cpp:1211
       #3 0x56118863f848 in HHalignWrapper /home/matthew/clustal-omega-1.2.4/src/clustal/hhalign_wrapper.c:1342
       #4 0x561188637db1 in Align /home/matthew/clustal-omega-1.2.4/src/clustal-omega.c:1250
       #5 0x561188632d76 in MyMain /home/matthew/clustal-omega-1.2.4/src/mymain.c:1192
       #6 0x56118862aca2 in main /home/matthew/clustal-omega-1.2.4/src/main.cpp:469
       #7 0x7f6d857f109a in __libc_start_main ../csu/libc-start.c:308
       #8 0x56118862bad9 in _start (/home/matthew/clustal-omega-1.2.4/src/clustalo+0x2dad9)

   0x561188847864 is located 60 bytes to the left of global variable 'Sim' defined in 'hhdecl-C.h:234:7' (0x5611888478a0) of size 1764
   0x561188847864 is located 0 bytes to the right of global variable 'S' defined in 'hhdecl-C.h:235:7' (0x561188847180) of size 1764
   SUMMARY: AddressSanitizer: global-buffer-overflow /home/matthew/clustal-omega-1.2.4/src/hhalign/hhfullalignment-C.h:250 in FullAlignment::Build(HMM&, Hit&, char*)
   Shadow bytes around the buggy address:
     0x0ac2b1100eb0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
     0x0ac2b1100ec0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
     0x0ac2b1100ed0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
     0x0ac2b1100ee0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
     0x0ac2b1100ef0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
   =>0x0ac2b1100f00: 00 00 00 00 00 00 00 00 00 00 00 00[04]f9 f9 f9
     0x0ac2b1100f10: f9 f9 f9 f9 00 00 00 00 00 00 00 00 00 00 00 00
     0x0ac2b1100f20: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
     0x0ac2b1100f30: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
     0x0ac2b1100f40: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
     0x0ac2b1100f50: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
   Shadow byte legend (one shadow byte represents 8 application bytes):
     Addressable:           00
     Partially addressable: 01 02 03 04 05 06 07
     Heap left redzone:       fa
     Freed heap region:       fd
     Stack left redzone:      f1
     Stack mid redzone:       f2
     Stack right redzone:     f3
     Stack after return:      f5
     Stack use after scope:   f8
     Global redzone:          f9
     Global init order:       f6
     Poisoned by user:        f7
     Container overflow:      fc
     Array cookie:            ac
     Intra object redzone:    bb
     ASan internal:           fe
     Left alloca redzone:     ca
     Right alloca redzone:    cb
   ==30601==ABORTING

Looking at line 250 of hhfullalignment-C.h, we can see it’s reading the array ‘S’ out of bounds here. Someone has helpfully left a debugging line below this, so let’s shuffle it ahead of the faulting access and remove the part where it is also performing the faulting access:

   $ vim src/hhalign/hhfullalignment-C.h
   $ git diff src/hhalign/hhfullalignment-C.h
   diff --git a/src/hhalign/hhfullalignment-C.h b/src/hhalign/hhfullalignment-C.h
   index 8f40fd1..fd759f9 100644
   --- a/src/hhalign/hhfullalignment-C.h
   +++ b/src/hhalign/hhfullalignment-C.h
   @@ -247,8 +247,8 @@ FullAlignment::Build(HMM& q, Hit& hit, char zcError[])
                char qc=qa->seq[  q.nfirst][ qa->m[  q.nfirst][hit.i[step]] ];
                char tc=ta->seq[hit.nfirst][ ta->m[hit.nfirst][hit.j[step]] ];
                if (qc==tc) identities++;  // count identical amino acids
   +            fprintf(stderr,"%3i %3i  %3i %3i  %3i %1c %1c %6.2f %6.2f %6.2f \n",step,hit.nsteps,hit.i[step],hit.j[step],int(state),qc,tc,score_sim,hit.P_posterior[step],hit.sum_of_probs); //DEBUG
                score_sim += S[(int)aa2i(qc)][(int)aa2i(tc)];
   -            //         fprintf(stderr,"%3i %3i  %3i %3i  %3i %1c %1c %6.2f %6.2f %6.2f %6.2f \n",step,hit.nsteps,hit.i[step],hit.j[step],int(state),qc,tc,S[(int)aa2i(qc)][(int)aa2i(tc)],score_sim,hit.P_posterior[step],hit.sum_of_probs); //DEBUG
            }
      }

   $ make
   …
   $ ./src/clustalo -i debian/tests/biopython_testdata/f002 --guidetree-out temp_test.dnd -o temp_test.aln --outfmt clustal —force
   …
    28 582  386 559   10 N - 127.25   0.01   2.91
   =================================================================
   ==30936==ERROR: AddressSanitizer: global-buffer-overflow on address 0x563d5b2258a4 at pc 0x563d5b0b79e8 bp 0x7ffd269e0e40 sp 0x7ffd269e0e38
   READ of size 4 at 0x563d5b2258a4 thread T0
       #0 0x563d5b0b79e7 in FullAlignment::Build(HMM&, Hit&, char*) /home/matthew/clustal-omega-1.2.4/src/hhalign/hhfullalignment-C.h:251
   …

So the values of qc and tc at this point are 'N' and '-', respectively. This results in an access to S[20][21], which is indeed out of range as S is a 21x21 array. To go further, I think I need some guidance from a domain expert. Is aa2i() ever expected to be called with a value that maps to GAP or ANY? Maybe S is actually meant to be a 22x22 array? Maybe the loop in hhfullalignment-C.h is meant to skip any iteration for which qc or tc map to GAP?

By the way, Andreas, I am doing this debugging on the upstream 1.2.4 release on an x86-64 machine so I still have no certainty that this is related to the root cause of your observed problem on MIPS.

Upstream is in the row of this investigation.  Its quite interesting
that the issue could also observed on amd64.  So probably this is a real
issue which is just exposed on mipsel.  I think just bumping the array
boundaries is cheap.  If there will be no response from upstream (or
somebody else who is comfortable with the algorithm which I'm not) I'll
check again on mipsel and in case it will work I'll upload.

Fair enough. While we’re discussing this, here’s another patch for a memory leak that fixes a typoed ifdef and a missing free():

    diff --git a/src/squid/clustal.c b/src/squid/clustal.c
    index 650004a..bb1fec8 100644
    --- a/src/squid/clustal.c
    +++ b/src/squid/clustal.c
    @@ -207,7 +207,7 @@ WriteClustal(FILE *fp, MSA *msa)
       int    namelen;              /* maximum name length used      */
       int    pos;                  /* position counter              */
     #ifdef CLUSTALO
    -  char  *buf;                  /* buffer for writing seq        */
    +  char  *buf = NULL;                   /* buffer for writing seq        */
       int    cpl = msa->alen<iWrap ? msa->alen+10 : (iWrap > 0 ? iWrap : 60);              /* char per line (< 64)          */
     #else
       char   buf[80];              /* buffer for writing seq        */
    @@ -410,8 +410,9 @@ WriteClustal(FILE *fp, MSA *msa)
     #endif
         }

    -#ifdef CLUSTAL
    +#ifdef CLUSTALO
       free(piResCnt); piResCnt = NULL;
    +  free(buf); buf = NULL;
     #endif

       return;


Reply to: