
BLAST+ speed & build issues



Hi Olivier and Aaron,

I've been playing around with BLAST+, trying to tackle one fairly simple
but unimportant issue and one more complex and problematic issue.

The easy one first:

In the build log (on Launchpad) I see that during the test phase of the
build there are various attempts to connect to the NCBI servers -
starting at:

======================================================================
blast_services_unit_test
======================================================================

Running 23 test cases...
Error: (311.22) SOCK#1000[?]: [SOCK::Connect]  Failed SOCK_gethostbyname("www.ncbi.nlm.nih.gov")
Error: (303.7) [URL_Connect]  Socket connect to www.ncbi.nlm.nih.gov:80 failed: Unknown
Error: (311.22) SOCK#2000[?]: [SOCK::Connect]  Failed SOCK_gethostbyname("www.ncbi.nlm.nih.gov")
Error: (303.7) [URL_Connect]  Socket connect to www.ncbi.nlm.nih.gov:80 failed: Unknown
Error: (311.22) SOCK#3000[?]: [SOCK::Connect]  Failed SOCK_gethostbyname("www.ncbi.nlm.nih.gov")
Error: (303.7) [URL_Connect]  Socket connect to www.ncbi.nlm.nih.gov:80 failed: Unknown
Error: (310.5) [blast4]  Cannot locate server
Error: (315.1) Cannot connect to service "blast4"
Error: (315.2) CConn_Streambuf::CConn_Streambuf(): NULL connector (UNKNOWN): Unknown

...etc.

These don't break the build but they should really be disabled.  Is
there an easy way to do this, do you think?  I've had a poke around in
the Makefiles but it's fairly cryptic.

Anyway, the complex issue:

A user reported that his analysis took an order of magnitude longer
after upgrading BLAST+ (from the static binary build to the Debian Med
build).  I'd expect some slowdown with dynamic linking but this is
indeed fairly drastic:

Static (downloaded from NCBI):
tbooth@barsukas[latest]time bash -c 'for (( c=1; c<=50; c++ )) ; do ~/tings/ncbi-blast-2.2.25+/bin/blastx -h > /dev/null ; done'
0.76user 0.29system 0:00.94elapsed 110%CPU (0avgtext+0avgdata 39728maxresident)k
32inputs+0outputs (2major+133193minor)pagefaults 0swaps

Dynamic (built with debuild):
tbooth@barsukas[latest]time bash -c 'for (( c=1; c<=50; c++ )) ; do c++/BUILD/bin/blastx -h > /dev/null ; done' 
3.91user 8.91system 0:13.00elapsed 98%CPU (0avgtext+0avgdata 827376maxresident)k
0inputs+0outputs (0major+2623550minor)pagefaults 0swaps

So assuming that printing the help message is trivial, and essentially a
no-op, the Debian build is taking more than a quarter of a second to
fire up.  For scripts that call BLAST in a tight loop on small sequences
this is a drastic slowdown - nearly all the analysis time is actually
used up just starting BLAST.
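
To make the per-start cost explicit, here is a small sketch that divides
the elapsed times above by the 50 iterations.  The helper name per_call
is made up for illustration; the inputs are the elapsed times from the
two runs above.

```shell
#!/bin/sh
# Average wall-clock seconds per invocation, given the total elapsed
# time of a loop (seconds) and the number of iterations.
per_call() {
    awk -v total="$1" -v runs="$2" 'BEGIN { printf "%.3f\n", total / runs }'
}

per_call 0.94 50     # static build downloaded from NCBI
per_call 13.00 50    # dynamic build from debuild
```

That works out to roughly 0.019 s per start for the static binary
against 0.26 s for the debuild one, about a factor of 14.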

For comparison, I tried timing Perl:

tbooth@barsukas[latest]time bash -c 'for (( c=1; c<=50; c++ )) ; do perl -h > /dev/null ; done'
0.36user 0.17system 0:00.21elapsed 244%CPU (0avgtext+0avgdata 6064maxresident)k
0inputs+0outputs (0major+27317minor)pagefaults 0swaps

I know Perl is well optimised, but this is still a massive disparity.

I wondered if there was a way to speed up linking, so I had a play with
'prelink', but I realised this just helps starting the program the first
time in the loop.  After that the linking data is all cached anyway.
Then I tried mashing all the .so files created by the build into one
"libncbiblast_all.so" and linking to this.  It compiled and ran but made
no difference whatsoever to startup time of blastx.

So, maybe I'm barking up the wrong tree and something other than the
linking is causing the delay, or maybe there is just no way to get the
speedup other than statically linking the binaries.  (I know I can't be
the first person to try all this but I can't find any previous
discussion/documentation).
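
One way to check whether the dynamic loader really is the culprit,
sketched here on the assumption of a glibc system: ldd lists the shared
objects a binary will load, and setting LD_DEBUG=statistics makes ld.so
report its own startup and relocation times on stderr.  The function
names and the BIN default are illustrative only; the path is the one
from the timing run above.

```shell
#!/bin/sh
# Count the shared objects a binary resolves at startup, according to ldd.
# Prints 0 for a static binary or if ldd is unavailable.
count_shared_libs() {
    ldd "$1" 2>/dev/null | grep -c '=>'
}

# Run a command with the glibc loader's statistics enabled and keep only
# the timing lines.  The command's own stdout is discarded.
loader_stats() {
    LD_DEBUG=statistics "$@" 2>&1 >/dev/null |
        grep -E 'startup time|relocation|load objects'
}

# Assumed location of the debuild binary; override with BIN=/path/to/blastx.
BIN=${BIN:-c++/BUILD/bin/blastx}

if [ -x "$BIN" ]; then
    echo "shared objects: $(count_shared_libs "$BIN")"
    loader_stats "$BIN" -h
fi
```

If the loader's reported startup time turns out to be a small fraction
of the quarter second, the delay is more likely coming from something
the program does after loading than from the linking itself.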

If the latter, I know the real fix is for script authors to use BLAST more
sensibly, but I'm wondering if there is any mileage in trying to make an
ncbi-blast+-static package?  This would build from the same source, and
replace (dpkg-divert) the main binaries with static versions to give a
quick-fix speedup at the cost of a big hunk of disk space.  I've not
actually tried making this yet, but what do you think?
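
As a very rough sketch of the diversion side of such a package, the
snippet below only prints the dpkg-divert calls a preinst script might
make rather than running them.  The /usr/bin install path, the
".dynamic" suffix for the diverted originals and the list of binaries
are all assumptions for illustration, not something that has been built.

```shell
#!/bin/sh
# Hypothetical package name, taken from the proposal above.
PKG="ncbi-blast+-static"

# Print (do not execute) one dpkg-divert command per binary name.
# Each would move the dynamic binary aside so the static one can
# take its place at the original path.
print_diversions() {
    for name in "$@"; do
        echo "dpkg-divert --package $PKG --add --rename --divert /usr/bin/$name.dynamic /usr/bin/$name"
    done
}

print_diversions blastn blastp blastx
```

A matching postrm would need to remove the same diversions with
dpkg-divert --remove so the dynamic binaries come back on uninstall.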

Cheers,

TIM

-- 
To Err is human.
To Arrr is Pirate!


