
Re: BLAST+ speed & build issues



Hi,

> Creating a "static" package would be easy, but would double the size
> (which is large) for blast+ on servers, 

Actually, it makes it about 8x the size, as each binary has its own
copies of every library.

> and it could be confusing for
> the user, no? (selecting between dynamic and static).

Yes, but I already have a confused user asking why his analysis takes
six times as long since he recently upgraded BLAST.  How would you
suggest tackling this problem?

Actually, I realise I'm jumping the gun a bit here.  I've not yet tried
compiling my own static version from source in order to do a speed
comparison with the binary version distributed by NCBI.  I'll try that
today.
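
For the comparison itself, something like the sketch below should do.
The helper name startup_ms is made up for illustration, and it assumes
GNU date (for the %N nanosecond format).  It only measures wall-clock
time per invocation, so it says nothing about where the time goes.

```shell
# startup_ms RUNS COMMAND [ARGS...]
# Hypothetical helper (not part of BLAST+ or Debian): prints the mean
# wall-clock time per invocation, in whole milliseconds.
# Assumes GNU date, whose %N format gives nanoseconds.
startup_ms() (
    # The body runs in a subshell, so these variables stay private.
    runs=$1
    shift
    start=$(date +%s%N)
    i=0
    while [ "$i" -lt "$runs" ]; do
        "$@" > /dev/null 2>&1
        i=$((i + 1))
    done
    end=$(date +%s%N)
    echo $(( (end - start) / runs / 1000000 ))
)
```

It would be called as "startup_ms 50 c++/BUILD/bin/blastx -h" for the
debuild output, and likewise with the path to the NCBI binary or to a
locally built static one.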

Cheers,

TIM

> 
> Olivier
> 
> 
> On 8/2/11 6:53 PM, Tim Booth wrote:
> > Hi Olivier and Aaron,
> >
> > I've been playing around with BLAST+, trying to tackle one fairly simple
> > but unimportant issue and one more complex and problematic issue.
> >
> > The easy one first:
> >
> > In the build log (on Launchpad) I see that during the test phase of the
> > build there are various attempts to connect to the NCBI servers -
> > starting at:
> >
> > ======================================================================
> > blast_services_unit_test
> > ======================================================================
> >
> > Running 23 test cases...
> > Error: (311.22) SOCK#1000[?]: [SOCK::Connect]  Failed SOCK_gethostbyname("www.ncbi.nlm.nih.gov")
> > Error: (303.7) [URL_Connect]  Socket connect to www.ncbi.nlm.nih.gov:80 failed: Unknown
> > Error: (311.22) SOCK#2000[?]: [SOCK::Connect]  Failed SOCK_gethostbyname("www.ncbi.nlm.nih.gov")
> > Error: (303.7) [URL_Connect]  Socket connect to www.ncbi.nlm.nih.gov:80 failed: Unknown
> > Error: (311.22) SOCK#3000[?]: [SOCK::Connect]  Failed SOCK_gethostbyname("www.ncbi.nlm.nih.gov")
> > Error: (303.7) [URL_Connect]  Socket connect to www.ncbi.nlm.nih.gov:80 failed: Unknown
> > Error: (310.5) [blast4]  Cannot locate server
> > Error: (315.1) Cannot connect to service "blast4"
> > Error: (315.2) CConn_Streambuf::CConn_Streambuf(): NULL connector (UNKNOWN): Unknown
> >
> > ...etc.
> >
> > These don't break the build but they should really be disabled.  Is
> > there an easy way to do this, do you think?  I've had a poke around in
> > the Makefiles but it's fairly cryptic.
> >
> > Anyway, the complex issue:
> >
> > A user reported that his analysis took an order of magnitude longer
> > after upgrading BLAST+ (from the static binary build to the Debian Med
> > build).  I'd expect some slowdown with dynamic linking but this is
> > indeed fairly drastic:
> >
> > Static (downloaded from NCBI):
> > tbooth@barsukas[latest]time bash -c 'for (( c=1; c<=50; c++ )) ; do ~/tings/ncbi-blast-2.2.25+/bin/blastx -h > /dev/null ; done'
> > 0.76user 0.29system 0:00.94elapsed 110%CPU (0avgtext+0avgdata 39728maxresident)k
> > 32inputs+0outputs (2major+133193minor)pagefaults 0swaps
> >
> > Dynamic (built with debuild):
> > tbooth@barsukas[latest]time bash -c 'for (( c=1; c<=50; c++ )) ; do c++/BUILD/bin/blastx -h > /dev/null ; done' 
> > 3.91user 8.91system 0:13.00elapsed 98%CPU (0avgtext+0avgdata 827376maxresident)k
> > 0inputs+0outputs (0major+2623550minor)pagefaults 0swaps
> >
> > So, assuming that printing the help message is essentially a no-op, the
> > Debian build is taking more than a quarter of a second to start up
> > (13.00s elapsed / 50 runs = 0.26s per call, against about 0.02s for the
> > static binary).  For scripts that call BLAST in a tight loop on small
> > sequences this is a drastic slowdown - nearly all the analysis time is
> > actually used up just starting BLAST.
> >
> > For comparison, I tried timing Perl:
> >
> > tbooth@barsukas[latest]time bash -c 'for (( c=1; c<=50; c++ )) ; do perl -h > /dev/null ; done'
> > 0.36user 0.17system 0:00.21elapsed 244%CPU (0avgtext+0avgdata 6064maxresident)k
> > 0inputs+0outputs (0major+27317minor)pagefaults 0swaps
> >
> > I know Perl is well optimised, but this is still a massive disparity.
> >
> > I wondered if there was a way to speed up linking, so I had a play with
> > 'prelink', but I realised this just helps starting the program the first
> > time in the loop.  After that the linking data is all cached anyway.
> > Then I tried mashing all the .so files created by the build into one
> > "libncbiblast_all.so" and linking to this.  It compiled and ran but made
> > no difference whatsoever to startup time of blastx.
> >
> > So, maybe I'm barking up the wrong tree and something other than the
> > linking is causing the delay, or maybe there is just no way to get the
> > speedup other than statically linking the binaries.  (I know I can't be
> > the first person to try all this but I can't find any previous
> > discussion/documentation).
> >
> > If the latter, I know the real fix is for script authors to use BLAST
> > more sensibly, but I'm wondering if there is any mileage in trying to
> > make an ncbi-blast+-static package?  This would build from the same
> > source, and replace (dpkg-divert) the main binaries with static versions
> > to give a quick-fix speedup at the cost of a big hunk of disk space.
> > I've not actually tried making this yet, but what do you think?
> >
> > Cheers,
> >
> > TIM
> >
> 
> -- 
> Olivier Sallou
> IRISA / University of Rennes 1
> Campus de Beaulieu, 35000 RENNES - FRANCE
> Tel: 02.99.84.71.95
> 
> gpg key id: 4096R/326D8438  (pgp.mit.edu)
> Key fingerprint = 5FB4 6F83 D3B9 5204 6335  D26D 78DC 68DB 326D 8438
> 
> 
> 

-- 
To Err is human.
To Arrr is Pirate!

