
Re: BLAST+ speed & build issues



Hi Tim,
regarding speed, I don't know the reason for the difference between
static and dynamic linking here.
When I created the package, I chose the dynamic approach to reduce
the size of the binaries while also making the libraries available for
other software to link against.
Creating a "static" package would be easy, but it would double the
(already large) size of blast+ on servers, and having to choose between
the dynamic and static versions could confuse users, no?


Olivier


On 8/2/11 6:53 PM, Tim Booth wrote:
> Hi Olivier and Aaron,
>
> I've been playing around with BLAST+, trying to tackle one fairly simple
> but unimportant issue and one more complex and problematic issue.
>
> The easy one first:
>
> In the build log (on Launchpad) I see that during the test phase of the
> build there are various attempts to connect to the NCBI servers -
> starting at:
>
> ======================================================================
> blast_services_unit_test
> ======================================================================
>
> Running 23 test cases...
> Error: (311.22) SOCK#1000[?]: [SOCK::Connect]  Failed SOCK_gethostbyname("www.ncbi.nlm.nih.gov")
> Error: (303.7) [URL_Connect]  Socket connect to www.ncbi.nlm.nih.gov:80 failed: Unknown
> Error: (311.22) SOCK#2000[?]: [SOCK::Connect]  Failed SOCK_gethostbyname("www.ncbi.nlm.nih.gov")
> Error: (303.7) [URL_Connect]  Socket connect to www.ncbi.nlm.nih.gov:80 failed: Unknown
> Error: (311.22) SOCK#3000[?]: [SOCK::Connect]  Failed SOCK_gethostbyname("www.ncbi.nlm.nih.gov")
> Error: (303.7) [URL_Connect]  Socket connect to www.ncbi.nlm.nih.gov:80 failed: Unknown
> Error: (310.5) [blast4]  Cannot locate server
> Error: (315.1) Cannot connect to service "blast4"
> Error: (315.2) CConn_Streambuf::CConn_Streambuf(): NULL connector (UNKNOWN): Unknown
>
> ...etc.
>
> These don't break the build but they should really be disabled.  Is
> there an easy way to do this, do you think?  I've had a poke around in
> the Makefiles but it's fairly cryptic.
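One possible approach, assuming the offline build daemon simply cannot resolve host names (as the SOCK_gethostbyname errors above suggest), is to guard the network-dependent tests with a DNS check and skip them when it fails. The helper name ncbi_reachable is hypothetical, and wiring its result into the toolkit's test run is not shown because that depends on the Makefiles. Debian's standard DEB_BUILD_OPTIONS=nocheck would instead skip every build-time test, not just these.

```shell
#!/bin/sh
# Hypothetical guard for network-dependent tests such as
# blast_services_unit_test. Returns 0 if the host resolves, 1 otherwise.
ncbi_reachable() {
    ncbi_host="${1:-www.ncbi.nlm.nih.gov}"
    # getent asks the system resolver, the same one gethostbyname() uses,
    # so a failure here corresponds to the SOCK_gethostbyname errors.
    if getent hosts "$ncbi_host" >/dev/null 2>&1; then
        return 0
    fi
    return 1
}

if ncbi_reachable; then
    echo "network tests: enabled"
else
    echo "network tests: skipped (cannot resolve www.ncbi.nlm.nih.gov)"
fi
```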
>
> Anyway, the complex issue:
>
> A user reported that his analysis took an order of magnitude longer
> after upgrading BLAST+ (from the static binary build to the Debian Med
> build).  I'd expect some slowdown with dynamic linking but this is
> indeed fairly drastic:
>
> Static (downloaded from NCBI):
> tbooth@barsukas[latest]time bash -c 'for (( c=1; c<=50; c++ )) ; do ~/tings/ncbi-blast-2.2.25+/bin/blastx -h > /dev/null ; done'
> 0.76user 0.29system 0:00.94elapsed 110%CPU (0avgtext+0avgdata 39728maxresident)k
> 32inputs+0outputs (2major+133193minor)pagefaults 0swaps
>
> Dynamic (built with debuild):
> tbooth@barsukas[latest]time bash -c 'for (( c=1; c<=50; c++ )) ; do c++/BUILD/bin/blastx -h > /dev/null ; done' 
> 3.91user 8.91system 0:13.00elapsed 98%CPU (0avgtext+0avgdata 827376maxresident)k
> 0inputs+0outputs (0major+2623550minor)pagefaults 0swaps
>
> So assuming that printing the help message is trivial, and essentially a
> no-op, the Debian build is taking more than a quarter of a second to
> fire up.  For scripts that call BLAST in a tight loop on small sequences
> this is a drastic slowdown - nearly all the analysis time is actually
> used up just starting BLAST.
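The "quarter of a second" figure comes from dividing each elapsed time by the 50 iterations of the loop. A small sketch of that arithmetic, using only the numbers from the timings above (per_call is a hypothetical helper name):

```shell
#!/bin/sh
# Convert the total elapsed time of a timed loop into seconds per call.
# Usage: per_call TOTAL_SECONDS CALLS
per_call() {
    awk -v total="$1" -v n="$2" 'BEGIN { printf "%.3f\n", total / n }'
}

echo "static:  $(per_call 0.94 50) s per call"   # NCBI static build
echo "dynamic: $(per_call 13.00 50) s per call"  # Debian dynamic build
# Ratio of the total elapsed times (13.00 s vs 0.94 s).
awk 'BEGIN { printf "slowdown: %.1fx\n", 13.00 / 0.94 }'
```

This gives about 0.019 s per call for the static build and 0.260 s for the dynamic one, roughly a 13.8x difference in elapsed time.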
>
> For comparison, I tried timing Perl:
>
> tbooth@barsukas[latest]time bash -c 'for (( c=1; c<=50; c++ )) ; do perl -h > /dev/null ; done'
> 0.36user 0.17system 0:00.21elapsed 244%CPU (0avgtext+0avgdata 6064maxresident)k
> 0inputs+0outputs (0major+27317minor)pagefaults 0swaps
>
> I know Perl is well optimised, but this is still a massive disparity.
>
> I wondered if there was a way to speed up linking, so I had a play with
> 'prelink', but I realised this just helps starting the program the first
> time in the loop.  After that the linking data is all cached anyway.
> Then I tried mashing all the .so files created by the build into one
> "libncbiblast_all.so" and linking to this.  It compiled and ran but made
> no difference whatsoever to startup time of blastx.
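To check whether symbol resolution is really where the time goes, glibc's dynamic loader can report its own cost: with LD_DEBUG=statistics set, ld.so prints timing figures, including the time spent on relocation, to stderr. A minimal sketch, assuming a glibc system (other C libraries ignore the variable); linker_stats is a hypothetical helper name. If the reported relocation time is only a small fraction of the ~0.26 s per call, the delay lies somewhere other than linking.

```shell
#!/bin/sh
# Print glibc ld.so statistics for one run of a program.
# Usage: linker_stats /path/to/program [args...]
linker_stats() {
    # ld.so writes its debug output to stderr; send that to our stdout
    # and discard the program's own stdout (e.g. the -h help text).
    env LD_DEBUG=statistics "$@" 2>&1 >/dev/null
}

# Example, using the build path from the report above:
# linker_stats c++/BUILD/bin/blastx -h
```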
>
> So, maybe I'm barking up the wrong tree and something other than the
> linking is causing the delay, or maybe there is just no way to get the
> speedup other than statically linking the binaries.  (I know I can't be
> the first person to try all this but I can't find any previous
> discussion/documentation).
>
> If the latter, I know the real fix is for script authors to use BLAST more
> sensibly, but I'm wondering if there is any mileage in trying to make an
> ncbi-blast+-static package?  This would build from the same source, and
> replace (dpkg-divert) the main binaries with static versions to give a
> quick-fix speedup at the cost of a big hunk of disk space.  I've not
> actually tried making this yet, but what do you think?
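The diversion half of that idea might look like the following dry-run sketch, which only prints the dpkg-divert commands a maintainer script would run before the static binaries take the original paths. The binary list, the ".dynamic" suffix and the helper name print_diversions are assumptions; a real package would also need matching --remove calls when it is uninstalled.

```shell
#!/bin/sh
# Hypothetical helper for the proposed ncbi-blast+-static package.
# Dry run: prints the commands instead of executing them.
PKG="ncbi-blast+-static"

print_diversions() {
    for prog in "$@"; do
        # Move the dynamic binary aside so the static one can be installed
        # at /usr/bin/$prog without a file conflict.
        echo "dpkg-divert --package $PKG --divert /usr/bin/$prog.dynamic --rename --add /usr/bin/$prog"
    done
}

print_diversions blastn blastp blastx tblastn tblastx
```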
>
> Cheers,
>
> TIM
>

-- 
Olivier Sallou
IRISA / University of Rennes 1
Campus de Beaulieu, 35000 RENNES - FRANCE
Tel: 02.99.84.71.95

gpg key id: 4096R/326D8438  (pgp.mit.edu)
Key fingerprint = 5FB4 6F83 D3B9 5204 6335  D26D 78DC 68DB 326D 8438


