
Re: [outreachy] autopkgtest-pkg-perl in librg-utils-perl does nothing



Hi Tanya,

On Wed, Jul 13, 2016 at 07:24:12PM +0300, merlettaia wrote:
> 
> I also found a problem that involves this package.
> Last weekend I started to work on predictprotein. The hardest problem was
> to make it work.
> https://wiki.debian.org/DebianMed/PredictProtein - at some point I found
> these instructions and spent some time downloading the database. When I
> had downloaded and installed it and then ran predictprotein, I got multiple
> error messages (output_with_errors.txt). It turned out that when one of the
> perl scripts in librg-utils-perl calls blastpgp on that database,
>   blastpgp -F F -a 1 -j 3 -b 3000 -e 1 -h 1e-3 -d
> /data/src/rostlab-data/data/big/big_80 -i query.fasta -o
> query.blastPsiOutTmp -C query.chk -Q query.blastPsiMat
> 
> - blastpgp ends with a "Killed" message and produces an incorrect output
> file (query.blastPsiOutTmp is incomplete). The script in librg-utils-perl is
> correct, and the call in predictprotein is correct. Blastpgp itself fails.
> 
> I thought that an incorrect database format could be the reason for it,
> because the version of the ncbi-blast+ package (blastpgp belongs to this
> package) uses the latest version of that database, and the database from
> RostLab's website probably isn't the latest.
> I downloaded one of the databases from the NCBI FTP
> (ftp://ftp.ncbi.nlm.nih.gov/blast/db/) and tried to run predictprotein with
> that data. It worked!
> But now I get an error while metastudent runs (output in some_output.txt) -
> I'm working to fix it now.

Thanks for your very thorough investigation.  I have put Laszlo in CC -
maybe he has some contact information or can help himself, even though he
is no longer active in Debian Med.
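
As a side note: a bare "Killed" printed by the shell normally means the
process received SIGKILL.  Whether that came from the kernel's out-of-memory
killer here is only a guess on my side.  A minimal sketch, assuming Python 3,
of how a wrapper could report this explicitly instead of silently continuing
with an incomplete output file (the helper names describe_exit and
run_and_describe are made up for illustration):

```python
import signal
import subprocess


def describe_exit(returncode):
    """Turn a subprocess return code into a short human-readable string."""
    if returncode < 0:
        # subprocess reports death by signal N as the return code -N.
        return "terminated by " + signal.Signals(-returncode).name
    if returncode == 0:
        return "ok"
    return "exit status %d" % returncode


def run_and_describe(argv):
    """Run argv (a list of strings) and describe how the process ended."""
    return describe_exit(subprocess.run(argv).returncode)
```

Calling run_and_describe() with the blastpgp command line from the log as a
list of arguments would then show whether the process was killed by a signal
or exited with an error status of its own.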
 
> And there are two things I don't understand:
> 
> Is there any package which contains a copy of the current version of the
> blastp database? Or a small part of it. It seems that the autopkgtest
> testsuite should use a smaller portion of the blastp database.

As far as I know there is no such package.  IMHO it might be a good idea
to ship something like a stripped down database since it could be used
as test data input for several other packages.  What do others think?
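
To illustrate what "stripped down" could mean: a rough sketch, assuming
Python 3 and that the database is available as plain FASTA text, which keeps
only the first few records.  The function name take_first_records is
hypothetical, and the result would still have to be indexed for BLAST (for the
legacy tools that would be formatdb) before blastpgp could use it.

```python
def take_first_records(fasta_text, limit):
    """Return the first `limit` FASTA records of fasta_text as a string."""
    kept = []
    seen = 0
    for line in fasta_text.splitlines():
        if line.startswith(">"):
            # A header line starts a new record.
            seen += 1
            if seen > limit:
                break
        if seen >= 1:
            # Skip anything that precedes the first header.
            kept.append(line)
    return "\n".join(kept) + "\n" if kept else ""
```

Taking simply the first records is the crudest possible selection; for real
test data one would rather pick sequences that give hits for the test query.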

> For now it seems unclear how to test predictprotein with autopkgtest, since
> a correct run also requires a local copy of a (possibly) huge database
> (~30GB in the copy from RostLab's website). Should ncbi-blast+/ncbi-tools6
> perhaps download and install it?

For manual user tests this might be OK, but autopkgtest should be
offline.

> Predictprotein has special parameters for the
> different databases, and the path to the blast installation can be provided
> by hand, which makes it possible to call it with a smaller database in a
> testsuite run.

Sounds convincing.

> But that will work only if blastpgp from ncbi-blast+ works correctly
> with the same version of the database. That means the better way is to
> install and test database usage in the ncbi-blast+ tests, and to use the
> default database installed with ncbi-blast+ (if one gets installed).
> 
> Could you also check whether the database from here:
> https://wiki.debian.org/DebianMed/PredictProtein - really doesn't work? I
> have an unstable internet connection and am not sure whether that file was
> corrupted.

Any volunteers for this?  My internet connection is currently also not the best.
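
If a reference checksum for the download can be found (I have not checked
whether one is published for this file), comparing it against the local copy
would settle the corruption question without a second full download.  A small
sketch, assuming Python 3; file_md5 is a made-up helper name:

```python
import hashlib


def file_md5(path, chunk_size=1 << 20):
    """Return the hex MD5 digest of the file at path, read in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        # Read fixed-size chunks so a ~30GB file never has to fit in memory.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()
```

Two people who downloaded the same file could also just compare the digests
they get.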

Kind regards

       Andreas. 


> cache merging is off at /usr/bin/predictprotein line 230.
> work_dir=/data/src/temp at /usr/bin/predictprotein line 336.
> make --no-builtin-rules INFILE=query.in -C /data/src/temp JOBID=query -j 1 BLASTCORES=1 LIBRGUTILS=/usr/share/librg-utils-perl/ PPROOT=/usr/share/predictprotein/ PROFNUMRESMIN=17 PROFROOT=/usr/share/profphd/prof/ BIGBLASTDB=/data/src/rostlab-data/data/aa/pdbaa BIG80BLASTDB=/data/src/rostlab-data/data/aa/pdbaa PFAM2DB=/data/src/rostlab-data/data/pfam_legacy/Pfam_ls PFAM3DB=/data/src/rostlab-data/data/pfam/Pfam-A.hmm PROSITEDAT=/data/src/rostlab-data/data/prosite/prosite.dat PROSITECONVDAT=/data/src/rostlab-data/data/prosite/prosite_convert.dat PSICEXE=/usr/share/rost-runpsic/runNewPSIC.pl SPKEYIDX=/data/src/rostlab-data/data/swissprot/keyindex_loctree.txt SWISSBLASTDB=/data/src/rostlab-data/data/swissprot/uniprot_sprot NORSPCTRL="--win=100" DEBUG=1 -f /usr/share/predictprotein/MakefilePP.mk all norsp at /usr/bin/predictprotein line 383.
> make: Entering directory '/data/src/temp'
> metastudent -i query.fasta -o query.metastudent --silent  --debug
> mkdir -p /tmp/metastudentulQjHj/methodC;cd /usr/lib/python2.7/dist-packages/metastudentPkg/lib/groupC;./CafaWrapper3.pl /tmp/metastudentulQjHj/query.fasta_eval1.0_iters3_srcgoasp.mfo.blast /tmp/metastudentulQjHj/methodC/output.MFO.txt 0 /tmp/metastudentulQjHj/methodC
> !!!Error!!! mkdir -p /tmp/metastudentulQjHj/methodC;cd /usr/lib/python2.7/dist-packages/metastudentPkg/lib/groupC;./CafaWrapper3.pl /tmp/metastudentulQjHj/query.fasta_eval1.0_iters3_srcgoasp.mfo.blast /tmp/metastudentulQjHj/methodC/output.MFO.txt 0 /tmp/metastudentulQjHj/methodC
> 65280
> Can't use a hash as a reference at /usr/share/perl5/GO/IO/Dotty.pm line 104.
> Compilation failed in require at ./treehandler.pl line 10.
> BEGIN failed--compilation aborted at ./treehandler.pl line 10.
> ./treehandler.pl -mfo transitiveClosure2014.txt -bpo transitiveClosure2014.txt -cco transitiveClosure2014.txt -method 3 -pred /tmp/metastudentulQjHj/methodC/blast.out -scoring 0 failed: 255 at ./CafaWrapper3.pl line 16.
> Error occurred: IOError
> Traceback (most recent call last):
>   File "/usr/bin/metastudent", line 721, in <module>
>     runIt(tempfile, inputFastaFilePath, outputFilePath, outputBlast, blastKickstartDatabasePaths, ontologies, blastOnly, keepTemp, allPreds, debug, noNames, withImages)
>   File "/usr/bin/metastudent", line 187, in runIt
>     predLinesDict["C"] = runMethodC(blastKickstartDatabasePath, fastaFilePathLocal, tmpDirPath, configMap["GROUP_C_SCORING_%s" % (ontology) ], ontology, configMap, debug)
>   File "/usr/lib/python2.7/dist-packages/metastudentPkg/runMethods.py", line 206, in runMethodC
>     with open(outputFilePath) as f: 						
> IOError: [Errno 2] No such file or directory: '/tmp/metastudentulQjHj/methodC/output.MFO.txt'
> /usr/share/predictprotein/MakefilePP.mk:403: recipe for target 'query.metastudent.BPO.txt' failed
> make: *** [query.metastudent.BPO.txt] Error 1
> make: Leaving directory '/data/src/temp'
> make --no-builtin-rules INFILE=query.in -C /data/src/temp JOBID=query -j 1 BLASTCORES=1 LIBRGUTILS=/usr/share/librg-utils-perl/ PPROOT=/usr/share/predictprotein/ PROFNUMRESMIN=17 PROFROOT=/usr/share/profphd/prof/ BIGBLASTDB=/data/src/rostlab-data/data/aa/pdbaa BIG80BLASTDB=/data/src/rostlab-data/data/aa/pdbaa PFAM2DB=/data/src/rostlab-data/data/pfam_legacy/Pfam_ls PFAM3DB=/data/src/rostlab-data/data/pfam/Pfam-A.hmm PROSITEDAT=/data/src/rostlab-data/data/prosite/prosite.dat PROSITECONVDAT=/data/src/rostlab-data/data/prosite/prosite_convert.dat PSICEXE=/usr/share/rost-runpsic/runNewPSIC.pl SPKEYIDX=/data/src/rostlab-data/data/swissprot/keyindex_loctree.txt SWISSBLASTDB=/data/src/rostlab-data/data/swissprot/uniprot_sprot NORSPCTRL="--win=100" DEBUG=1 -f /usr/share/predictprotein/MakefilePP.mk all norsp failed: 512 at /usr/bin/predictprotein line 392.

> cache merging is off at /usr/bin/predictprotein line 230.
> work_dir=/data/src/temp at /usr/bin/predictprotein line 336.
> make --no-builtin-rules INFILE=query.in -C /data/src/temp JOBID=query -j 1 BLASTCORES=1 LIBRGUTILS=/usr/share/librg-utils-perl/ PPROOT=/usr/share/predictprotein/ PROFNUMRESMIN=17 PROFROOT=/usr/share/profphd/prof/ BIGBLASTDB=/data/src/rostlab-data/data/big/big BIG80BLASTDB=/data/src/rostlab-data/data/big/big_80 PFAM2DB=/data/src/rostlab-data/data/pfam_legacy/Pfam_ls PFAM3DB=/data/src/rostlab-data/data/pfam/Pfam-A.hmm PROSITEDAT=/data/src/rostlab-data/data/prosite/prosite.dat PROSITECONVDAT=/data/src/rostlab-data/data/prosite/prosite_convert.dat PSICEXE=/usr/share/rost-runpsic/runNewPSIC.pl SPKEYIDX=/data/src/rostlab-data/data/swissprot/keyindex_loctree.txt SWISSBLASTDB=/data/src/rostlab-data/data/swissprot/uniprot_sprot NORSPCTRL="--win=100" DEBUG=1 -f /usr/share/predictprotein/MakefilePP.mk all norsp at /usr/bin/predictprotein line 383.
> make: Entering directory '/data/src/temp'
> make: Warning: File 'query.in' has modification time 3.2 s in the future
> /usr/share/librg-utils-perl//copf.pl query.in formatIn=fasta formatOut=fasta fileOut=query.fasta exeConvertSeq=convert_seq
> /usr/share/librg-utils-perl//copf.pl query.in formatIn=fasta formatOut=gcg fileOut=query.seqGCG exeConvertSeq=convert_seq
> ncbi-seg query.fasta -x > query.segNorm
> /usr/share/librg-utils-perl//copf.pl query.segNorm formatOut=gcg fileOut=query.segNormGCG
> # blast call may throw warnings on STDERR - silence it when we are not in debug mode; blastpgp and blastall create a normally 0-sized 'error.log' - remove it
> trap "rm -f error.log" EXIT; \
> if ! ( blastpgp -F F -a 1 -j 3 -b 3000 -e 1 -h 1e-3 -d /data/src/rostlab-data/data/big/big_80 -i query.fasta -o query.blastPsiOutTmp -C query.chk -Q query.blastPsiMat   ); then \
> 	EXIT=$?; cat error.log >&2; exit $EXIT; \
> fi
> Killed
> cat: error.log: No such file or directory
> # blast call may throw warnings on STDERR - silence it when we are not in debug mode
> trap "rm -f error.log" EXIT; \
> if ! ( blastpgp -F F -a 1 -b 1000 -e 1 -d /data/src/rostlab-data/data/big/big -i query.fasta -o query.blastPsiAli.nz -R query.chk   ); then \
> 	EXIT=$?; cat error.log >&2; exit $EXIT; \
> fi
> [blastpgp] WARNING: -t larger than 1 not supported when restarting from a checkpoint; setting -t to 1
> 
> [blastpgp] WARNING: posReadCheckpoint: Attempting to recover data from previous checkpoint
> 
> [blastpgp] WARNING: posReadPosFreqsStandard: Could not open checkpoint file
> 
> [blastpgp] WARNING: posReadCheckpoint: Data recovery failed
> 
> [blastpgp] FATAL ERROR: blast: Error recovering from checkpoint
> cat: error.log: No such file or directory
> gzip -c -6 < 'query.blastPsiAli.nz' > 'query.blastPsiAli.gz'
> # lkajan: we have to switch off filtering (default for blastpgp) or sequences like ASDSADADASDASDASDSADASA fail with
> # 'WARNING: query: Could not calculate ungapped Karlin-Altschul parameters due to an invalid query sequence or its translation. Please verify the query sequence(s) and/or filtering options'
> # Does switching off filtering hurt us? Loctree uses the results of this for extracting keywords from swissprot, so I am not worried.
> # This blast call also often writes 'Selenocysteine (U) at position 59 replaced by X' - we are not really interested. Silence this in non-debug mode.
> trap "rm -f error.log" EXIT; \
> if ! ( blastall -F F -a 1 -p blastp -d /data/src/rostlab-data/data/swissprot/uniprot_sprot -b 1000 -e 100 -m 8 -i query.fasta -o query.blastpSwissM8   ); then \
> 	EXIT=$?; cat error.log >&2; exit $EXIT; \
> fi
> /usr/share/librg-utils-perl//blastpgp_to_saf.pl fileInBlast=query.blastPsiOutTmp fileInQuery=query.fasta  fileOutRdb=query.blastPsi80Rdb fileOutSaf=query.safBlastPsi80 red=100 maxAli=3000 tile=0
> opened query.fasta at /usr/share/librg-utils-perl//blastpgp_to_saf.pl line 126.
> blastfile: query.blastPsiOutTmp at /usr/share/librg-utils-perl//blastpgp_to_saf.pl line 127.
> nohits: 0 at /usr/share/librg-utils-perl//blastpgp_to_saf.pl line 128.
> iter: 0 at /usr/share/librg-utils-perl//blastpgp_to_saf.pl line 129.
> blast+: 0 at /usr/share/librg-utils-perl//blastpgp_to_saf.pl line 130.
> Died at /usr/share/librg-utils-perl//blastpgp_to_saf.pl line 76.
> *** ERROR blastpgp_to_saf.pl : *** ERROR blastp_to_saf: blast file format not recognized
> /usr/share/predictprotein/MakefilePP.mk:465: recipe for target 'query.safBlastPsi80' failed
> make: *** [query.safBlastPsi80] Error 255
> rm query.blastPsi80Rdb query.blastPsiAli.nz
> make: Leaving directory '/data/src/temp'
> make --no-builtin-rules INFILE=query.in -C /data/src/temp JOBID=query -j 1 BLASTCORES=1 LIBRGUTILS=/usr/share/librg-utils-perl/ PPROOT=/usr/share/predictprotein/ PROFNUMRESMIN=17 PROFROOT=/usr/share/profphd/prof/ BIGBLASTDB=/data/src/rostlab-data/data/big/big BIG80BLASTDB=/data/src/rostlab-data/data/big/big_80 PFAM2DB=/data/src/rostlab-data/data/pfam_legacy/Pfam_ls PFAM3DB=/data/src/rostlab-data/data/pfam/Pfam-A.hmm PROSITEDAT=/data/src/rostlab-data/data/prosite/prosite.dat PROSITECONVDAT=/data/src/rostlab-data/data/prosite/prosite_convert.dat PSICEXE=/usr/share/rost-runpsic/runNewPSIC.pl SPKEYIDX=/data/src/rostlab-data/data/swissprot/keyindex_loctree.txt SWISSBLASTDB=/data/src/rostlab-data/data/swissprot/uniprot_sprot NORSPCTRL="--win=100" DEBUG=1 -f /usr/share/predictprotein/MakefilePP.mk all norsp failed: 512 at /usr/bin/predictprotein line 392.


-- 
http://fam-tille.de

