
Re: [outreachy] autopkgtest-pkg-perl in librg-utils-perl does nothing



Hi Andreas,
It would be fine to drop this line then.

I also found a problem that involves this package.
Last weekend I started to work on predictprotein. The hardest part was getting it to run.
At some point I found the instructions at https://wiki.debian.org/DebianMed/PredictProtein and spent some time downloading the database. When I had installed it and ran predictprotein, I got multiple error messages (output_with_errors.txt). It turned out that when one of the Perl scripts in librg-utils-perl calls blastpgp on that database,
  blastpgp -F F -a 1 -j 3 -b 3000 -e 1 -h 1e-3 -d /data/src/rostlab-data/data/big/big_80 -i query.fasta -o query.blastPsiOutTmp -C query.chk -Q query.blastPsiMat
 
blastpgp ends with a "Killed" message and produces an incorrect output file (query.blastPsiOutTmp is incomplete). The script in librg-utils-perl is correct and the call in predictprotein is correct; it is blastpgp itself that fails.
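
A bare "Killed" from the shell usually means the process received SIGKILL. One common cause is the kernel's out-of-memory killer, but that is an assumption and has not been verified for this run. A minimal sketch for telling a signal death apart from an ordinary non-zero exit, using only the Python standard library (the helper names describe_exit and run_and_report are hypothetical):

```python
import signal
import subprocess


def describe_exit(returncode):
    """Turn a subprocess return code into a short description."""
    if returncode < 0:
        # On POSIX, subprocess reports death by signal N as -N.
        try:
            name = signal.Signals(-returncode).name
        except ValueError:
            # Signal number without a symbolic name on this platform.
            name = str(-returncode)
        return "killed by signal %s" % name
    if returncode == 0:
        return "exited normally"
    return "exited with status %d" % returncode


def run_and_report(argv):
    """Run argv (a list of strings) and describe how it ended."""
    proc = subprocess.run(argv)
    return describe_exit(proc.returncode)
```

Calling run_and_report with the blastpgp command line from above, split into a list of arguments, would show whether the process was killed by a signal or failed on its own.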

I thought that an incorrect database format could be the reason, because the ncbi-blast+ package (to which blastpgp belongs) uses the latest version of that database, and the database from RostLab's website probably isn't the latest.
I downloaded one of the databases from the NCBI FTP server (ftp://ftp.ncbi.nlm.nih.gov/blast/db/) and tried to run predictprotein with that data. It worked! But now I get an error when metastudent runs (output in some_output.txt); I'm working on fixing it now.

And there are two things I don't understand:

Is there any package that contains a copy of the current version of the blastp database, or a small part of it? It seems that the autopkgtest test suite should use a smaller portion of the blastp database.
For now it is unclear how to test predictprotein with autopkgtest, since a correct run also requires a local copy of a (possibly) huge database (~30GB in the copy from RostLab's website). Should ncbi-blast+/ncbi-tools6 perhaps download and install it? Predictprotein has special parameters for the different databases, and the path to the blast installation can be provided by hand, which makes it possible to call it with a smaller database in a test suite run. But that will work only if blastpgp from ncbi-blast+ works correctly with the same version of the database. So the better way would be to install and test database usage in the ncbi-blast+ tests, and to use the default database installed with ncbi-blast+ (if one is installed).
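
To illustrate the "smaller portion" idea: a minimal sketch that copies the first few records of a FASTA file into a new file, assuming the sequences are available as plain FASTA (the RostLab download may only ship pre-formatted database files). The function names and file paths are hypothetical, and the resulting file would still have to be indexed for BLAST (formatdb for the legacy tools) before blastpgp could use it; that step is not shown.

```python
def take_first_records(src_lines, max_records):
    """Yield the lines of the first max_records FASTA records."""
    seen = 0
    for line in src_lines:
        if line.startswith(">"):
            seen += 1
            if seen > max_records:
                break
        if seen:
            # Lines before the first header are not part of any record.
            yield line


def write_subset(src_path, dst_path, max_records):
    """Copy the first max_records records from src_path to dst_path."""
    with open(src_path) as src, open(dst_path, "w") as dst:
        dst.writelines(take_first_records(src, max_records))
```

For example, write_subset("big.fasta", "small.fasta", 100) would keep only the first 100 sequences. Whether such a truncated database gives predictprotein enough hits for a meaningful test is a separate question.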

Could you also check whether the database from https://wiki.debian.org/DebianMed/PredictProtein really doesn't work? I have an unstable internet connection and am not sure whether the file was corrupted.
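
One way to rule out a corrupted download is to compare checksums of the two copies. A minimal sketch, assuming we simply exchange the digests by mail (I do not know whether a reference checksum is published anywhere); the function name sha256_of_file is hypothetical:

```python
import hashlib


def sha256_of_file(path, chunk_size=1 << 20):
    """Return the hex SHA-256 digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        # Read in 1 MiB chunks so a ~30GB file never has to fit in memory.
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()
```

If both copies of the downloaded archive give the same digest, the download itself can be excluded as the cause.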




2016-07-13 11:08 GMT+03:00 Andreas Tille <andreas@an3as.eu>:
Hi Tanya,

when I looked over older commits I wondered what
autopkgtest-pkg-perl in librg-utils-perl is actually doing.
The build log does not show anything but

   Nothing to be done for 'check'.

lines.  While I think that your change to fix the error "Can't use
'defined(@array)'" is well worth an upload, I wonder whether the test is
doing nothing and the line should rather be dropped from debian/control.

What do you think?

Kind regards

     Andreas.

--
http://fam-tille.de




--
Best wishes,
Tanya.
cache merging is off at /usr/bin/predictprotein line 230.
work_dir=/data/src/temp at /usr/bin/predictprotein line 336.
make --no-builtin-rules INFILE=query.in -C /data/src/temp JOBID=query -j 1 BLASTCORES=1 LIBRGUTILS=/usr/share/librg-utils-perl/ PPROOT=/usr/share/predictprotein/ PROFNUMRESMIN=17 PROFROOT=/usr/share/profphd/prof/ BIGBLASTDB=/data/src/rostlab-data/data/aa/pdbaa BIG80BLASTDB=/data/src/rostlab-data/data/aa/pdbaa PFAM2DB=/data/src/rostlab-data/data/pfam_legacy/Pfam_ls PFAM3DB=/data/src/rostlab-data/data/pfam/Pfam-A.hmm PROSITEDAT=/data/src/rostlab-data/data/prosite/prosite.dat PROSITECONVDAT=/data/src/rostlab-data/data/prosite/prosite_convert.dat PSICEXE=/usr/share/rost-runpsic/runNewPSIC.pl SPKEYIDX=/data/src/rostlab-data/data/swissprot/keyindex_loctree.txt SWISSBLASTDB=/data/src/rostlab-data/data/swissprot/uniprot_sprot NORSPCTRL="--win=100" DEBUG=1 -f /usr/share/predictprotein/MakefilePP.mk all norsp at /usr/bin/predictprotein line 383.
make: Entering directory '/data/src/temp'
metastudent -i query.fasta -o query.metastudent --silent  --debug
mkdir -p /tmp/metastudentulQjHj/methodC;cd /usr/lib/python2.7/dist-packages/metastudentPkg/lib/groupC;./CafaWrapper3.pl /tmp/metastudentulQjHj/query.fasta_eval1.0_iters3_srcgoasp.mfo.blast /tmp/metastudentulQjHj/methodC/output.MFO.txt 0 /tmp/metastudentulQjHj/methodC
!!!Error!!! mkdir -p /tmp/metastudentulQjHj/methodC;cd /usr/lib/python2.7/dist-packages/metastudentPkg/lib/groupC;./CafaWrapper3.pl /tmp/metastudentulQjHj/query.fasta_eval1.0_iters3_srcgoasp.mfo.blast /tmp/metastudentulQjHj/methodC/output.MFO.txt 0 /tmp/metastudentulQjHj/methodC
65280
Can't use a hash as a reference at /usr/share/perl5/GO/IO/Dotty.pm line 104.
Compilation failed in require at ./treehandler.pl line 10.
BEGIN failed--compilation aborted at ./treehandler.pl line 10.
./treehandler.pl -mfo transitiveClosure2014.txt -bpo transitiveClosure2014.txt -cco transitiveClosure2014.txt -method 3 -pred /tmp/metastudentulQjHj/methodC/blast.out -scoring 0 failed: 255 at ./CafaWrapper3.pl line 16.
Error occurred: IOError
Traceback (most recent call last):
  File "/usr/bin/metastudent", line 721, in <module>
    runIt(tempfile, inputFastaFilePath, outputFilePath, outputBlast, blastKickstartDatabasePaths, ontologies, blastOnly, keepTemp, allPreds, debug, noNames, withImages)
  File "/usr/bin/metastudent", line 187, in runIt
    predLinesDict["C"] = runMethodC(blastKickstartDatabasePath, fastaFilePathLocal, tmpDirPath, configMap["GROUP_C_SCORING_%s" % (ontology) ], ontology, configMap, debug)
  File "/usr/lib/python2.7/dist-packages/metastudentPkg/runMethods.py", line 206, in runMethodC
    with open(outputFilePath) as f: 						
IOError: [Errno 2] No such file or directory: '/tmp/metastudentulQjHj/methodC/output.MFO.txt'
/usr/share/predictprotein/MakefilePP.mk:403: recipe for target 'query.metastudent.BPO.txt' failed
make: *** [query.metastudent.BPO.txt] Error 1
make: Leaving directory '/data/src/temp'
make --no-builtin-rules INFILE=query.in -C /data/src/temp JOBID=query -j 1 BLASTCORES=1 LIBRGUTILS=/usr/share/librg-utils-perl/ PPROOT=/usr/share/predictprotein/ PROFNUMRESMIN=17 PROFROOT=/usr/share/profphd/prof/ BIGBLASTDB=/data/src/rostlab-data/data/aa/pdbaa BIG80BLASTDB=/data/src/rostlab-data/data/aa/pdbaa PFAM2DB=/data/src/rostlab-data/data/pfam_legacy/Pfam_ls PFAM3DB=/data/src/rostlab-data/data/pfam/Pfam-A.hmm PROSITEDAT=/data/src/rostlab-data/data/prosite/prosite.dat PROSITECONVDAT=/data/src/rostlab-data/data/prosite/prosite_convert.dat PSICEXE=/usr/share/rost-runpsic/runNewPSIC.pl SPKEYIDX=/data/src/rostlab-data/data/swissprot/keyindex_loctree.txt SWISSBLASTDB=/data/src/rostlab-data/data/swissprot/uniprot_sprot NORSPCTRL="--win=100" DEBUG=1 -f /usr/share/predictprotein/MakefilePP.mk all norsp failed: 512 at /usr/bin/predictprotein line 392.
cache merging is off at /usr/bin/predictprotein line 230.
work_dir=/data/src/temp at /usr/bin/predictprotein line 336.
make --no-builtin-rules INFILE=query.in -C /data/src/temp JOBID=query -j 1 BLASTCORES=1 LIBRGUTILS=/usr/share/librg-utils-perl/ PPROOT=/usr/share/predictprotein/ PROFNUMRESMIN=17 PROFROOT=/usr/share/profphd/prof/ BIGBLASTDB=/data/src/rostlab-data/data/big/big BIG80BLASTDB=/data/src/rostlab-data/data/big/big_80 PFAM2DB=/data/src/rostlab-data/data/pfam_legacy/Pfam_ls PFAM3DB=/data/src/rostlab-data/data/pfam/Pfam-A.hmm PROSITEDAT=/data/src/rostlab-data/data/prosite/prosite.dat PROSITECONVDAT=/data/src/rostlab-data/data/prosite/prosite_convert.dat PSICEXE=/usr/share/rost-runpsic/runNewPSIC.pl SPKEYIDX=/data/src/rostlab-data/data/swissprot/keyindex_loctree.txt SWISSBLASTDB=/data/src/rostlab-data/data/swissprot/uniprot_sprot NORSPCTRL="--win=100" DEBUG=1 -f /usr/share/predictprotein/MakefilePP.mk all norsp at /usr/bin/predictprotein line 383.
make: Entering directory '/data/src/temp'
make: Warning: File 'query.in' has modification time 3.2 s in the future
/usr/share/librg-utils-perl//copf.pl query.in formatIn=fasta formatOut=fasta fileOut=query.fasta exeConvertSeq=convert_seq
/usr/share/librg-utils-perl//copf.pl query.in formatIn=fasta formatOut=gcg fileOut=query.seqGCG exeConvertSeq=convert_seq
ncbi-seg query.fasta -x > query.segNorm
/usr/share/librg-utils-perl//copf.pl query.segNorm formatOut=gcg fileOut=query.segNormGCG
# blast call may throw warnings on STDERR - silence it when we are not in debug mode; blastpgp and blastall create a normally 0-sized 'error.log' - remove it
trap "rm -f error.log" EXIT; \
if ! ( blastpgp -F F -a 1 -j 3 -b 3000 -e 1 -h 1e-3 -d /data/src/rostlab-data/data/big/big_80 -i query.fasta -o query.blastPsiOutTmp -C query.chk -Q query.blastPsiMat   ); then \
	EXIT=$?; cat error.log >&2; exit $EXIT; \
fi
Killed
cat: error.log: No such file or directory
# blast call may throw warnings on STDERR - silence it when we are not in debug mode
trap "rm -f error.log" EXIT; \
if ! ( blastpgp -F F -a 1 -b 1000 -e 1 -d /data/src/rostlab-data/data/big/big -i query.fasta -o query.blastPsiAli.nz -R query.chk   ); then \
	EXIT=$?; cat error.log >&2; exit $EXIT; \
fi
[blastpgp] WARNING: -t larger than 1 not supported when restarting from a checkpoint; setting -t to 1

[blastpgp] WARNING: posReadCheckpoint: Attempting to recover data from previous checkpoint

[blastpgp] WARNING: posReadPosFreqsStandard: Could not open checkpoint file

[blastpgp] WARNING: posReadCheckpoint: Data recovery failed

[blastpgp] FATAL ERROR: blast: Error recovering from checkpoint
cat: error.log: No such file or directory
gzip -c -6 < 'query.blastPsiAli.nz' > 'query.blastPsiAli.gz'
# lkajan: we have to switch off filtering (default for blastpgp) or sequences like ASDSADADASDASDASDSADASA fail with
# 'WARNING: query: Could not calculate ungapped Karlin-Altschul parameters due to an invalid query sequence or its translation. Please verify the query sequence(s) and/or filtering options'
# Does switching off filtering hurt us? Loctree uses the results of this for extracting keywords from swissprot, so I am not worried.
# This blast call also often writes 'Selenocysteine (U) at position 59 replaced by X' - we are not really interested. Silence this in non-debug mode.
trap "rm -f error.log" EXIT; \
if ! ( blastall -F F -a 1 -p blastp -d /data/src/rostlab-data/data/swissprot/uniprot_sprot -b 1000 -e 100 -m 8 -i query.fasta -o query.blastpSwissM8   ); then \
	EXIT=$?; cat error.log >&2; exit $EXIT; \
fi
/usr/share/librg-utils-perl//blastpgp_to_saf.pl fileInBlast=query.blastPsiOutTmp fileInQuery=query.fasta  fileOutRdb=query.blastPsi80Rdb fileOutSaf=query.safBlastPsi80 red=100 maxAli=3000 tile=0
opened query.fasta at /usr/share/librg-utils-perl//blastpgp_to_saf.pl line 126.
blastfile: query.blastPsiOutTmp at /usr/share/librg-utils-perl//blastpgp_to_saf.pl line 127.
nohits: 0 at /usr/share/librg-utils-perl//blastpgp_to_saf.pl line 128.
iter: 0 at /usr/share/librg-utils-perl//blastpgp_to_saf.pl line 129.
blast+: 0 at /usr/share/librg-utils-perl//blastpgp_to_saf.pl line 130.
Died at /usr/share/librg-utils-perl//blastpgp_to_saf.pl line 76.
*** ERROR blastpgp_to_saf.pl : *** ERROR blastp_to_saf: blast file format not recognized
/usr/share/predictprotein/MakefilePP.mk:465: recipe for target 'query.safBlastPsi80' failed
make: *** [query.safBlastPsi80] Error 255
rm query.blastPsi80Rdb query.blastPsiAli.nz
make: Leaving directory '/data/src/temp'
make --no-builtin-rules INFILE=query.in -C /data/src/temp JOBID=query -j 1 BLASTCORES=1 LIBRGUTILS=/usr/share/librg-utils-perl/ PPROOT=/usr/share/predictprotein/ PROFNUMRESMIN=17 PROFROOT=/usr/share/profphd/prof/ BIGBLASTDB=/data/src/rostlab-data/data/big/big BIG80BLASTDB=/data/src/rostlab-data/data/big/big_80 PFAM2DB=/data/src/rostlab-data/data/pfam_legacy/Pfam_ls PFAM3DB=/data/src/rostlab-data/data/pfam/Pfam-A.hmm PROSITEDAT=/data/src/rostlab-data/data/prosite/prosite.dat PROSITECONVDAT=/data/src/rostlab-data/data/prosite/prosite_convert.dat PSICEXE=/usr/share/rost-runpsic/runNewPSIC.pl SPKEYIDX=/data/src/rostlab-data/data/swissprot/keyindex_loctree.txt SWISSBLASTDB=/data/src/rostlab-data/data/swissprot/uniprot_sprot NORSPCTRL="--win=100" DEBUG=1 -f /usr/share/predictprotein/MakefilePP.mk all norsp failed: 512 at /usr/bin/predictprotein line 392.
