Re: Outreachy 2017 (Round 14)
Hello Andreas,
1) I read the paper about mapdamage and figured out that with given pe.sam file the reference TGAAAACGA wouldn't make sense as the particular type of substitutions is specific for ancient DNA damages. So further for tests I used reference CGATACGA -- which contains all types of nucleotides and make sense regarding specifics of DNA damage.
2) But even after fixing (1) I still got an error (log attached). The problem is that on given dataset from pe_test mapdamage produces data with NaNs and then tries to compute quantiles on this data. I think that on real date NaNs also could be produced, so I treated this issue as a bug and made a patch which changes the way how quantiles are computed --- now it removes NaNs [1]
3) I wrote about all this to upstream as a comments to earlier created issue and asked if it's really was a bug in their opinion
4) FInally, move forward to fixing mapdamage test suite
Regards, Nadiya
Started with the command: /usr/local/bin/mapDamage -i pe.sam -r ref_T.fa
[fai_load] build FASTA index.
Reading from 'pe.sam'
Writing results to 'results_pe/'
pdf results_pe/Fragmisincorporation_plot.pdf generated
No length distributions are available, plotting length distribution only works for single-end reads
Performing Bayesian estimates
Warning, To few substitutions to assess the nick frequency, using constant nick frequency instead
Starting grid search, starting from random values
Adjusting the proposal variance iteration 1
Adjusting the proposal variance iteration 2
Adjusting the proposal variance iteration 3
Adjusting the proposal variance iteration 4
Adjusting the proposal variance iteration 5
Adjusting the proposal variance iteration 6
Adjusting the proposal variance iteration 7
Adjusting the proposal variance iteration 8
Adjusting the proposal variance iteration 9
Adjusting the proposal variance iteration 10
Done burning, starting the iterations
Done with the iterations, finishing up
Writing and plotting to files
Error in quantile.default(newX[, i], ...) :
missing values and NaN's not allowed if 'na.rm' is FALSE
Calls: source ... calcSampleStats -> data.frame -> apply -> FUN -> quantile.default
12: (function (e)
{
traceback(2)
quit(status = 1)
})()
11: stop("missing values and NaN's not allowed if 'na.rm' is FALSE")
10: quantile.default(newX[, i], ...)
9: FUN(newX[, i], ...)
8: apply(X, 1, quantile, c(0.025))
7: data.frame(x = 1:nrow(da), pos = da[, "Pos"], mea = apply(X,
1, mean), med = apply(X, 1, median), loCI = apply(X, 1, quantile,
c(0.025)), hiCI = apply(X, 1, quantile, c(0.975)))
6: calcSampleStats(da, CTs)
5: postPredCheck(dat, mcmcOut)
4: eval(expr, envir, enclos)
3: eval(ei, envir)
2: withVisible(eval(ei, envir))
1: source(paste(path_to_mapDamage_stats, "main.R", sep = ""))
The Bayesian statistics program failed to finish
Traceback (most recent call last):
File "/usr/local/bin/mapDamage", line 382, in <module>
sys.exit(main())
File "/usr/local/bin/mapDamage", line 365, in main
mapdamage.rscript.run_stats(options)
File "/usr/local/lib/python2.7/dist-packages/mapdamage/rscript.py", line 144, in run_stats
raise e
subprocess.CalledProcessError: Command '['Rscript', '/usr/local/lib/python2.7/dist-packages/mapdamage/Rscripts/stats/runGeneral.R', '--args', '30', '10000', '10', '50000', '0', '0', '1', '1', '0', '0', '1', '12', 'results_pe/', '/usr/local/lib/python2.7/dist-packages/mapdamage/Rscripts/stats/', 'results_pe/Stats_out', '0', '0', '0', 'results_pe/acgt_ratio.csv', '0', '0']' returned non-zero exit status 1
Reply to: