
Bug#970447: ITP: pinfish -- Collection of tools to annotate genomes using long read transcriptomics data



Package: wnpp
Severity: wishlist
Owner: Nilesh Patra <npatra974@gmail.com>
X-Debbugs-CC: debian-devel@lists.debian.org

* Package name    : pinfish
  Version         : 0.1.0+ds-1
  Upstream Author : Oxford Nanopore Technologies Ltd.
* URL             : https://github.com/nanoporetech/pinfish
* License         : MPL-2.0
  Programming Lang: Go
  Description     :  Collection of tools to annotate genomes using long read transcriptomics data
 The toolchain is composed of the following tools:
 1. spliced_bam2gff - a tool for converting sorted BAM
 files containing spliced alignments
 into GFF2 format. Each read will be represented as a distinct
 transcript. This tool comes in handy when visualizing spliced
 reads at particular loci and to provide input to the rest
 of the toolchain.
 .
 2. cluster_gff - this tool takes a sorted GFF2 file as
 input and clusters together reads having similar
 exon/intron structure and creates a rough consensus
 of the clusters by taking the median of exon
 boundaries from all transcripts in the cluster.
 .
 3. polish_clusters - this tool takes the cluster
 definitions generated by cluster_gff and for each
 cluster creates an error-corrected read by mapping
 all reads onto the read with the median length
 and polishing it using racon. The polished reads
 can be mapped to the genome using minimap2 or GMAP.
 .
 4. collapse_partials - this tool takes GFFs generated
 by either cluster_gff or polish_clusters and filters
 out transcripts which are likely to be based on RNA
 degradation products from the 5' end. The tool clusters
 the input transcripts into "loci" by the 3' ends and
 discards transcripts for which the same locus contains
 a compatible transcript with more exons.
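
For readers unfamiliar with the consensus step described for cluster_gff, here is a minimal Go sketch of taking the median of exon boundaries across a cluster. It is an illustration only, not code from pinfish: the Exon type and the median and consensus functions are hypothetical names, and the sketch assumes every transcript in the cluster has the same number of exons.

```go
package main

import (
	"fmt"
	"sort"
)

// Exon is a hypothetical type for this sketch: one exon as a start/end pair.
type Exon struct {
	Start, End int
}

// median returns the median of vals. For an even count it returns the
// integer mean of the two middle values; for an empty slice it returns 0.
func median(vals []int) int {
	s := append([]int(nil), vals...) // copy so the caller's slice is not reordered
	sort.Ints(s)
	n := len(s)
	if n == 0 {
		return 0
	}
	if n%2 == 1 {
		return s[n/2]
	}
	return (s[n/2-1] + s[n/2]) / 2
}

// consensus builds a rough consensus transcript by taking, for each exon
// position, the median start and the median end over all transcripts.
// Assumption: all transcripts in the cluster have the same exon count.
func consensus(cluster [][]Exon) []Exon {
	if len(cluster) == 0 {
		return nil
	}
	nExons := len(cluster[0])
	out := make([]Exon, nExons)
	for i := 0; i < nExons; i++ {
		starts := make([]int, 0, len(cluster))
		ends := make([]int, 0, len(cluster))
		for _, tr := range cluster {
			starts = append(starts, tr[i].Start)
			ends = append(ends, tr[i].End)
		}
		out[i] = Exon{Start: median(starts), End: median(ends)}
	}
	return out
}

func main() {
	// Three made-up reads with the same two-exon structure.
	cluster := [][]Exon{
		{{Start: 100, End: 200}, {Start: 300, End: 400}},
		{{Start: 102, End: 198}, {Start: 301, End: 405}},
		{{Start: 98, End: 203}, {Start: 299, End: 400}},
	}
	fmt.Println(consensus(cluster)) // prints [{100 200} {300 400}]
}
```

The median is less sensitive than the mean to a few reads with badly misplaced boundaries, which is presumably why it suits noisy long-read alignments.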

I shall maintain this package.
