
Bug#778589: ITP: express -- Streaming quantification for high-throughput sequencing



Package: wnpp
Severity: wishlist
Owner: Debian Med team <debian-med@lists.debian.org>
X-Debbugs-Cc: debian-devel@lists.debian.org, debian-med@lists.debian.org

* Package name    : express
  Version         : 1.5.1
  Upstream Author : Adam Roberts & Lior Pachter <ask.xprs@gmail.com>
* URL             : http://bio.math.berkeley.edu/eXpress/index.html
* License         : Artistic-2.0
  Programming Lang: C++
  Description     : Streaming quantification for high-throughput sequencing

eXpress is a streaming tool for quantifying the abundances of a set of
 target sequences from sampled subsequences. Example applications include
 transcript-level RNA-Seq quantification, allele-specific/haplotype
 expression analysis (from RNA-Seq), transcription factor binding
 quantification in ChIP-Seq, and analysis of metagenomic data. It is
 based on an online-EM algorithm that results in space (memory)
 requirements proportional to the total size of the target sequences and
 time requirements that are proportional to the number of sampled
 fragments. Thus, in applications such as RNA-Seq, eXpress can accurately
 quantify much larger samples than other currently available tools,
 greatly reducing computing infrastructure requirements. eXpress can be
 used to build lightweight high-throughput sequencing processing
 pipelines when coupled with a streaming aligner (such as Bowtie), as
 output can be piped directly into eXpress, effectively eliminating the
 need to store read alignments in memory or on disk.
 .
 In an analysis of the performance of eXpress for RNA-Seq data, we have
 observed that this efficiency does not come at the cost of accuracy.
 eXpress is more accurate
 than other available tools, even when limited to smaller datasets that
 do not require such efficiency. Moreover, like the Cufflinks program,
 eXpress can be used to estimate transcript abundances in multi-isoform
 genes. eXpress is also able to resolve multi-mappings of reads across
 gene families, and does not require a reference genome, so it can be
 used in conjunction with de novo assemblers such as Trinity, Oases, or
 Trans-ABySS. The underlying model is based on previously described
 probabilistic models developed for RNA-Seq but is applicable to other
 settings where target sequences are sampled, and includes parameters for
 fragment length distributions, errors in reads, and sequence-specific
 fragment bias.
 .
 eXpress can be used to resolve ambiguous mappings in other
 high-throughput sequencing based applications. The only required inputs
 to eXpress are a set of target sequences and a set of sequenced
 fragments multiply-aligned to them.  While these target sequences will
 often be gene isoforms, they need not be. Haplotypes can be used as the
 reference for allele-specific expression analysis, binding regions for
 ChIP-Seq, or target genomes in metagenomics experiments. eXpress is
 useful in any analysis where reads multi-map to sequences that differ in
 abundance.
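
For readers unfamiliar with the online-EM idea mentioned in the description,
here is a minimal toy sketch in Python. It is not upstream's implementation:
the function name online_em, the uniform pseudo-count prior, and the constant
update weight are all assumptions made for illustration, and the sketch
ignores fragment lengths, read errors, and sequence bias. It only shows why
memory scales with the number of targets and time with the number of
fragments.

```python
from typing import Iterable, List, Sequence


def online_em(num_targets: int, fragments: Iterable[Sequence[int]]) -> List[float]:
    """Toy single-pass EM over multi-mapped fragments (hypothetical sketch).

    Each fragment is given as the list of target indices it aligns to.
    Only one mass value per target is kept, so memory is O(num_targets),
    and each fragment is seen exactly once, so time is O(total alignments).
    """
    # Assumption: start every target with one pseudo-count.
    mass = [1.0] * num_targets

    for hits in fragments:
        # E-step for this fragment: split it across its candidate targets
        # in proportion to the current abundance estimates.
        total = sum(mass[t] for t in hits)
        posteriors = [mass[t] / total for t in hits]

        # M-step, applied immediately: add the fractional assignments.
        for t, p in zip(hits, posteriors):
            mass[t] += p

    norm = sum(mass)
    return [m / norm for m in mass]


if __name__ == "__main__":
    # Five fragments over three targets; some align to more than one target.
    stream = iter([[0], [0], [0, 1], [1, 2], [0, 1]])
    print(online_em(3, stream))
```

Because the fragments argument can be any iterable, the alignments can be
consumed from a generator as they arrive, without being stored.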

eXpress is a dependency of trinityrnaseq. The Debian Med team will
maintain it as a group.
