
Bug#970192: ITP: clark -- accurate and versatile classification of biological sequences



Package: wnpp
Severity: wishlist

Subject: ITP: clark -- accurate and versatile classification of biological sequences
Package: wnpp
Owner: Steffen Moeller <moeller@debian.org>
Severity: wishlist

* Package name    : clark
  Version         : 1.2.6.1
  Upstream Author : Rachid Ounit <rouni001@cs.ucr.edu>
* URL             : http://clark.cs.ucr.edu/
* License         : GPL-3.0+
  Programming Lang: C++
  Description     : accurate and versatile classification of biological sequences
 The problem of DNA sequence classification is central to several
 application domains in molecular biology, genomics, metagenomics and
 genetics. Although several software tools have been developed for this
 problem, it is still computationally challenging due to the size of
 datasets generated by modern sequencing instruments and the growing
 size of reference sequence databases.
 .
 CLARK performs supervised sequence classification based on
 discriminative k-mers. For two specific classification problems
 (see the article for details), namely (1) the taxonomic
 classification of metagenomic reads to known bacterial genomes,
 and (2) the assignment of BAC clones and transcripts to chromosome
 arms/centromeres (in the absence of a finished assembly for the reference
 genome), CLARK aims to outperform the best state-of-the-art methods
 in classification speed and precision.
 .
 Three classifiers from the CLARK framework are provided:
 .
  * CLARK (default): created for powerful workstations, it can require
    a significant amount of RAM to run with a large database (e.g., all
    bacterial genomes from NCBI/RefSeq). This classifier is the standard
    in the CLARK tool series. It builds discriminative k-mers from all
    k-mers in the targets, queries k-mers with exact matching, and, in
    its fastest mode, classifies 1 million short reads in a few seconds;
  * CLARK-l: created for workstations with limited memory (i.e., "l"
    for light), this software tool provides precise classification on
    small metagenomes. For metagenomics analysis, CLARK-l works
    with a sparse or "light" database (up to 4 GB of RAM) while still
    delivering highly accurate and fast results. This classifier builds
    discriminative k-mers from non-overlapping and distant k-mers in the
    targets and queries k-mers with exact matching;
  * CLARK-S: created for powerful workstations and exploiting spaced
    k-mers (i.e., "S" for spaced), this classifier requires more RAM
    than CLARK or CLARK-l, but offers higher sensitivity
    than CLARK at the species level (see the peer-reviewed publication in
    Bioinformatics). CLARK-S completes the series of classifiers from the
    CLARK framework.
 .
 Other applications of CLARK include the detection of contaminants
 and the identification of chimerism and vector contamination
 in sequenced BACs (cf. "Overview" tab).
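
For readers unfamiliar with the idea, classification with discriminative
k-mers and exact matching, as described above, can be sketched roughly as
follows. This is a toy illustration only, not CLARK's actual implementation:
the function names (kmers, build_discriminative_index, classify) are
hypothetical, it assumes that a k-mer is "discriminative" when it occurs in
exactly one target, and it assumes that a read is assigned to the target
with the most k-mer hits.

```python
from collections import Counter


def kmers(seq, k):
    """Yield every overlapping k-mer of seq."""
    for i in range(len(seq) - k + 1):
        yield seq[i:i + k]


def build_discriminative_index(targets, k):
    """Map each k-mer found in exactly one target to that target's name.

    targets is a dict of {target_name: reference_sequence}.
    Assumption: k-mers shared by two or more targets are discarded.
    """
    owner = {}      # k-mer -> the single target seen so far
    shared = set()  # k-mers already seen in more than one target
    for name, seq in targets.items():
        for kmer in set(kmers(seq, k)):
            if kmer in shared:
                continue
            if kmer in owner and owner[kmer] != name:
                # Seen in another target: no longer discriminative.
                del owner[kmer]
                shared.add(kmer)
            else:
                owner[kmer] = name
    return owner


def classify(read, index, k):
    """Return the target with the most exact k-mer hits, or None."""
    hits = Counter(index[km] for km in kmers(read, k) if km in index)
    if not hits:
        return None  # no discriminative k-mer matched
    # Ties are broken by first occurrence; a real tool may do otherwise.
    return hits.most_common(1)[0][0]


if __name__ == "__main__":
    # Made-up toy sequences, for illustration only.
    targets = {"A": "ACGTACGTTT", "B": "ACGTGGGGCC"}
    index = build_discriminative_index(targets, k=4)
    print(classify("TACGTT", index, k=4))
    print(classify("GGGGCC", index, k=4))
```

In this sketch, a light variant in the spirit of CLARK-l would build the
index from a sparse subset of non-overlapping k-mers instead of all of them,
which reduces memory use at the cost of fewer possible hits per read.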

Remark: This package is maintained by Steffen Moeller at
   https://salsa.debian.org/med-team/clark

