Bug#770063: ITP: kmc -- count kmers in genomic sequences

To: Debian Bug Tracking System <submit@bugs.debian.org>
Subject: Bug#770063: ITP: kmc -- count kmers in genomic sequences
From: Jorge Soares <j.s.soares@gmail.com>
Date: Tue, 18 Nov 2014 15:59:33 +0000
Message-id: <20141118155933.4020.96502.reportbug@debian>
Reply-to: Jorge Soares <j.s.soares@gmail.com>, 770063@bugs.debian.org

Package: wnpp
Severity: wishlist
Owner: Jorge Soares <j.s.soares@gmail.com>

* Package name    : kmc
  Version         : 20
  Upstream Author : Sebastian Deorowicz <sebastian.deorowicz@polsl.pl>, Marek Kokot, Szymon Grabowski, Agnieszka Debudaj-Grabysz
* URL             : http://sun.aei.polsl.pl/kmc/index.html
* License         : GPL3
  Programming Lang: C, C++
  Description     : count kmers in genomic sequences

KMC (K-mer Counter) is a utility for counting k-mers (sequences of k consecutive symbols) in a set of reads from genome sequencing projects. K-mer counting is important for many bioinformatics applications, e.g., developing de Bruijn graph assemblers. Building de Bruijn graphs is a commonly used approach to genome assembly with data from second-generation sequencers. Unfortunately, sequencing errors (frequent in practice) result in huge memory requirements for de Bruijn graphs, as well as long build times. One popular way to handle this problem is to filter the input reads so that unique k-mers (very likely the result of an error) are discarded.

Thus, KMC scans the raw reads and produces a compact representation of all non-unique k-mers together with their numbers of occurrences. The algorithm implemented in KMC relies mostly on disk space rather than RAM, which makes it possible to run KMC even on fairly typical personal computers. When run on a high-end server (which KMC's competitors require), it outperforms them in both memory requirements and speed of computation. The disk space needed for computation is on the order of the size of the input data (usually it is smaller).
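
To illustrate the counting and filtering idea described above, here is a minimal in-memory sketch in Python. It is not KMC's algorithm (KMC is written in C/C++ and works mostly on disk); the function names count_kmers and drop_unique and the sample reads are made up for this example.

```python
from collections import Counter


def count_kmers(reads, k):
    """Count every k-mer (substring of length k) across all reads."""
    counts = Counter()
    for read in reads:
        # A read of length n contains n - k + 1 overlapping k-mers.
        for i in range(len(read) - k + 1):
            counts[read[i:i + k]] += 1
    return counts


def drop_unique(counts, min_count=2):
    """Keep only k-mers seen at least min_count times.

    K-mers seen once are assumed to be likely sequencing errors.
    """
    return {kmer: n for kmer, n in counts.items() if n >= min_count}


# Hypothetical sample reads, for illustration only.
reads = ["ACGTAC", "CGTACT"]
counts = count_kmers(reads, k=3)
kept = drop_unique(counts)
print(kept)
```

With these two reads, "ACG" and "ACT" each occur once and are discarded, while "CGT", "GTA" and "TAC" each occur twice and are kept. A real data set would not fit in a Python dictionary, which is the memory problem the disk-based approach is meant to address.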

kmc is a dependency of the iva (Iterative Virus Assembler) package.

The software is widely used at the Wellcome Trust Sanger Institute.

Sponsorship needed.
