The BLOSUM62 matrices have been buggy for 15 years...

To: debian-med@lists.debian.org
Subject: The BLOSUM62 matrices have been buggy for 15 years...
From: Charles Plessy <charles-debian-nospam@plessy.org>
Date: Mon, 10 Mar 2008 17:03:38 +0900
Message-id: <20080310080338.GB30014@kunpuu.plessy.org>

Hi all,

An interesting letter was sent to the editor of Nature Biotechnology:

http://dx.doi.org/10.1038/nbt0308-274

 The BLOSUM [1] family of substitution matrices, and particularly BLOSUM62,
 is the de facto standard in protein database searches and sequence
 alignments. In the course of analyzing the evolution of the Blocks
 database [2], we noticed errors in the software source code used to create
 the initial BLOSUM family of matrices (available online at
 ftp://ftp.ncbi.nih.gov/repository/blocks/unix/blosum/blosum.tar.Z). The
 result of these errors is that the BLOSUM matrices—BLOSUM62, BLOSUM50,
 etc.—are quite different from the matrices that should have been
 calculated using the algorithm described by Henikoff and Henikoff [1].
 Obviously, minor errors in research, and particularly in software source
 code, are quite common. This case is noteworthy for three reasons:
 first, the BLOSUM matrices are ubiquitous in computational biology;
 second, these errors have gone unnoticed for 15 years; and third, the
 'incorrect' matrices perform better than the 'intended' matrices.

I cannot quote the full letter because it is copyrighted, but if you do
not have access to the journal, I can send you the PDF privately.

Have a nice day,

-- 
Charles
