
The BLOSUM62 matrices have been buggy for 15 years...



Hi all,

An interesting letter was sent to Nature Biotech's editor:

http://dx.doi.org/10.1038/nbt0308-274

 The BLOSUM [1] family of substitution matrices, and particularly BLOSUM62,
 is the de facto standard in protein database searches and sequence
 alignments. In the course of analyzing the evolution of the Blocks
 database [2], we noticed errors in the software source code used to create
 the initial BLOSUM family of matrices (available online at
 ftp://ftp.ncbi.nih.gov/repository/blocks/unix/blosum/blosum.tar.Z). The
 result of these errors is that the BLOSUM matrices—BLOSUM62, BLOSUM50,
 etc.—are quite different from the matrices that should have been
 calculated using the algorithm described by Henikoff and Henikoff [1].
 Obviously, minor errors in research, and particularly in software source
 code, are quite common. This case is noteworthy for three reasons:
 first, the BLOSUM matrices are ubiquitous in computational biology;
 second, these errors have gone unnoticed for 15 years; and third, the
 'incorrect' matrices perform better than the 'intended' matrices.

I cannot quote the full letter because it is copyrighted, but if you do
not have access to this journal, I can send you the PDF privately.

Have a nice day,

-- 
Charles

