The BLOSUM62 matrices have been buggy for 15 years...
Hi all,
An interesting letter was sent to Nature Biotech's editor:
http://dx.doi.org/10.1038/nbt0308-274
The BLOSUM [1] family of substitution matrices, and particularly BLOSUM62,
is the de facto standard in protein database searches and sequence
alignments. In the course of analyzing the evolution of the Blocks
database [2], we noticed errors in the software source code used to create
the initial BLOSUM family of matrices (available online at
ftp://ftp.ncbi.nih.gov/repository/blocks/unix/blosum/blosum.tar.Z). The
result of these errors is that the BLOSUM matrices—BLOSUM62, BLOSUM50,
etc.—are quite different from the matrices that should have been
calculated using the algorithm described by Henikoff and Henikoff [1].
Obviously, minor errors in research, and particularly in software source
code, are quite common. This case is noteworthy for three reasons:
first, the BLOSUM matrices are ubiquitous in computational biology;
second, these errors have gone unnoticed for 15 years; and third, the
'incorrect' matrices perform better than the 'intended' matrices.
I cannot quote the full letter because it is copyrighted, but if you do
not have access to the journal, I can send you the PDF privately.
Have a nice day,
--
Charles