BLOSUM
BLOSUM (BLOcks of Amino Acid SUbstitution Matrix[1]) is a substitution matrix used for sequence alignment of proteins.
BLOSUM matrices are used to score alignments between evolutionarily divergent
protein sequences. They are based on local alignments. BLOSUM was
first introduced in a paper by Henikoff and Henikoff.[2] They scanned the BLOCKS database
for highly conserved regions of protein families (regions that have no gaps
in the sequence alignment) and then counted the relative frequencies of
amino acids and their substitution probabilities. They then calculated a log-odds
score for each of the 210 possible pairings of the 20 standard
amino acids (190 substitutions plus 20 identities). All BLOSUM matrices are based on observed alignments; they are not
extrapolated from comparisons of closely related proteins.
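The counting step can be illustrated with a minimal Python sketch. It assumes an ungapped block given as equal-length strings and tallies every unordered pair of residues within each column. The names `count_pairs`, `ALL_PAIRS`, and `toy_block` are hypothetical, the block is invented for illustration, and the sketch leaves out any clustering or weighting of similar sequences.

```python
from collections import Counter
from itertools import combinations, combinations_with_replacement

AMINO_ACIDS = "ARNDCQEGHILKMFPSTWYV"  # one-letter codes of the 20 standard amino acids


def count_pairs(block):
    """Count unordered residue pairs, column by column, in an ungapped block."""
    counts = Counter()
    for column in zip(*block):                # one alignment column at a time
        for a, b in combinations(column, 2):  # every pair of sequences in that column
            counts[tuple(sorted((a, b)))] += 1
    return counts


# All unordered pairings of the 20 amino acids, identities included.
ALL_PAIRS = list(combinations_with_replacement(AMINO_ACIDS, 2))

# Hypothetical toy block: three sequences, three columns.
toy_block = ["AVL", "AIL", "SVL"]
pair_counts = count_pairs(toy_block)
```

Here `len(ALL_PAIRS)` is 210, matching the number of scores in a matrix, and the toy block yields nine pair observations (three per column).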
Several BLOSUM matrices exist, each derived from a different set of alignments
and named with a number. BLOSUM matrices with high numbers are designed for comparing
closely related sequences, while those with low numbers are designed
for comparing distantly related sequences. For example, BLOSUM80 is used
for less divergent alignments, and BLOSUM45 is used for more divergent
alignments. Scores within a BLOSUM matrix are log-odds scores: for a pair of
aligned amino acids, each score is the logarithm of the ratio of the likelihood of the two
amino acids being aligned for a biological reason to the likelihood of the
same amino acids being aligned by chance.[3] The number in a matrix name refers to the percentage-identity threshold applied to the aligned protein sequences used in calculating it.[3]
Every possible identity or substitution is assigned a score based on
its observed frequencies in the alignment of related proteins.[4] A positive score is given to the more likely substitutions, while a negative score is given to the less likely substitutions.
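BLOSUM62 is the matrix calculated from the observed
substitutions in blocks where sequences sharing at least 62% sequence
identity were clustered and weighted as a single sequence, so that closely
related sequences do not dominate the counts. It has become a standard for alignment software.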
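As a rough illustration of how such a matrix is used, the following Python sketch scores an ungapped alignment by summing the matrix entries for each aligned pair. The dictionary holds only a handful of entries, given here as they are commonly quoted for BLOSUM62; they should be checked against a reference copy of the matrix before any real use. The names `SCORES`, `pair_score`, and `alignment_score` are hypothetical.

```python
# A few entries as commonly quoted for BLOSUM62 (an assumption to verify
# against a reference copy); keys are alphabetically sorted residue pairs.
SCORES = {
    ("A", "A"): 4,
    ("W", "W"): 11,
    ("I", "L"): 2,
    ("D", "E"): 2,
    ("K", "R"): 2,
    ("D", "W"): -4,
}


def pair_score(a, b):
    """Look up the symmetric score for aligning residue a with residue b."""
    return SCORES[tuple(sorted((a, b)))]


def alignment_score(seq1, seq2):
    """Sum the pair scores over an ungapped alignment of equal-length sequences."""
    if len(seq1) != len(seq2):
        raise ValueError("sequences must have the same length")
    return sum(pair_score(a, b) for a, b in zip(seq1, seq2))


conservative = alignment_score("AIDW", "ALEW")  # identities and likely substitutions
unlikely = alignment_score("AKD", "ARW")        # includes the unlikely D/W pair
```

With these entries, the first alignment scores 4 + 2 + 2 + 11 = 19, while the negative D/W entry pulls the second down to 4 + 2 - 4 = 2.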
BLOSUM has proved better at scoring distantly related sequences than the once widely used Point Accepted Mutation (PAM) matrices. To calculate a BLOSUM matrix, the following equation is used:

$$S_{ij} = \frac{1}{\lambda} \log\left(\frac{p_{ij}}{q_i \, q_j}\right)$$

Here, $S_{ij}$ is the score for aligning amino acids i and j, $p_{ij}$ is the probability of two amino acids i and j replacing each other in a homologous sequence, and $q_i$ and $q_j$ are the background probabilities of finding the amino acids i and j in any protein sequence at random. The factor $\lambda$ is a scaling factor,[5] set so that the matrix contains easily readable integer values.
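The log-odds calculation can be sketched in Python as follows. The function name `log_odds_score` is hypothetical, and the default λ = ln(2)/2, which gives scores in half-bit units, is an assumption made for illustration; the probabilities in the example are invented.

```python
import math


def log_odds_score(p_ij, q_i, q_j, lam=math.log(2) / 2):
    """Return round((1 / lam) * ln(p_ij / (q_i * q_j))).

    p_ij     : probability of i and j replacing each other in related sequences
    q_i, q_j : background probabilities of i and j
    lam      : scaling factor (half-bit units by default, an assumption)
    """
    return round(math.log(p_ij / (q_i * q_j)) / lam)


# Hypothetical probabilities, both background frequencies set to 0.05:
more_than_chance = log_odds_score(0.01, 0.05, 0.05)      # pair seen 4x more often than chance
same_as_chance = log_odds_score(0.0025, 0.05, 0.05)      # pair seen exactly as often as chance
less_than_chance = log_odds_score(0.000625, 0.05, 0.05)  # pair seen 4x less often than chance
```

The three cases give 4, 0, and -4: pairs observed more often than chance receive positive scores, and pairs observed less often receive negative ones.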
References
This article is licensed under the GNU Free Documentation License. It uses material from Wikipedia Encyclopedia article "BLOSUM"