Consensus Sequence
In molecular biology and bioinformatics, a consensus sequence is a way of representing the results of a multiple sequence alignment, in which related sequences are compared with each other to find similar functional sequence motifs. The consensus sequence shows which residues are conserved (always the same) and which residues are variable.
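The Python below is a minimal sketch of that idea. It takes a toy multiple sequence alignment and labels each column as conserved or variable. The alignment is invented for illustration and is assumed to be already aligned, with every sequence the same length. The function name `column_report` is hypothetical, and gap characters, if present, are simply treated as another residue.

```python
def column_report(alignment: list[str]) -> list[tuple[int, str, bool]]:
    """Return (1-based position, residues seen, conserved?) for each column."""
    length = len(alignment[0])
    if any(len(seq) != length for seq in alignment):
        raise ValueError("all aligned sequences must have the same length")
    report = []
    for i in range(length):
        column = {seq[i] for seq in alignment}
        # A column is conserved when every sequence carries the same residue.
        report.append((i + 1, "".join(sorted(column)), len(column) == 1))
    return report


# Toy alignment, made up for illustration.
aligned = ["ACTGA", "ATTCA", "ACAGA"]
for pos, residues, conserved in column_report(aligned):
    print(pos, residues, "conserved" if conserved else "variable")
```

In this toy alignment, only the first and last columns are conserved (always A). The three middle columns are variable.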
Developing software for pattern recognition is a major topic in genetics, molecular biology, and bioinformatics. Specific sequence motifs can function as regulatory sequences controlling biosynthesis, or as signal sequences that direct a molecule to a specific site within the cell or regulate its maturation. Because the regulatory function of these sequences is important, they are thought to be conserved over long periods of evolution. In some cases, evolutionary relatedness can be estimated from the degree of conservation of these sites.
These conserved sequence motifs are called consensus sequences, and they show which residues are conserved and which are variable. Consider the following example consensus sequence for a DNA motif:
- A[CT]N{A}
In this notation, A means that an A is always found at that position. [CT] stands for either C or T, N stands for any base, and {A} means any base except A. Similarly, Y represents any pyrimidine (C or T), and R represents any purine (A or G).
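This notation maps naturally onto regular expressions. The sketch below is an illustration under stated assumptions, not a standard tool. The hypothetical helpers `consensus_to_regex` and `find_sites` handle only single bases, `[...]`, `{...}`, and the codes N, Y and R over the alphabet ACGT, and they report every match in a sequence, including overlapping ones.

```python
import re

# Ambiguity codes handled by this sketch (outside brackets only).
CODES = {"N": "[ACGT]", "Y": "[CT]", "R": "[AG]"}


def consensus_to_regex(pattern: str) -> str:
    """Translate e.g. 'A[CT]N{A}' into a regular expression over ACGT."""
    parts = []
    i = 0
    while i < len(pattern):
        ch = pattern[i]
        if ch == "[":  # [CT]: any one of the listed bases
            end = pattern.index("]", i)
            parts.append("[" + pattern[i + 1:end] + "]")
            i = end + 1
        elif ch == "{":  # {A}: any base except the listed ones
            end = pattern.index("}", i)
            excluded = set(pattern[i + 1:end])
            parts.append("[" + "".join(b for b in "ACGT" if b not in excluded) + "]")
            i = end + 1
        else:  # a single base, or one of the ambiguity codes above
            parts.append(CODES.get(ch, ch))
            i += 1
    return "".join(parts)


def find_sites(seq: str, pattern: str) -> list[tuple[int, str]]:
    """Return (0-based start, matched text) for every match, overlaps included."""
    regex = consensus_to_regex(pattern)
    # A zero-width lookahead lets overlapping matches be reported.
    return [(m.start(), m.group(1)) for m in re.finditer(f"(?=({regex}))", seq)]


print(consensus_to_regex("A[CT]N{A}"))          # A[CT][ACGT][CGT]
print(find_sites("GACGTTATGCAG", "A[CT]N{A}"))  # [(1, 'ACGT'), (6, 'ATGC')]
```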
In this example, the notation [CT] gives no indication of the relative frequency of C or T at that position. An alternative way to represent a consensus sequence is a sequence logo. This is a graphical representation of the consensus sequence in which the size of each symbol reflects how frequently that nucleotide (or amino acid) occurs at a given position. The more conserved a residue, the larger its symbol; the less frequent a residue, the smaller its symbol. Sequence logos can be generated using the Gestalt Workbench, a publicly available visualization tool written by Gustavo Glusman at the Institute for Systems Biology.
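To make the link between frequency and symbol size concrete, the sketch below computes letter heights for a DNA sequence logo from a toy alignment. It assumes one widely used scheme. A column's information content is 2 bits (log2 of 4 bases) minus its Shannon entropy, and each letter's height is its frequency times that information content. The sketch omits the small-sample correction that some logo tools apply. `logo_heights` is a hypothetical name, not the Gestalt Workbench's API.

```python
import math
from collections import Counter


def logo_heights(alignment: list[str]) -> list[dict[str, float]]:
    """Letter heights (in bits) per column for a DNA sequence logo.

    Assumed scheme: height = frequency * (2 - Shannon entropy of the column),
    with no small-sample correction.
    """
    heights = []
    for column in zip(*alignment):
        counts = Counter(column)
        n = len(column)
        freqs = {base: c / n for base, c in counts.items()}
        entropy = -sum(f * math.log2(f) for f in freqs.values())
        info = 2.0 - entropy  # maximum is log2(4) = 2 bits for DNA
        heights.append({base: f * info for base, f in freqs.items()})
    return heights


# Toy alignment, made up for illustration.
for pos, col in enumerate(logo_heights(["ACGT", "ACGA", "ATGC", "ACTG"]), start=1):
    print(pos, {base: round(h, 2) for base, h in sorted(col.items())})
```

A fully conserved column gets the maximum height of 2 bits, all in one letter. A column in which all four bases occur equally often gets no height at all.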
A consensus sequence may be a short sequence of nucleotides that is found several times in the genome and is thought to play the same role at each of its locations. For example, many transcription factors recognise particular consensus sequences in the promoters of the genes they regulate. Similarly, restriction enzymes usually have palindromic consensus sequences, which typically correspond to the site where they cut the DNA. Transposons act in much the same way when identifying target sequences for transposition. Finally, splice sites (the sequences immediately surrounding exon-intron boundaries) can also be considered consensus sequences.
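A palindromic recognition site reads the same 5' to 3' on both strands. In other words, it equals its own reverse complement. The Python below is a minimal check of that property, using the EcoRI site GAATTC as a known example. The helper names are hypothetical.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Complement each base, then reverse, giving the other strand 5' to 3'."""
    return seq.translate(COMPLEMENT)[::-1]


def is_palindromic(site: str) -> bool:
    """True if the site reads the same on both strands."""
    return site == reverse_complement(site)


print(is_palindromic("GAATTC"))  # True  (EcoRI recognition site)
print(is_palindromic("GAATTG"))  # False
```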
Thus a consensus sequence defines a putative DNA recognition site. It is obtained by aligning all known examples of a certain recognition site, and it is defined as the idealized sequence that represents the predominant base at each position. The actual examples are expected to differ from the consensus by no more than a few substitutions.
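The procedure just described can be sketched as follows. The known sites are aligned, the predominant base is taken at each position, and each site is then compared with the result to see how far it departs. The five sites are invented for illustration, and the sketch assumes they are already aligned and of equal length. Ties are broken by whichever base appears first in the column, which is an arbitrary choice.

```python
from collections import Counter


def majority_consensus(sites: list[str]) -> str:
    """Predominant base at each position of equal-length, aligned sites."""
    # most_common(1) returns the most frequent base; equal counts are ordered
    # by first occurrence in the column.
    return "".join(Counter(column).most_common(1)[0][0] for column in zip(*sites))


def substitutions(site: str, consensus: str) -> int:
    """Number of positions at which a site differs from the consensus."""
    return sum(a != b for a, b in zip(site, consensus))


# Toy "known examples" of a recognition site, made up for this sketch.
sites = ["TATAAT", "TATGAT", "TAAAAT", "GATAAT", "TATACT"]
consensus = majority_consensus(sites)
print(consensus)                                     # TATAAT
print([substitutions(s, consensus) for s in sites])  # [0, 1, 1, 1, 1]
```

Here every toy site differs from the consensus by at most one substitution, consistent with the expectation stated above.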
Any mutation that makes the core promoter sequence more similar to the consensus sequence is known as an up mutation. Such a mutation generally makes the promoter stronger, so RNA polymerase binds more tightly to the DNA it transcribes and transcription is upregulated. Conversely, mutations that destroy conserved nucleotides in the consensus sequence are known as down mutations. These mutations downregulate transcription because RNA polymerase can no longer bind as tightly to the core promoter sequence.
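Following this definition, a point mutation can be labelled by counting how many positions of the promoter element match the consensus before and after the change. The sketch below uses TATAAT, the consensus of the bacterial -10 (Pribnow) box, purely as an example. The "neutral" label and the function names are hypothetical. Real promoter strength also depends on factors this simple count ignores, such as the spacing between promoter elements.

```python
def mismatches(seq: str, consensus: str) -> int:
    """Positions at which seq differs from the consensus."""
    return sum(a != b for a, b in zip(seq, consensus))


def classify_mutation(before: str, after: str, consensus: str) -> str:
    """Label a mutation 'up', 'down' or 'neutral' (hypothetical labels) by
    whether it moves the element toward or away from the consensus."""
    if not len(before) == len(after) == len(consensus):
        raise ValueError("sequences must be aligned and of equal length")
    change = mismatches(after, consensus) - mismatches(before, consensus)
    if change < 0:
        return "up"    # closer to the consensus
    if change > 0:
        return "down"  # further from the consensus
    return "neutral"


CONSENSUS = "TATAAT"  # E. coli -10 box consensus, used here as an example
print(classify_mutation("TATGAT", "TATAAT", CONSENSUS))  # up
print(classify_mutation("TATAAT", "TATACT", CONSENSUS))  # down
print(classify_mutation("TATGAT", "TATCAT", CONSENSUS))  # neutral
```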
This article is licensed under the GNU Free Documentation License. It uses material from Wikipedia Encyclopedia article "Consensus Sequence"